Scaling Anthrocon's Website to Handle 1,400 Simultaneous Connections


The Challenge

The moment hotel reservations open is the single busiest time of the year for Anthrocon's webserver. Last year it even caused performance problems for us. That was not so good.

So this year, I decided to try something different. Instead of leaving the regular Drupal-based website up and running, I replaced it with a relatively static "countdown" page, which displayed a countdown timer and automatically started displaying the hotel link at 11 AM on the opening day.

First, some stats for the Anthrocon website:

  • Peak bandwidth: 1.6 Megabits/sec
  • Peak connections: 1,400 concurrent connections

And now some stats for Passkey, who handled most of the traffic:

  • Peak bandwidth: 190 Megabits/sec
  • Peak connections: 4,000 concurrent connections

Lightening the Load on the Webserver

With the kind of traffic mentioned above, I figured that the only way to keep the website from falling over again was to get Drupal out of the picture, at least for a few hours. Drupal is a fine Content Management System (CMS), but it's pretty heavy on CPU and database usage. That means it won't scale unless we throw tons of additional hardware at it. And it's not cost efficient to provision additional hardware that is used only once a year.

So the first thing I did was write some PHP scripts that would serve up a series of static pages on the morning of February 1st. Before 11 AM, a Javascript-based countdown timer would be displayed. After 11 AM, a page with the link to hotel reservations would be displayed instead.

[Screenshots: the countdown page and the hotel reservation link page]
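To make the time check concrete, here is a minimal sketch of the decision logic in shell. It is not the PHP code from the actual scripts. The function name, the page file names, and the choice to treat "exactly 11 AM" as open are all assumptions made for illustration.

```shell
#!/bin/sh
# Decide which static page to serve, given the current time and the
# opening time as Unix epoch seconds.
# page_for_time and both file names are hypothetical, for illustration only.
page_for_time() {
    now=$1
    opens_at=$2
    if [ "$now" -lt "$opens_at" ]; then
        # Still before the opening time: show the countdown timer.
        echo "countdown.html"
    else
        # Opening time reached (assumed inclusive): show the reservation link.
        echo "hotel-link.html"
    fi
}

# Example with an arbitrary opening time of epoch second 1000000.
page_for_time 999999 1000000    # one second before opening
page_for_time 1000000 1000000   # exactly at opening
```

Because the check depends only on the clock, nothing has to be deployed or toggled by hand at the moment reservations open.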

But there was more to it than that. I needed to get as much traffic as possible away from our website. The next thing I did was to relocate our assets (image files, CSS files, and Javascript) onto Amazon's S3 service. At $0.12/Gigabyte for data transfer, it was fairly cheap. And Amazon had the infrastructure to handle lots of requests. Google was also a help here, as they host copies of jQuery that are available for public use.
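For a rough sense of what that rate means, the arithmetic is just gigabytes transferred times the per-gigabyte price. The sketch below uses the $0.12 figure quoted above. The 50 GB volume is a made-up number for illustration, not a measurement from our traffic, and the function name is hypothetical.

```shell
#!/bin/sh
# Estimate a data-transfer bill: gigabytes transferred times price per gigabyte.
# transfer_cost is a hypothetical helper for illustration.
transfer_cost() {
    gigabytes=$1
    price_per_gb=$2
    # awk handles the floating-point multiplication; output has two decimals.
    awk -v gb="$gigabytes" -v price="$price_per_gb" \
        'BEGIN { printf "%.2f\n", gb * price }'
}

# 50 GB is an invented volume; 0.12 is the per-gigabyte rate quoted above.
transfer_cost 50 0.12   # prints 6.00
```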

Furthermore, to avoid hammering the CPU, I decided not to access MySQL at all. That was easy enough to do, since I built the PHP scripts for serving up the read-only version of the website from the ground up.

Open Source

As a side note, I decided that I would contribute my code to the community after we were done with it, so I open-sourced my code and placed it on GitHub: https://github.com/dmuth/anthrocon-hotel-countdown

Yes, we've open-sourced other projects over the years. The full list can be found here.

Social Media

We decided to use social media much more extensively this year. While last year it was an afterthought, this year it was a key part of our strategy to announce the hotel reservation link. We wrote up Tweets, Facebook posts, and Google Plus posts ahead of time. Then we announced those channels well in advance, as well as on our "countdown page", so that even if the website went down, we could get the word out, and people would know where to look.

When 11 AM on Friday morning arrived, it was a simple copy-and-paste job to make the Tweets and posts with the hotel information. That helped divert even more traffic from our webserver.

Implementation

At 10 AM on Friday morning, I put the website into read-only mode. Since all pages in Drupal are served up through a single PHP script, switching over the website was a simple matter of running the UNIX command:

rm index.php && ln -s hotel-countdown/index.php .
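One caveat with removing the file and then linking: between the two commands there is a brief window in which index.php does not exist, and a request arriving in that window would fail. A possible variation, sketched here as an assumption rather than what was actually run, builds the new symlink under a temporary name and renames it over the old one, since a rename replaces the destination in a single step. The helper name swap_link and the scratch-directory demo are hypothetical. The sketch assumes index.php is a symlink to a regular file.

```shell
#!/bin/sh
# Point a symlink at a new target without a moment where the link is missing.
# swap_link is a hypothetical helper, not part of the original scripts.
swap_link() {
    target=$1            # file the link should point to
    link=$2              # link name, e.g. index.php
    tmp="$link.tmp.$$"   # temporary name in the same directory as the link
    # Create the new link under the temporary name, then rename it over the
    # old one. The rename replaces the old link in one step.
    ln -s "$target" "$tmp" && mv -f "$tmp" "$link"
}

# Demo in a scratch directory, using the file names from this post.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir hotel-countdown
echo "countdown" > hotel-countdown/index.php
echo "drupal" > main.php
ln -s main.php index.php

swap_link hotel-countdown/index.php index.php   # enter read-only mode
cat index.php                                   # prints: countdown
swap_link main.php index.php                    # restore the regular site
cat index.php                                   # prints: drupal
```

The same helper covers both directions, so switching back later is one call with main.php as the target.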

The Results

The results were better than expected! Once the website was put into read-only mode, CPU and bandwidth usage dropped to near zero. In fact, even as the number of users visiting the site climbed, both of those levels remained astonishingly low, up through and including 11 AM, the "zero hour".

At 11:15 AM traffic levels were on their way down, at which point I restored Drupal with the following UNIX command:

rm index.php && ln -s main.php index.php

Total time spent in read-only mode was 75 minutes.

Here are some graphs that better show the bandwidth, CPU usage, and number of concurrent connections:

[Graphs: Opening Anthrocon 2013 Hotel Reservations (bandwidth, CPU usage, and concurrent connections)]

Conclusions

Throughout the entire day, the website remained operational, with effectively 100% uptime. We received numerous compliments from our members on how smoothly everything went, on both our end and Passkey's end. Here are some of the things that our members had to say about Passkey in our survey:

  • "Smooth, quick and surprisingly pain-free"
  • "Everything was so simple and streamlined"
  • "Smooth, intuitive, and zero lag"
  • "Smoothest experience I have ever had reserving a hotel room for Anthrocon"
  • "It was super quick and dead easy"

I think it is safe to say that using the Passkey system was an Epic Win, and that we can continue with this approach in future years.
