As folks noticed on Thursday morning when we opened hotel reservations, the website displayed errors for 5-10 minutes starting at 9 AM EST because of the rush of incoming traffic. Several factors contributed, and I'd like to go into them in technical detail below.
First, some raw stats:
Peak bandwidth: approx. 3.6 Megabits/sec
Peak connections: 1,060 concurrent connections
Number of users logged into the site: 100+
Here are some graphs that show just how big those numbers are, compared to normal traffic levels:
For those who saw last year's post about the load on the webserver when we opened up hotels, this year's traffic was about twice what last year's traffic was. I didn't see that coming.
As the traffic graph plainly shows, the machine that this website runs on is capable of much higher bandwidth throughput. So, what happened?
In a word: caching. Or rather, the lack thereof in certain cases.
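To illustrate why missing caching hurts so much during a stampede, here is a minimal, generic sketch of a time-to-live (TTL) cache. It is not the site's actual code; the post doesn't say which caching layer was missing. `TTLCache`, `render_hotel_page`, and the 60-second TTL are hypothetical names and values chosen for illustration. With the cache in place, a burst of identical requests triggers the expensive page build only once per TTL window instead of once per request.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical minimal TTL cache: serves a stored copy until it expires."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, page)

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh cached copy: no backend work at all
        page = render()  # miss or expired: do the expensive work once
        self._store[key] = (now + self.ttl, page)
        return page


backend_calls = 0


def render_hotel_page() -> str:
    """Hypothetical stand-in for an expensive page build (e.g. several MySQL queries)."""
    global backend_calls
    backend_calls += 1
    return "<html>hotel info</html>"


cache = TTLCache(ttl_seconds=60)
for _ in range(1060):  # simulate a burst of requests for the same page
    cache.get_or_render("/hotels", render_hotel_page)

print(backend_calls)  # 1
```

Without the cache, the same loop would call `render_hotel_page` 1,060 times. A real deployment would also need to handle many simultaneous misses on an expired entry, which this sketch ignores.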
Looking back after this morning's stampede, I thought I'd share with folks how the webserver held up, since I know I am not the only geek out there. And, truth be told, I was a bit nervous myself, since I wasn't quite sure how much traffic we would get and whether the webserver would survive or turn into a smoking crater.
Well, here's what we got:
The first hump is a manual backup I did last night. The second is the automatic backup that runs every morning, where the database and files are rsynced to a machine at another data center. The third hump at 9 AM was when we opened hotel reservations. 1.4 Megabits/sec doesn't look too bad, until you look at:
The peak of 336 simultaneous connections was far more interesting. That's about 16 times the normal number of connections to the webserver.
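As a quick sanity check on these figures, the snippet below derives the implied normal load from the "about 16 times" multiple. It also compares the peaks quoted here with the stats at the top of this post, under the assumption (not stated explicitly) that 1.4 Megabits/sec and 336 connections are last year's peaks while 3.6 Megabits/sec and 1,060 connections are this year's.

```python
# Figures quoted in the post
peak_connections_then = 336  # simultaneous connections at the 9 AM peak
multiple_of_normal = 16      # "about 16 times the normal number"
baseline = peak_connections_then / multiple_of_normal
print(f"Implied normal load: ~{baseline:.0f} connections")

# Assumption: 1.4 Mbit/s and 336 connections are last year's peaks,
# 3.6 Mbit/s and 1,060 connections are this year's.
bandwidth_ratio = 3.6 / 1.4
connection_ratio = 1060 / 336
print(f"Bandwidth: {bandwidth_ratio:.1f}x, connections: {connection_ratio:.1f}x")
```

On that assumption, the normal load works out to roughly 21 concurrent connections, and year-over-year growth was about 2.6x in bandwidth and 3.2x in connections.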
So, what were the effects? Let's look at MySQL first: