Last week, SitePoint launched its second Twitaway, giving away a free PDF of The Principles of Successful Freelancing to anyone kind enough to follow us on Twitter. Predictably, the site we built for readers to claim the PDF was swamped with traffic in the first few hours of the giveaway, and the MySQL server behind the scenes quickly ground to a halt.
First, a little background: at the time of our first Twitaway in 2008, Twitter HQ had been kind enough to relax the limit it places on the maximum number of Twitter direct messages (DMs) that it would allow us to send in a single day. To distribute the free PDFs, then, we simply wrote a PHP script that would look through our Twitter followers every few minutes, identify the new ones and send them a DM with the download link.
This time around, when we sat down to plan Twitaway 2009, it was clear our job would be more difficult. Much less cooperative than they had been previously, Twitter HQ stonewalled our requests to relax the daily DM limit—they wouldn’t even tell us what our daily limit was. For awhile there, we weren’t sure how we would be able to deliver a download link privately to our Twitter followers.
The solution finally came from Mal, a new developer on the team here at SitePoint HQ. We would use Twitter’s new OAuth support to request temporary access to your Twitter account, use it to verify that you were indeed following us, and then simply display the PDF download link in your browser. No DM required!
Of course, writing a PHP application that supported this OAuth-based workflow was a lot more complicated than writing a static web page with a “follow us on Twitter” link. Not only would it be harder to write, but it would be harder to handle large amounts of traffic hitting it, too.
So there we were: just past midnight on launch day of the Twitaway, and the application had stopped responding. Our monitoring systems made it clear that, while our PHP servers were weathering the storm nicely, the MySQL database server had spiked to 100% CPU usage, stuck its fingers in its ears and begun rocking back and forth, chanting “na-na-na-na-na-I’m-not-listening!”
As it turned out, this was an important lesson for our development team about the importance of choosing appropriate MySQL table types. Read on for how we solved the problem, and why we should have predicted it.
Choosing The Right Table Type
When we logged into the swamped MySQL server, we could see (using the SHOW PROCESSLIST command) that the overwhelming majority of the queries that it was receiving were for the database table we had set up to contain PHP session data.
By default, PHP stores your users’ session variables in files on the web server’s hard drive. In our case, however, we had multiple web servers that all needed to share the same session data, so we had configured PHP to store its sessions on the MySQL database server instead.
Unfortunately, we had failed to specify a table type when creating the session table in the database:
CREATE TABLE sessions ( sesskey VARCHAR(32) NOT NULL DEFAULT '' PRIMARY KEY, expiry INT(11), expireref VARCHAR(64), sessdata LONGTEXT );
MySQL supports a number of different storage engines. When you create a table, by default MySQL will use the MyISAM storage engine to create a MyISAM table. MyISAM is a relatively simple table type, which forgoes several advanced features in order to offer improved performance. In the early days of MySQL, the MyISAM table type was instrumental in making it one of the best performing databases around.
These characteristics would seem to make MyISAM a natural choice for something like a session data table. Unfortunately, MyISAM has an achilles heel: table-level locking. In order to write a change to a MyISAM table, MySQL must lock the whole table, preventing any other query from reading the table until the change is complete.
As thousands upon thousands of SitePoint readers flocked to our Twitaway site to claim their book, multiple PHP processes competed to create, update, and read session data in this single table. And because of MyISAM’s table-level locking, they had to do so one at a time, waiting their turn on an increasingly backed-up database server.
On the surface, the InnoDB storage engine would seem to be a poor choice for improving database performance. It supports a bunch of features that MyISAM doesn’t, including transactions and enforced referential integrity, and all the literature will tell you that writing data to an InnoDB table is slower than with a MyISAM table.
But where MyISAM has an achilles heel, InnoDB has a secret weapon: row-level locking. When writing data to an InnoDB table, only the affected rows are locked, not the whole table. Queries accessing other parts of the same table can happen simultaneously. In a situation like session data, where each browser request being processed is likely to relate to a different session, row-level locking is just what the doctor ordered!
With this in mind, you can probably guess how we cured our swamped database server. We simply converted the session table to an InnoDB table:
ALTER TABLE sessions ENGINE = InnoDB;
Instantly, the CPU load on the database server dropped to more reasonable levels, the Twitaway site came back online, and there was much rejoicing. We could have saved ourselves a lot of grief simply by specifying the table type when we first created the table:
CREATE TABLE sessions ( sesskey VARCHAR(32) NOT NULL DEFAULT '' PRIMARY KEY, expiry INT(11), expireref VARCHAR(64), sessdata LONGTEXT ) ENGINE = InnoDB;
Switching to InnoDB got us through that first night, but by the next morning we were wondering if there might be an even better table type for us to use. After a little research, the Memory storage engine looked like a likely candidate.
If MyISAM sacrifices features for performance, the Memory table type takes this to extremes. Memory tables are never written to disk, so if you restart your database server, the data stored in any Memory tables is lost. The fact that Memory tables require no disk access, however, makes them blazing fast! Since session data is intended to be volatile anyway, we didn’t really mind if this particular table wouldn’t survive a server restart.
According to the documentation, Memory tables have the same table-level locking weakness as MyISAM. Nevertheless, when we switched our session table to the Memory storage engine, our database server’s CPU usage dropped to virtually zero!
Memory tables have some harsh restrictions that mean they aren’t a cure-all for performance issues. Disappearing data aside, Memory tables cannot contain large column types like TEXT or BLOB. Indeed, the largest piece of data you can store in a Memory table row is a 255 character CHAR or VARCHAR. In our case, we actually had to shorten a couple of our session variable names to make sure they would fit!
At this point, our session table is gliding along happily under moderate load. As the traffic from this article and some other promotion we have planned hits the server, we will be watching closely to decide if the Memory storage engine is up to the task, or if we should switch back to InnoDB to take advantage of row-level locking. In either case, we learned an important lesson: when performance counts, take the time to think about you choice of MySQL storage engine. In our case, it took us from a swamped database server to one showing zero CPU usage.