Free Performance with MySQL Table Types

Tweet

Last week, SitePoint launched its second Twitaway, giving away a free PDF of The Principles of Successful Freelancing to anyone kind enough to follow us on Twitter. Predictably, the site we built for readers to claim the PDF was swamped with traffic in the first few hours of the giveaway, and the MySQL server behind the scenes quickly ground to a halt.

First, a little background: at the time of our first Twitaway in 2008, Twitter HQ had been kind enough to relax the limit it places on the maximum number of Twitter direct messages (DMs) that it would allow us to send in a single day. To distribute the free PDFs, then, we simply wrote a PHP script that would look through our Twitter followers every few minutes, identify the new ones and send them a DM with the download link.

This time around, when we sat down to plan Twitaway 2009, it was clear our job would be more difficult. Much less cooperative than they had been previously, Twitter HQ stonewalled our requests to relax the daily DM limit—they wouldn’t even tell us what our daily limit was. For awhile there, we weren’t sure how we would be able to deliver a download link privately to our Twitter followers.

The solution finally came from Mal, a new developer on the team here at SitePoint HQ. We would use Twitter’s new OAuth support to request temporary access to your Twitter account, use it to verify that you were indeed following us, and then simply display the PDF download link in your browser. No DM required!

Of course, writing a PHP application that supported this OAuth-based workflow was a lot more complicated than writing a static web page with a “follow us on Twitter” link. Not only would it be harder to write, but it would be harder to handle large amounts of traffic hitting it, too.

So there we were: just past midnight on launch day of the Twitaway, and the application had stopped responding. Our monitoring systems made it clear that, while our PHP servers were weathering the storm nicely, the MySQL database server had spiked to 100% CPU usage, stuck its fingers in its ears and begun rocking back and forth, chanting “na-na-na-na-na-I’m-not-listening!”

As it turned out, this was an important lesson for our development team about the importance of choosing appropriate MySQL table types. Read on for how we solved the problem, and why we should have predicted it.

Choosing The Right Table Type

When we logged into the swamped MySQL server, we could see (using the SHOW PROCESSLIST command) that the overwhelming majority of the queries that it was receiving were for the database table we had set up to contain PHP session data.

By default, PHP stores your users’ session variables in files on the web server’s hard drive. In our case, however, we had multiple web servers that all needed to share the same session data, so we had configured PHP to store its sessions on the MySQL database server instead.

Unfortunately, we had failed to specify a table type when creating the session table in the database:

CREATE TABLE sessions (
  sesskey VARCHAR(32) NOT NULL DEFAULT '' PRIMARY KEY,
  expiry INT(11),
  expireref VARCHAR(64),
  sessdata LONGTEXT
);

MySQL supports a number of different storage engines. When you create a table, by default MySQL will use the MyISAM storage engine to create a MyISAM table. MyISAM is a relatively simple table type, which forgoes several advanced features in order to offer improved performance. In the early days of MySQL, the MyISAM table type was instrumental in making it one of the best performing databases around.

These characteristics would seem to make MyISAM a natural choice for something like a session data table. Unfortunately, MyISAM has an achilles heel: table-level locking. In order to write a change to a MyISAM table, MySQL must lock the whole table, preventing any other query from reading the table until the change is complete.

As thousands upon thousands of SitePoint readers flocked to our Twitaway site to claim their book, multiple PHP processes competed to create, update, and read session data in this single table. And because of MyISAM’s table-level locking, they had to do so one at a time, waiting their turn on an increasingly backed-up database server.

On the surface, the InnoDB storage engine would seem to be a poor choice for improving database performance. It supports a bunch of features that MyISAM doesn’t, including transactions and enforced referential integrity, and all the literature will tell you that writing data to an InnoDB table is slower than with a MyISAM table.

But where MyISAM has an achilles heel, InnoDB has a secret weapon: row-level locking. When writing data to an InnoDB table, only the affected rows are locked, not the whole table. Queries accessing other parts of the same table can happen simultaneously. In a situation like session data, where each browser request being processed is likely to relate to a different session, row-level locking is just what the doctor ordered!

With this in mind, you can probably guess how we cured our swamped database server. We simply converted the session table to an InnoDB table:

ALTER TABLE sessions ENGINE = InnoDB;

Instantly, the CPU load on the database server dropped to more reasonable levels, the Twitaway site came back online, and there was much rejoicing. We could have saved ourselves a lot of grief simply by specifying the table type when we first created the table:

CREATE TABLE sessions (
  sesskey VARCHAR(32) NOT NULL DEFAULT '' PRIMARY KEY,
  expiry INT(11),
  expireref VARCHAR(64),
  sessdata LONGTEXT
) ENGINE = InnoDB;

Switching to InnoDB got us through that first night, but by the next morning we were wondering if there might be an even better table type for us to use. After a little research, the Memory storage engine looked like a likely candidate.

If MyISAM sacrifices features for performance, the Memory table type takes this to extremes. Memory tables are never written to disk, so if you restart your database server, the data stored in any Memory tables is lost. The fact that Memory tables require no disk access, however, makes them blazing fast! Since session data is intended to be volatile anyway, we didn’t really mind if this particular table wouldn’t survive a server restart.

According to the documentation, Memory tables have the same table-level locking weakness as MyISAM. Nevertheless, when we switched our session table to the Memory storage engine, our database server’s CPU usage dropped to virtually zero!

Memory tables have some harsh restrictions that mean they aren’t a cure-all for performance issues. Disappearing data aside, Memory tables cannot contain large column types like TEXT or BLOB. Indeed, the largest piece of data you can store in a Memory table row is a 255 character CHAR or VARCHAR. In our case, we actually had to shorten a couple of our session variable names to make sure they would fit!

At this point, our session table is gliding along happily under moderate load. As the traffic from this article and some other promotion we have planned hits the server, we will be watching closely to decide if the Memory storage engine is up to the task, or if we should switch back to InnoDB to take advantage of row-level locking. In either case, we learned an important lesson: when performance counts, take the time to think about you choice of MySQL storage engine. In our case, it took us from a swamped database server to one showing zero CPU usage.

Free book: Jump Start HTML5 Basics

Grab a free copy of one our latest ebooks! Packed with hints and tips on HTML5's most powerful new features.

  • http://www.krues8dr.com krues8dr

    It’s worth noting that forgoing MyISAM means that you lose the ability to Fulltext Index, which is a big benefit on a lot of search-intensive data structures. So it’s not simply a performance question in the end – there are functional benefits as well. You also want to make sure you’re asking the right question in the first place; for heavy sessions like this, you’d do well to make use of memcache, rather than a write-heavy table. And storing your whole session serialized in a longtext field won’t do you any favors either.

  • http://www.deanclatworthy.com Dean C

    I’d be more inclined to use MongoDB for something like this:
    http://blog.mongodb.org/post/172254834/mongodb-is-fantastic-for-logging

    Nevertheless, good blog post for those who don’t understand the significance of MySQL table types in your applications performance.

    I would highly reccomend this book to any of your readers wanting to truly master MySQL:
    http://www.amazon.com/High-Performance-MySQL-Jeremy-Zawodny/dp/0596003064

  • redbrick

    What about mamcached? I think you should give memcache give a try for sessions. Just like memory tables the data is stored in the memory. There are several more nice features which should make memcache the better choice.

  • Dajve

    I have to second Dean’s suggestion: High Performance MySQL was a godsend when I was optimising a database-intensive app.

    Also, it’s worth investigating memcached if you have sufficient access to your server(s):
    http://www.danga.com/memcached/

  • Stefan

    For a fast session solution i use memcache, which fits perfect for multi server environments. PHP can handle it easily and you can set up several servers which handle all the sessions.

  • Clintonio

    I’ve been using InnoDB for a while for the same reason.

    It’s a very good idea really.

  • sathish kumar

    Yes, Give a try to memcached , Because it will satisfy most of the problems such as,

    1) session storing in multiple servers, frequent access of session data,
    2) no restriction of text limitation,
    3) you can store what ever you want(no question about text, picture, object and etc.,)
    and
    4) Its also using memory directly so, it will be damn fast compared to memory table.

    Thanks,
    Sathish Kumar

  • dj4ster

    Hi Kevin,

    Thanks for this very informative insights. I’m new to this “area” of, what I call, MySQL Performance Troubleshooting.

    Enjoyed your methodical approach and reasoning behind the actions. Thanks again. Don

  • Michał Fikus

    This entry is great – thanks!
    What more you can do to optimize performance (and wasn’t already written in comments) is to change column types – SESS_ID is almost always 32, so choose CHAR rather then VARCHAR and ask yourself if you really need LONGTEXT (I have never needed more then MEDIUMTEXT provide).

  • http://www.perlgen.com/ dgs20904

    There is a relatively straight forward way to architect MySQL with MyISAM tables to handle very high volumes of transactions if you are using MySQL replication. Under replication, you setup a master database on one server that handles all of your updates, and you use the slave databases that are running on other servers for all of your reads (SELECT’s).

    It is very easy to build an API that will allow you to build this functionality into your application from the get go. This is what we do:

    1. Let’s say that we have four database servers,
    with one master and three slaves. We first define a name in DNS that references all four of the servers — lets call it dc01.example.com for “database cluster 1″. Using the standard BIND DNS server, this definition would look something like this:

    dc01 1d A 192.168.1.10 ; master
    dc01 1d A 192.168.1.11 ; slave 1
    dc01 1d A 192.168.1.12 ; slave 2
    dc01 1d A 192.168.1.13 ; slave 3

    2. Next we add an entry to the DNS definitions that identify each of the servers by their purpose (update or read). We use the letter “u” for the update server, and the letter “r” for the read-only servers. Then we create more DNS entries that look like this:

    u.dc01 1d A 192.168.1.10 ; master
    r.dc01 1d A 192.168.1.11 ; slave 1
    r.dc01 1d A 192.168.1.12 ; slave 2
    r.dc01 1d A 192.168.1.13 ; slave 3

    Now, if we refer to “u.dc01.example.com”, we get the master update server, and if we refer to “r.dc01.example.com” we get one of the three read only servers (in round robit fashion to balance the load).

    3. This is where the API comes in. All you really need to do is create a database connection API that specifies how you want to connect to the server. Use connection type “u” to get an update handle, or connection type “r” to get a read-only handle. When you call the API, it simply prefixes the database host name with the character you specify, makes the connection, and passes back the handle to the database instance. You can use this to then build SQL queries or issues updates. For example, your API could look like this when it is called:

    # get an update handle to the database
    $dbh = ConnectDatabase (“u”);

    In the ConnectDatabase() function, you would simply build a normal MySQL connect call, but you would prefix the host name to connect to with the value that was passed (the “u”) and a period (e.g., u.dc01.example.com). Simple, eh?

    This method works really well because it can be scaled as big as you want — simply add slaves and DNS entries and your code continues to work. And in the event you have a small application you are developing locally on a single database server, you can point the “u” and “r” entries to the same server and your code works with a single database — ready for scaling when you bring it out into the larger Internet.

    There are some efficiencies that you will want to incorporate into your API, like storing the handle of the database in an array based on its connection type when multiple connections are made within the same web program (if you find an exisint connection has been made, just return the saved handle). I can provide examples of this in Perl if anyone is interested. It is very easy to do, easy to build into your applications, and makes the issue if scaling trivial. Plus, if you move to NDB clustering, the code can work there too — simply set it up as if all the servers had both update and read only capabilities. Best Regards,

    Dave Stoddard – Accelix LLC
    dgs at accelix.com

  • Dimas

    Great article, just got it in my inbox and read the full thing. I must say, it is very well written for the beginner/intermediate user who may have not considered performance implications too seriously.

  • palgrave

    As someone who usually approaches the tech times with the question “I wonder will I understand this one?” this was great stuff.

    An excellent explanation of table types and a lot of stuff de-mystified. Loved it.

  • Sloami

    @dgs20904, that solution brings in additional overhead. You’ll have to keep the replication lag between the Master DB and the Slaves to as minimum as possible. Possibly monitoring heartbeats of the servers and know when to take off a dead or stale server. But anyway still a good solution.

  • preetam

    Well I am thankful to you to remind me this. Though knowing it from long time, i used to leave things to MyISAM.

    Never bothered for performance.

    But surely i will take care of it next time… :)

  • Melug

    Nice one,

  • Cesar Quinteros

    Good to know! I am still learning web development and this is will definitely get an asterisk in the back of my mind!

  • Anonymous

    Thanks for the tip… makes my life much easier if I have to figure it out myself. I wish I could treat you to a restaurant of your choice.

  • http://www.dangrossman.info Dan Grossman

    I just set the PHP session save path to a ramdisk. Session data in memory without the overhead of a RDBMS.

  • yonah

    This is very valuable advise priced at no fee! Thanks for for showing me just another kind of dealing with performance issues!

  • http://litlurl.net pkSML

    Thanks for this article! I would consider myself a MySQL beginner. I completely understood the article, as Sitepoint does so well in books and other articles! Keep ‘em coming!

  • Janet

    Great article.I found it really interesting also you explain everything is such a simple words that is easily understandable by everyone even don’t have much knowledge about this.
    Janet

  • http://www.tvlocales-depays.com/ Arsento

    Valuable thoughts and advices. I read your topic with great interest.