MySQL 3 to 4 Migration in the Real World
There are all manner of whitepapers and architecture documents on the features and benefits of migrating from MySQL 3 to MySQL 4. Sometimes, the best case is to see it done on the front lines of a real Web business.
I recently had a chance to chat with Tyson Lowery, a developer and proprietor of online fantasy baseball site SimDynasty.com and learn about his experience in making the MySQL move, marketing the site, and his views on Java and PHP.
Can you give us a little background on yourself?
I grew up in Akron, Ohio before attending Boston College where I majored in Computer Science and Business. I started work for GE Capital (a division of General Electric) after that, where I worked on a variety of projects. Most of the technology was Microsoft variety, with some Java and Oracle projects sprinkled here and there.
I started my own technology consulting practice, Teeloh Technology, after leaving GE. My specialty has become database-driven Web applications, particularly in the Financial Services and Real Estate industries. But as of late, I’ve been doing some project work for gaming companies as well.
What drove you to design and launch the fantasy baseball site?
I really became attracted to some of the open source technologies being developed, particularly Java and MySQL. I started my baseball simulation site, SimDynasty.com, in the evenings while working at GE mostly as a way to experiment with Java and MySQL. Little did I know that it would grow to become a big hit.
Can you expand on the site’s technology platform?
Sure, but first, let me tell you a little bit about the site, so you can follow the examples and understand how we use MySQL a little bit better. Sim Dynasty is a baseball simulation that challenges people in the team ownership and manager aspects of baseball. Each team owner drafts a team of fictitious baseball players and acts as the owner and manager of the team. They can trade players, set lineups, groom prospects in the minor leagues, etc.
Games are simulated 3 times a day and you can watch the games live, similar to ESPN GameCast. The coolest part about the game is that you manage the team over several seasons and have to decide which player to select in the Amateur Draft, or which rookie to call up to replace the veteran that retired in the off-season, or whether to sacrifice the future of your team by trading a few prospects for a top of the line pitcher.
You can imagine that, when the games are simulated, it causes a large number of records to be updated and inserted into the database to account for player stats, team records, and the play-by-play. But the users are still banging away at the site with reads to check stats, view game results, and looking over the waiver wire. Users are also doing their share of database writes with setting lineups, offering trades, and even changing players’ names. So you can see we have a lot of different competitors for MySQL’s resources, both on the read and write sides.
The site runs using Java, JSP, and MySQL 4.0 on a Linux server — we do use PHP a little bit for a forum and a couple other small areas. We are currently on MySQL 4.0.18. I can’t wait until 4.1 becomes production-ready — subselects are finally here!
When you started the site, it sounds like you were on MySQL 3.2x? How did you plan the migration to 4.x?
Probably like a lot of people out there, we made the change out of necessity. We desperately needed some features in 4.0 to help the performance of the site. We started with 3.23 simply because that was the version that came pre-installed on the server and it worked well for a while.
What factors led you to make the decision to migrate?
The major reason we upgraded was so that InnoDB could be leveraged. I noticed that pages were really slow, and realized the problem when monitoring the number of locked queries in SHOW STATUS.
We have an interesting situation that most sites don’t experience. A lot of MySQL sites have a ton of Select queries, but not many Inserts, Deletes, or Updates. Or, if they do have those, they’re done during off-peak hours.
The nature of our site has just about every table being updated hundreds to thousands of times an hour. This made getting to 4.0 critical as our site grew so that we could leverage InnoDB tables. Using InnoDB for the frequently updated tables has been a game changer for us because it uses Row level locking instead of Table locking when performing updates. This allows our Java programs and JSP pages to update tables while our visitors view data from the tables, or even update the data.
Can you share some MySQL tips and tricks discovered in the process?
Some of the tricks I’ve learned aren’t intuitive. If you’re using joins between two tables, performance is greatly improved if the sizes of the columns are the same. For example, if you have one table called customer with a primary key of
customer_id that’s joined with an order table on a field called
order_customer_id, make sure you don’t have something like
int(10) for one field, and
int(5) for the other. Changing them both to
int(10), for example, will help the speed of these joins.
The other feature we have leveraged in 4.0 is query caching. Even though the tables are updated many times an hour, we still see good numbers in terms of cached queries being used.
I recommend that you turn on the slow query log for MySQL. This was a good starting point to identify which queries were running slowly. When I do find a slow query, I use MySQL’s Analyze query that shows you how many rows are being read from each table and how they’re being joined. You can usually find a better way to get the same result set without scanning so many rows, either by adding indexes on tables, or by joining the query differently.
Have there been any hiccups or lessons learned along the way?
Deciding which tables to move to InnoDB from MyISAM is an ongoing process. MyISAM tends to perform better on reads — if you can avoid table-wide locking. Our main database has 34 tables, 7 of which are InnoDB tables now.
There are a few other tables that may be candidates to be changed over in the future. We monitor queries to see if locking becomes an issue with any of them. A nice tool that we’ve found is called mytop. It kind of takes what you’d see in MySQL from SHOW STATUS and a few other commands, and presents it in a format similar to the Linux top command. You can see the queries being executed right in front of you — you can see how long they’re taking, and whether any queries are locked.
Another wrinkle we’ve noticed is that, in some instances, for a few days every month or so, a table may experience a period in which it’s really getting hit hard with updates. For example, the playoffs for all our leagues happen around the same time. So there are certain tables that will get read and written quite frequently, but this only lasts for about 3 days and it only happens every other month. During this time, we actually switch the table to InnoDB to leverage the row-level locking, and revert it to MyISAM the rest of the time — when it’s generally not written to.
The key to the whole process is trial and error. Change a setting slightly or change the index, test and measure the difference, and then decide whether to keep the change or discard it. It can be a tedious process, but the results have been amazing. A few people tried to tell me to invest in a bigger server, but I’m not sure how much that would have alleviated our problems.
How about results? Have you tracked any metrics on performance since migrating?
Sorry, I wish I had some numbers. I can just say from a customer standpoint that the site flies. It used to be so slow when we ran our simulation games during the day that we had to run the free games overnight so that the rest of the site wasn’t overly slow. This is no longer a problem.
The number of players we have playing the game has gone through the roof, and I think a lot of this has to do with better server performance. Sometimes, certain pages could take up to 10 seconds to load in the past — this can get tiresome for a person to deal with day-in and day-out and I think we lost a lot of people because of that.
You use Java and PHP, is that correct? Do you have any advice or are there any techniques that you’ve picked up using Java on MySQL in contrast to PHP?
Both JSP and PHP have their advantages and disadvantages. If you’re going to need programs to run on the server, my philosophy was that you should use a J2EE solution so that you can share code between your executable programs and the Servlets or JSP. But PHP generally puts less stress on a server than Tomcat, and there seem to be fewer moving parts with PHP in terms of server setup and configuration — meaning that fewer things can go wrong.
In terms of interfacing with MySQL, I haven’t noticed many differences in terms of performance. PHP is slightly less complex for beginners to figure out how to connect to a database. But in terms of performance, I haven’t seen much of a difference. It would be an interesting experiment to execute the same queries using PHP and JSP and measure the results — maybe something I can do on a rainy day.
How have you marketed the site? Has it resulted in more subscribers?
The site has primarily been marketed using paid Search Engine listings such as Google Adwords and Overture. We also advertise on other baseball related sites. We normally get 75,000 to 100,000 page views a day on our site, with nearly 25,000 registered users.
We have a refer-a-friend program as well, so news of the site spreads quite a bit through word-of-mouth.
Have you discovered any other valuable tools like mytop in the process of building and maintaining the site?
We started out using some monitoring scripts on the server to let us know when there were site problems. But one of the problems we found with this was that if there were a major issue, the scripts would stop running as well. We now use a company called Alerta to monitor our sites. For a few bucks a month, they page us if there is a problem with the site being down.
Webmasterworld.com is a catch-all Website with great tips on anything from finding a hosting company, to how to market your site, to HTML syntax questions. It’s really a valuable website if you haven’t seen it before.
For marketing, using a banner ad exchange may seem like you’re back in 1996, but it really has helped us bring in new customers and get rid of our extra banner ad inventory. After experimenting with a number of these, we exclusively use BCentral.com’s exchange that is now owned by Microsoft.
SitePoint thanks Tyson for his time. Check out SimDynasty.com.