Optimizing MySQL Indexes

Tweet
This entry is part 1 of 3 in the series Optimizing MySQL

Optimizing MySQL

Introduction

MySQL is one of the most used databases in conjunction with PHP. Making sure that your MySQL databases are running at their best is one of the most important aspects you have to consider whenever your web application grows.

In this series of 3 standalone articles, we will have a look at how we can optimize our MySQL installation. We will take a look at which optimizations we can perform on our database, on our MySQL configuration and how we can find potential problems when MySQL is not performing well. We will be using mostly tools from the Percona Toolkit, to work with our database. This article will focus on improving our indexing.

An introduction to indexes

If you haven’t worked with MySQL indexes by now, you probably have a very small or a badly performing MySQL database. You can compare MySQL indexes with the index in a book. Within the index of a book, you can easily find the correct page that contains the subject you were looking for. If there weren’t any indexes, you had to go through the whole book, searching for pages that contain the subject.

As you can imagine, it’s way faster to search by an index than having to go through each page. Therefore, adding indexes to your database is in general speeding up your select queries. However, the index also has to be created and stored. So the update and insert queries will be slower and it will cost you a bit more disk space. In general, you won’t notice the difference with updating and inserting if you have indexed your table correctly and therefore it’s advisable to add indexes at the right locations.

Tables which only contain a few rows don’t really benefit from indexing. You can imagine that searching through 5 pages is not much slower then first going to the index, getting the page number and then opening that particular that page.

For more information on implementing indexes, please see our previous articles on the matter. Within this article we will continue to see how we can actually improve our current indexes.

Finding duplicate indexes

Duplicate indexes will not necessarily slow down your select queries. However, they can slow down your insert and update queries and can cost you more disk space. In general, it’s better to avoid having duplicate keys.

In the Percona Toolkit, we can find a tool called pt-duplicate-key-checker. This tool will analyze your database and return which tables contain duplicate keys. Imagine we have the following table:

CREATE TABLE `user` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `firstname` varchar(255) NOT NULL,
 `lastname` varchar(255) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `id_firstname` (`id`,`firstname`),
 KEY `firstname` (`firstname`),
 KEY `firstname_2` (`firstname`),
 KEY `firstname_lastname_id` (`firstname`,`lastname`,`id`)
) ENGINE=InnoDB

Within this table, we can find 3 duplicate keys. Let’s run pt-duplicate-key-checker to see which keys are actually duplicate.

# ########################################################################
# user                                                        
# ########################################################################

# firstname_2 is a left-prefix of firstname_lastname_id
# Key definitions:
#   KEY `firstname_2` (`firstname`),
#   KEY `firstname_lastname_id` (`firstname`,`lastname`,`id`)
# Column types:
#      `firstname` varchar(255) not null
#     `lastname` varchar(255) not null
#     `id` int(11) not null auto_increment
# To remove this duplicate index, execute:
ALTER TABLE `user` DROP INDEX `firstname_2`;

# firstname is a left-prefix of firstname_lastname_id
# Key definitions:
#   KEY `firstname` (`firstname`),
#   KEY `firstname_lastname_id` (`firstname`,`lastname`,`id`)
# Column types:
#     `firstname` varchar(255) not null
#     `lastname` varchar(255) not null
#     `id` int(11) not null auto_increment
# To remove this duplicate index, execute:
ALTER TABLE `user` DROP INDEX `firstname`;

# Key firstname_lastname_id ends with a prefix of the clustered index
# Key definitions:
#   KEY `firstname_lastname_id` (`firstname`,`lastname`,`id`)
#   PRIMARY KEY (`id`),
# Column types:
#     `firstname` varchar(255) not null
#     `lastname` varchar(255) not null
#     `id` int(11) not null auto_increment
# To shorten this duplicate clustered index, execute:
ALTER TABLE `user` DROP INDEX `firstname_lastname_id`, ADD INDEX `firstname_lastname_id` (`firstname`,`lastname`);

# ########################################################################
# Summary of indexes                                                      
# ########################################################################

# Size Duplicate Indexes   1032
# Total Duplicate Indexes  3
# Total Indexes            5

Although the index firstname is a duplicate of firstname_2, they are both a duplicate of the firstname_lastname_id key. Why is that? It’s because firstname is a so called leftmost prefix of firstname_lastname_id. When you generate an index on 3 columns (A, B, C) you basically have 3 indexes covered:

  • A
  • A, B
  • A, B, C

So since firstname is the first column in the firstname_lastname_id index, it has no effect to create a separate index.

The last found index might be confusing. According to the tool, the id can be removed. In this case, it is because we are using the InnoDB engine. InnoDB uses a clustered index where the data for the row is stored. In general InnoDB will use the primary key as the clustered index. Therefore, InnoDB will already be using the index on that column and there is no need to add it to this index.

For more information:
Multiple column indexes
InnoDB index types

Finding unused indexes

Of course you want to avoid duplicate keys as much as possible, but how about unused keys? For that, Percona Toolkit contains a tool called pt-index-usage. Based on a query log, this tool will run an explain against all found queries. For that, you need to make sure you are logging queries in your slow log or general log for example.

Let’s run pt-index-usage /var/log/mysql/mysql-slow.log and see what the output is.

ALTER TABLE `user` DROP KEY `lastname_index`; -- type:non-unique

According to the explain that has been run, this index is not used – we could just remove the index. However, it might be that your application has one or more queries which are using the key, and the query using it simply hasn’t been logged yet. So I wouldn’t suggest to remove the keys without doing a manual check.

Also note that this tool can be quite resource hungry, because it’s running explains against each query. It is therefore not advisable to run this tool in a production environment.

Conclusion

Indexes can be very important for an optimal MySQL performance. Making sure that your indexes are in good shape is as important as adding them.

How do you make sure you have correct indexes within your database? I would love to hear your methods in the comments below.

Optimizing MySQL

Optimizing MySQL Configuration >>

Free JavaScript: Novice to Ninja Sample

Get a free 32-page chapter of JavaScript: Novice to Ninja and receive updates on exclusive offers from SitePoint.

  • http://www.itoctopus.com/ itoctopus

    H Peter,

    The “Introduction to Indexes” part is really great. The way you explained the indexes on A,B,C is spot-on.

    PS: Joomla sites suffer from slowdowns in inserts/updates because of duplicate indexes.

    • https://www.peternijssen.nl/ Peter Nijssen

      Thanks! Good to hear you like it. :)

  • Peter Mescalchin

    Fantastic post, pt-duplicate-key-checker is a great find. I’ll be making use of this for sure. The Percona tools are great, have been using Percona XtraBackup for non-locking MySQL off-site backups for years.

  • Taylor Ren

    Nice article. I like database articles, focusing on these fundamental stuffs.

  • hot_rush

    thanks, good article. write more!

    • https://www.peternijssen.nl/ Peter Nijssen

      Coming up later today and tomorrow ;)

  • gskema

    So if double index of (A, B) is the same as making two seperate indexes A and B, does a query with to wheres (WHERE A = 1 AND B = 2) still get the benefit of indexes?