MySQL Performance Boosting with Indexes and Explain

Claudio Ribeiro
Claudio Ribeiro
Share

Techniques to improve application performance can come from a lot of different places, but normally the first thing we look at — the most common bottleneck — is the database. Can it be improved? How can we measure and understand what needs and can be improved?

One very simple yet very useful tool is query profiling. Enabling profiling is a simple way to get a more accurate time estimate of running a query. This is a two-step process. First, we have to enable profiling. Then, we call show profiles to actually get the query running time.

Let’s imagine we have the following insert in our database (and let’s assume User 1 and Gallery 1 are already created):

INSERT INTO `homestead`.`images` (`id`, `gallery_id`, `original_filename`, `filename`, `description`) VALUES
(1, 1, 'me.jpg', 'me.jpg', 'A photo of me walking down the street'),
(2, 1, 'dog.jpg', 'dog.jpg', 'A photo of my dog on the street'),
(3, 1, 'cat.jpg', 'cat.jpg', 'A photo of my cat walking down the street'),
(4, 1, 'purr.jpg', 'purr.jpg', 'A photo of my cat purring');    

Obviously, this amount of data will not cause any trouble, but let’s use it to do a simple profile. Let’s consider the following query:

SELECT * FROM `homestead`.`images` AS i
WHERE i.description LIKE '%street%';

This query is a good example of one that can become problematic in the future if we get a lot of photo entries.

To get an accurate running time on this query, we would use the following SQL:

set profiling = 1;
SELECT * FROM `homestead`.`images` AS i
WHERE i.description LIKE '%street%';
show profiles;

The result would look like the following:

Query_Id Duration Query
1 0.00016950 SHOW WARNINGS
2 0.00039200 SELECT * FROM homestead.images AS i \nWHERE i.description LIKE \’%street%\’\nLIMIT 0, 1000
3 0.00037600 SHOW KEYS FROM homestead.images
4 0.00034625 SHOW DATABASES LIKE \’homestead\
5 0.00027600 SHOW TABLES FROM homestead LIKE \’images\’
6 0.00024950 SELECT * FROM homestead.images WHERE 0=1
7 0.00104300 SHOW FULL COLUMNS FROM homestead.images LIKE \’id\’

As we can see, the show profiles; command gives us times not only for the original query but also for all the other queries that are made. This way we can accurately profile our queries.

But how can we actually improve them?

We can either rely on our knowledge of SQL and improvise, or we can rely on the MySQL explain command and improve our query performance based on actual information.

Explain is used to obtain a query execution plan, or how MySQL will execute our query. It works with SELECT, DELETE, INSERT, REPLACE, and UPDATE statements, and it displays information from the optimizer about the statement execution plan. The official documentation does a pretty good job of describing how explain can help us:

With the help of EXPLAIN, you can see where you should add indexes to tables so that the statement executes faster by using indexes to find rows. You can also use EXPLAIN to check whether the optimizer joins the tables in an optimal order.

To exemplify the usage of explain, we’ll use the query made by our UserManager.php to find a user by email:

SELECT * FROM `homestead`.`users` WHERE email = 'claudio.ribeiro@examplemail.com';

To use the explain command, we simply prepend it before select type queries:

EXPLAIN SELECT * FROM `homestead`.`users` WHERE email = 'claudio.ribeiro@examplemail.com';

This is the result (scroll right to see all):

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE ‘users’ NULL ‘const’ ‘UNIQ_1483A5E9E7927C74’ ‘UNIQ_1483A5E9E7927C74’ ‘182’ ‘const’ 100.00 NULL

These results are not easy to understand at first sight, so let’s take a closer look at each one of them:

  • id: this is just the sequential identifier for each of the queries within the SELECT.

  • select_type: the type of SELECT query. This field can take a number of different values, so we will focus on the most important ones:

    • SIMPLE: a simple query without subqueries or unions
    • PRIMARY: the select is in the outermost query in a join
    • DERIVED: the select is a part of a subquery within a from
    • SUBQUERY: the first select in a subquery
    • UNION: the select is the second or later statement of a union.

    The full list of values that can appear in a select_type field can be found here.

  • table: the table referred to by the row.

  • type: this field is how MySQL joins the tables used. This is probably the most important field in the explain output. It can indicate missing indexes and it can also show how the query should be rewritten. The possible values for this field are the following (ordered from the best type to the worst):

    • system: the table has zero or one row.
    • const: the table has only one matching row which is indexed. The is the fastest type of join.
    • eq_ref: all parts of the index are being used by the join and the index is either PRIMARY_KEY or UNIQUE NOT NULL.
    • ref: all the matching rows of an index column are read for each combination of rows from the previous table. This type of join normally appears for indexed columns compared with = or <=> operators.
    • fulltext: the join uses the table FULLTEXT index.
    • ref_or_null: this is the same as ref but also contains rows with a NULL value from the column.
    • index_merge: the join uses a list of indexes to produce the result set. The KEY column of the explain will contain the keys used.
    • unique_subquery: an IN subquery returns only one result from the table and makes use of the primary key.
    • range: an index is used to find matching rows in a specific range.
    • index: the entire index tree is scanned to find matching rows.
    • all: the entire table is scanned to find matching rows for the join. This is the worst type of join and often indicates the lack of appropriate indexes on the table.
  • possible_keys: shows the keys that can be used by MySQL to find rows from the table. These keys may or may not be used in practice.

  • keys: indicates the actual index used by MySQL. MySQL always looks for an optimal key that can be used for the query. While joining many tables, it may figure out some other keys which are not listed in possible_keys but are more optimal.

  • key_len: indicates the length of the index the query optimizer chose to use.

  • ref: Shows the columns or constants that are compared to the index named in the key column.

  • rows: lists the number of records that were examined to produce the output. This is a very important indicator; the fewer records examined, the better.

  • Extra: contains additional information. Values such as Using filesort or Using temporary in this column may indicate a troublesome query.

The full documentation on the explain output format may be found on the official MySQL page.

Going back to our simple query: it is a SIMPLE type of select with a const type of join. This is the best case of query we can possibly have. But what happens when we need bigger and more complex queries?

Going back to our application schema, we might want to obtain all gallery images. We also might want to have only photos that contain the word “cat” in the description. This is definitely a case that we could find on the project requirements. Let’s take a look at the query:

SELECT gal.name, gal.description, img.filename, img.description FROM `homestead`.`users` AS users
LEFT JOIN `homestead`.`galleries` AS gal ON users.id = gal.user_id
LEFT JOIN `homestead`.`images` AS img on img.gallery_id = gal.id
WHERE img.description LIKE '%dog%';

In this more complex case we should have some more information to analyze on our explain:

EXPLAIN SELECT gal.name, gal.description, img.filename, img.description FROM `homestead`.`users` AS users
LEFT JOIN `homestead`.`galleries` AS gal ON users.id = gal.user_id
LEFT JOIN `homestead`.`images` AS img on img.gallery_id = gal.id
WHERE img.description LIKE '%dog%';

This gives the following results (scroll right to see all cells):

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE ‘users’ NULL ‘index’ ‘PRIMARY,UNIQ_1483A5E9BF396750’ ‘UNIQ_1483A5E9BF396750’ ‘108’ NULL 100.00 ‘Using index’
1 SIMPLE ‘gal’ NULL ‘ref’ ‘PRIMARY,UNIQ_F70E6EB7BF396750,IDX_F70E6EB7A76ED395’ ‘UNIQ_1483A5E9BF396750’ ‘108’ ‘homestead.users.id’ 100.00 NULL
1 SIMPLE ‘img’ NULL ‘ref’ ‘IDX_E01FBE6A4E7AF8F’ ‘IDX_E01FBE6A4E7AF8F’ ‘109’ ‘homestead.gal.id’ ‘25.00’ ‘Using where’

Let’s take a closer look and see what we can improve in our query.

As we saw earlier, the main columns we should look at first are the type column and the rows columns. The goal should get a better value in the type column and reduce as much as we can on the rows column.

Our result on the first query is index, which is not a good result at all. This means we can probably improve it.

Looking at our query, there are two ways of approaching it. First, the Users table is not being used. We either expand the query to make sure we’re targeting users, or we should completely remove the users part of the query. It is only adding complexity and time to our overall performance.

SELECT gal.name, gal.description, img.filename, img.description FROM `homestead`.`galleries` AS gal
LEFT JOIN `homestead`.`images` AS img on img.gallery_id = gal.id
WHERE img.description LIKE '%dog%';

So now we have the exact same result. Let’s take a look at explain:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE ‘gal’ NULL ‘ALL’ ‘PRIMARY,UNIQ_1483A5E9BF396750’ NULL NULL NULL 100.00 NULL
1 SIMPLE ‘img’ NULL ‘ref’ ‘IDX_E01FBE6A4E7AF8F’ ‘IDX_E01FBE6A4E7AF8F’ ‘109’ ‘homestead.gal.id’ ‘25.00’ ‘Using where’

We are left with an ALL on type. While ALL might be the worst type of join possible, there are also times where it’s the only option. According to our requirements, we want all gallery images, so we need to scour through the whole galleries table. While indexes are really good when trying to find particular information on a table, they can’t help us when we need all the information in it. When we have a case like this, we have to resort to a different method, like caching.

One last improvement we can make, since we’re dealing with a LIKE, is to add a FULLTEXT index to our description field. This way, we could change the LIKE to a match() and improve performance. More on full-text indexes can be found here.

There are also two very interesting cases we must look at: the newest and related functionality in our application. These apply to galleries and touch on some corner cases that we should be aware of:

EXPLAIN SELECT * FROM `homestead`.`galleries` AS gal
LEFT JOIN `homestead`.`users` AS u ON u.id = gal.user_id
WHERE u.id = 1
ORDER BY gal.created_at DESC
LIMIT 5;

The above is for the related galleries.

EXPLAIN SELECT * FROM `homestead`.`galleries` AS gal
ORDER BY gal.created_at DESC
LIMIT 5;

The above is for the newest galleries.

At first sight, these queries should be blazing fast because they’re using LIMIT. And that is the case on most queries using LIMIT. Unfortunately for us and our application, these queries are also using ORDER BY. Because we need to order all the results before limiting the query, we lose the advantages of using LIMIT.

Since we know ORDER BY might be tricky, let’s apply our trusty explain.

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE ‘gal’ NULL ‘ALL’ ‘IDX_F70E6EB7A76ED395’ NULL NULL NULL 100.00 ‘Using where; Using filesort’
1 SIMPLE ‘u’ NULL ‘eq_ref’ ‘PRIMARY,UNIQ_1483A5E9BF396750’ ‘PRIMARY ‘108’ ‘homestead.gal.id’ ‘100.00’ NULL

And,

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE ‘gal’ NULL ‘ALL’ NULL NULL NULL NULL 100.00 ‘Using filesort’

As we can see, we have the worst case of join type: ALL for both of our queries.

Historically, MySQL’s ORDER BY implementation, especially together with LIMIT, is often the cause for MySQL performance problems. This combination is also used in most interactive applications with large datasets. Functionalities like newly registered users and top tags normally use this combination.

Because this is a common problem, there’s also a small list of common solutions we should apply to take care of performance issues.

  • Make sure we’re using indexes. In our case, created_at is a great candidate, since it’s the field we’re ordering by. This way, we have both ORDER BY and LIMIT executed without scanning and sorting the full result set.
  • Sort by a column in the leading table. Normally, if the ORDER BY is going by field from the table which is not the first in the join order, then the index can’t be used.
  • Don’t sort by expressions. Expressions and functions don’t allow index usage by ORDER BY.
  • Beware of a large LIMIT value. Large LIMIT values will force ORDER BY to sort a bigger number of rows. This affects performance.

These are some of the measures we should take when we have both LIMIT and ORDER BY in order to minimize performance issues.

Conclusion

As we can see, explain can be very useful for spotting problems in our queries early on. There are a lot of problems that we only notice when our applications are in production and have big amounts of data or a lot of visitors hitting the database. If these things can be spotted early on using explain, there’s much less room for performance problems in the future.

Our application has all the indexes it needs, and it’s pretty fast, but we now know we can always resort to explain and indexes whenever we need to check for performance boosts.

Frequently Asked Questions (FAQs) about MySQL Performance Indexes

What is the importance of MySQL performance indexes?

MySQL performance indexes are crucial for optimizing database performance. They significantly speed up data retrieval operations by providing swift access to rows in data tables, based on the values within the indexed columns. Without indexes, MySQL would have to go through every row in the table to find the relevant rows, which can be time-consuming, especially for large databases.

How does the EXPLAIN command help in MySQL performance?

The EXPLAIN command in MySQL is a powerful tool that provides information about how MySQL executes queries. It shows the order in which tables are read, the type of read operation performed, possible indexes to choose, and the estimated number of rows to examine. This information can help developers optimize their queries and improve database performance.

Why isn’t MySQL using any of the possible keys?

There could be several reasons why MySQL isn’t using any of the possible keys. One reason could be that the optimizer estimates that using the index would require scanning a large percentage of the table, and decides that a table scan would be faster. Another reason could be that the columns in the WHERE clause do not match the columns in the index.

How can I optimize my MySQL queries?

There are several ways to optimize MySQL queries. One way is to use indexes effectively. Indexes can significantly speed up data retrieval. However, they can slow down data modification operations like INSERT, UPDATE, and DELETE. Therefore, it’s important to find a balance. Another way is to use the EXPLAIN command to understand how MySQL executes your queries and find potential bottlenecks.

What is the difference between a primary key and an index in MySQL?

A primary key is a type of index in MySQL. The primary key is a unique identifier for a row in a table. It enforces the uniqueness of the column or column combination and ensures that the column or column combination contains no NULL values. On the other hand, an index is a data structure that improves the speed of data retrieval operations. It can be applied to any column or combination of columns.

How can I create an index in MySQL?

You can create an index in MySQL using the CREATE INDEX statement. The syntax is as follows: CREATE INDEX index_name ON table_name (column1, column2, …);. This will create an index on the specified columns of the specified table.

What is a composite index in MySQL?

A composite index, also known as a multiple-column index, is an index that includes more than one column. In MySQL, you can include up to 16 columns in a composite index, but the total size of the indexed columns cannot exceed 767 bytes.

How can I drop an index in MySQL?

You can drop an index in MySQL using the DROP INDEX statement. The syntax is as follows: DROP INDEX index_name ON table_name;. This will drop the specified index from the specified table.

What is the difference between a clustered index and a non-clustered index in MySQL?

A clustered index determines the physical order of data in a table. Each table can have only one clustered index. A non-clustered index, on the other hand, does not alter the physical order of data in the table. Instead, it maintains a separate data structure (the index) that points to the data rows, allowing faster retrieval of data.

How does MySQL choose an index to use?

MySQL uses a cost-based optimizer to choose which index to use. The optimizer estimates the cost of different query execution plans and chooses the one with the lowest cost. The cost is estimated based on factors like the number of rows to read, the number of disk seeks, the CPU cost, and the memory usage.