Techniques to improve application performance can come from a lot of different places, but normally the first thing we look at — the most common bottleneck — is the database. Can it be improved? How can we measure and understand what needs and can be improved?
One very simple yet very useful tool is query profiling. Enabling profiling is a simple way to get a more accurate time estimate of running a query. This is a two-step process. First, we have to enable profiling. Then, we call
show profiles to actually get the query running time.
Let’s imagine we have the following insert in our database (and let’s assume User 1 and Gallery 1 are already created):
INSERT INTO `homestead`.`images` (`id`, `gallery_id`, `original_filename`, `filename`, `description`) VALUES (1, 1, 'me.jpg', 'me.jpg', 'A photo of me walking down the street'), (2, 1, 'dog.jpg', 'dog.jpg', 'A photo of my dog on the street'), (3, 1, 'cat.jpg', 'cat.jpg', 'A photo of my cat walking down the street'), (4, 1, 'purr.jpg', 'purr.jpg', 'A photo of my cat purring');
Obviously, this amount of data will not cause any trouble, but let’s use it to do a simple profile. Let’s consider the following query:
SELECT * FROM `homestead`.`images` AS i WHERE i.description LIKE '%street%';
This query is a good example of one that can become problematic in the future if we get a lot of photo entries.
To get an accurate running time on this query, we would use the following SQL:
set profiling = 1; SELECT * FROM `homestead`.`images` AS i WHERE i.description LIKE '%street%'; show profiles;
The result would look like the following:
|2||0.00039200||SELECT * FROM
|3||0.00037600||SHOW KEYS FROM
|4||0.00034625||SHOW DATABASES LIKE \’homestead\|
|5||0.00027600||SHOW TABLES FROM
|6||0.00024950||SELECT * FROM
|7||0.00104300||SHOW FULL COLUMNS FROM
As we can see, the
show profiles; command gives us times not only for the original query but also for all the other queries that are made. This way we can accurately profile our queries.
But how can we actually improve them?
We can either rely on our knowledge of SQL and improvise, or we can rely on the MySQL
explain command and improve our query performance based on actual information.
Explain is used to obtain a query execution plan, or how MySQL will execute our query. It works with
UPDATE statements, and it displays information from the optimizer about the statement execution plan. The official documentation does a pretty good job of describing how
explain can help us:
With the help of EXPLAIN, you can see where you should add indexes to tables so that the statement executes faster by using indexes to find rows. You can also use EXPLAIN to check whether the optimizer joins the tables in an optimal order.
To exemplify the usage of
explain, we’ll use the query made by our
UserManager.php to find a user by email:
SELECT * FROM `homestead`.`users` WHERE email = 'email@example.com';
To use the
explain command, we simply prepend it before select type queries:
EXPLAIN SELECT * FROM `homestead`.`users` WHERE email = 'firstname.lastname@example.org';
This is the result (scroll right to see all):
These results are not easy to understand at first sight, so let’s take a closer look at each one of them:
id: this is just the sequential identifier for each of the queries within the SELECT.
select_type: the type of SELECT query. This field can take a number of different values, so we will focus on the most important ones:
SIMPLE: a simple query without subqueries or unions
PRIMARY: the select is in the outermost query in a join
DERIVED: the select is a part of a subquery within a from
SUBQUERY: the first select in a subquery
UNION: the select is the second or later statement of a union.
The full list of values that can appear in a
select_typefield can be found here.
table: the table referred to by the row.
type: this field is how MySQL joins the tables used. This is probably the most important field in the explain output. It can indicate missing indexes and it can also show how the query should be rewritten. The possible values for this field are the following (ordered from the best type to the worst):
system: the table has zero or one row.
const: the table has only one matching row which is indexed. The is the fastest type of join.
eq_ref: all parts of the index are being used by the join and the index is either PRIMARY_KEY or UNIQUE NOT NULL.
ref: all the matching rows of an index column are read for each combination of rows from the previous table. This type of join normally appears for indexed columns compared with
fulltext: the join uses the table FULLTEXT index.
ref_or_null: this is the same as ref but also contains rows with a NULL value from the column.
index_merge: the join uses a list of indexes to produce the result set. The KEY column of the
explainwill contain the keys used.
unique_subquery: an IN subquery returns only one result from the table and makes use of the primary key.
range: an index is used to find matching rows in a specific range.
index: the entire index tree is scanned to find matching rows.
all: the entire table is scanned to find matching rows for the join. This is the worst type of join and often indicates the lack of appropriate indexes on the table.
possible_keys: shows the keys that can be used by MySQL to find rows from the table. These keys may or may not be used in practice.
keys: indicates the actual index used by MySQL. MySQL always looks for an optimal key that can be used for the query. While joining many tables, it may figure out some other keys which are not listed in
possible_keysbut are more optimal.
key_len: indicates the length of the index the query optimizer chose to use.
ref: Shows the columns or constants that are compared to the index named in the key column.
rows: lists the number of records that were examined to produce the output. This is a very important indicator; the fewer records examined, the better.
Extra: contains additional information. Values such as
Using temporaryin this column may indicate a troublesome query.
The full documentation on the
explain output format may be found on the official MySQL page.
Going back to our simple query: it is a
SIMPLE type of select with a const type of join. This is the best case of query we can possibly have. But what happens when we need bigger and more complex queries?
Going back to our application schema, we might want to obtain all gallery images. We also might want to have only photos that contain the word “cat” in the description. This is definitely a case that we could find on the project requirements. Let’s take a look at the query:
SELECT gal.name, gal.description, img.filename, img.description FROM `homestead`.`users` AS users LEFT JOIN `homestead`.`galleries` AS gal ON users.id = gal.user_id LEFT JOIN `homestead`.`images` AS img on img.gallery_id = gal.id WHERE img.description LIKE '%dog%';
In this more complex case we should have some more information to analyze on our
EXPLAIN SELECT gal.name, gal.description, img.filename, img.description FROM `homestead`.`users` AS users LEFT JOIN `homestead`.`galleries` AS gal ON users.id = gal.user_id LEFT JOIN `homestead`.`images` AS img on img.gallery_id = gal.id WHERE img.description LIKE '%dog%';
This gives the following results (scroll right to see all cells):
Let’s take a closer look and see what we can improve in our query.
As we saw earlier, the main columns we should look at first are the
type column and the
rows columns. The goal should get a better value in the
type column and reduce as much as we can on the
Our result on the first query is
index, which is not a good result at all. This means we can probably improve it.
Looking at our query, there are two ways of approaching it. First, the
Users table is not being used. We either expand the query to make sure we’re targeting users, or we should completely remove the
users part of the query. It is only adding complexity and time to our overall performance.
SELECT gal.name, gal.description, img.filename, img.description FROM `homestead`.`galleries` AS gal LEFT JOIN `homestead`.`images` AS img on img.gallery_id = gal.id WHERE img.description LIKE '%dog%';
So now we have the exact same result. Let’s take a look at
We are left with an
ALL on type. While
ALL might be the worst type of join possible, there are also times where it’s the only option. According to our requirements, we want all gallery images, so we need to scour through the whole galleries table. While indexes are really good when trying to find particular information on a table, they can’t help us when we need all the information in it. When we have a case like this, we have to resort to a different method, like caching.
One last improvement we can make, since we’re dealing with a
LIKE, is to add a FULLTEXT index to our description field. This way, we could change the
LIKE to a
match() and improve performance. More on full-text indexes can be found here.
There are also two very interesting cases we must look at: the
related functionality in our application. These apply to galleries and touch on some corner cases that we should be aware of:
EXPLAIN SELECT * FROM `homestead`.`galleries` AS gal LEFT JOIN `homestead`.`users` AS u ON u.id = gal.user_id WHERE u.id = 1 ORDER BY gal.created_at DESC LIMIT 5;
The above is for the related galleries.
EXPLAIN SELECT * FROM `homestead`.`galleries` AS gal ORDER BY gal.created_at DESC LIMIT 5;
The above is for the newest galleries.
At first sight, these queries should be blazing fast because they’re using
LIMIT. And that is the case on most queries using
LIMIT. Unfortunately for us and our application, these queries are also using
ORDER BY. Because we need to order all the results before limiting the query, we lose the advantages of using
Since we know
ORDER BY might be tricky, let’s apply our trusty
|1||SIMPLE||‘gal’||NULL||‘ALL’||‘IDX_F70E6EB7A76ED395’||NULL||NULL||NULL||100.00||‘Using where; Using filesort’|
As we can see, we have the worst case of join type:
ALL for both of our queries.
ORDER BY implementation, especially together with
LIMIT, is often the cause for MySQL performance problems. This combination is also used in most interactive applications with large datasets. Functionalities like newly registered users and top tags normally use this combination.
Because this is a common problem, there’s also a small list of common solutions we should apply to take care of performance issues.
- Make sure we’re using indexes. In our case,
created_atis a great candidate, since it’s the field we’re ordering by. This way, we have both
LIMITexecuted without scanning and sorting the full result set.
- Sort by a column in the leading table. Normally, if the
ORDER BYis going by field from the table which is not the first in the join order, then the index can’t be used.
- Don’t sort by expressions. Expressions and functions don’t allow index usage by
- Beware of a large
LIMITvalues will force
ORDER BYto sort a bigger number of rows. This affects performance.
These are some of the measures we should take when we have both
ORDER BY in order to minimize performance issues.
As we can see,
explain can be very useful for spotting problems in our queries early on. There are a lot of problems that we only notice when our applications are in production and have big amounts of data or a lot of visitors hitting the database. If these things can be spotted early on using
explain, there’s much less room for performance problems in the future.
Our application has all the indexes it needs, and it’s pretty fast, but we now know we can always resort to
explain and indexes whenever we need to check for performance boosts.