LIMIT makes query 30 times faster?

Hello,

I had a MySQL query in a PHP script that was taking too long to run, usually over 6 seconds.

I tried running that query in phpMyAdmin and noticed that it ran much faster. I also noticed that phpMyAdmin was automatically appending “LIMIT 0,30” to the end of my query. Since this query never returns more than 60 rows, I decided to add “LIMIT 0,60” to the end of the query in the PHP script. To my surprise, this made the query (and the script) run 30 times faster!

Can this be right, or am I missing something? I find it hard to believe that such a change can have such a dramatic effect. Actually, I am surprised it has any effect at all, since the query never returned more than 60 rows anyway, so the LIMIT clause seems totally redundant in this case.

I am running MySQL 4.1.22

The query includes SELECT DISTINCT, 5 LEFT JOINs, a GROUP BY and an ORDER BY … if this matters at all.
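
In case it helps, the general shape is something like this (a sketch only; t1/t2 and the “…” are stand-ins for the real tables and columns):

SELECT DISTINCT ...
FROM t1
LEFT JOIN t2 ON ...
-- (five LEFT JOINs in total)
GROUP BY ...
ORDER BY ...;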

Can you reproduce the 6-second time consistently when you remove the LIMIT?

The first thing that comes to mind when you write this is the query cache. If query caching is enabled, then after the first time you run a query, it’s going to be super fast each time you run it, until it gets pushed out of the cache by other queries or until the table it’s hitting gets an update.

So the LIMIT wouldn’t have been the thing that sped it up, but the fact that you ran the query more than once. And if you flush the query cache, it’ll be just as slow with the LIMIT as it was without.
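
To rule that out, check whether the cache is on and bypass it for a single run. A sketch (SQL_NO_CACHE and RESET QUERY CACHE are standard MySQL; the SELECT itself is yours to fill in):

SHOW VARIABLES LIKE 'query_cache%';   -- is the cache enabled, and how big is it?
SELECT SQL_NO_CACHE ...;              -- run your query without using the query cache
RESET QUERY CACHE;                    -- empty the cache entirely (needs the RELOAD privilege)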

Can you reproduce the 6-second time consistently when you remove the LIMIT?

Yes. I added some timing to the PHP script (although it’s not really needed, since the time difference is huge and obvious). Without the LIMIT the query takes about 6 seconds to run, sometimes 7. This is consistent no matter how many times I run the query. With the LIMIT the time drops to about 0.2 seconds, and this is again consistent no matter how many times I run the script!

I found this: http://dev.mysql.com/doc/refman/5.0/en/limit-optimization.html I don’t know if what happens to my query can be explained based on this.

Hmm, maybe.

Try running an EXPLAIN for each query and see if the results are different.

EXPLAIN SELECT ...;             -- without the LIMIT
EXPLAIN SELECT ... LIMIT 0,60;  -- with the LIMIT

By the way, I managed to reproduce the problem in phpMyAdmin by putting a large number in the LIMIT (LIMIT 0,3000). I don’t know how to tell phpMyAdmin not to add a LIMIT at all, which is why I did it like that. With “LIMIT 0,3000”, phpMyAdmin also takes over 6 seconds to run the query. This is again consistent, as long as I wait a few seconds between re-running the query (otherwise the result comes from the cache and takes just 0.0003 sec).

Then adding “LIMIT 0,60” instead of “LIMIT 0,3000” makes the query run in less than 0.04 sec (down from 6+ sec)! This is again consistent, as long as I wait a few seconds before re-running the query in order to avoid hitting the cache.

Yes! It seems that there is a difference between using “LIMIT 0,3000” and using “LIMIT 0,60”.

MySQL seems to do things in a different order, and the “Extra” column shows slightly different things in some of the rows!

The words “full table scan” don’t appear anywhere in your EXPLAIN plan?

No. The only words that appear in the plan for the slower query (LIMIT 0,3000) and not in the one for the faster query (LIMIT 0,60) are “Using temporary”.

So without the aggregation/distinctness how big would the result set be?
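
(A rough way to check: run the same FROM clause with a bare COUNT(*). A sketch, with the tables elided as above:)

-- Same FROM clause (all five LEFT JOINs) as the real query,
-- but no DISTINCT, GROUP BY or ORDER BY:
SELECT COUNT(*)
FROM ...;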

I assume the change in speed is because of the disk I/O from creating the temporary table before the results are condensed. These tables can sometimes grow pretty massive. I was working on some Oracle the other week, and the query design at one point was creating 120GB temp tables, so the queries were completely unrunnable (unless you like waiting for 120GB to be written and condensed before output). The tables were indexed properly, but the way the query was written made it hard for the optimiser to pick the smallest path to follow before shrinking the result set down, causing large I/O issues.
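
You can actually watch for the disk spill with the temp-table status counters (a sketch; on 4.1 these counters are server-wide, so try it on a quiet server):

SHOW STATUS LIKE 'Created_tmp%';   -- note Created_tmp_disk_tables
SELECT ...;                        -- run the slow query here
SHOW STATUS LIKE 'Created_tmp%';   -- did Created_tmp_disk_tables go up?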

The LIMIT here is probably helping the database avoid forming a very large dataset that it has to work on before the aggregation etc. It is interesting stuff, though: working with both databases and application front ends, far more benefit can be gained from understanding the database than from doing weird-ass optimization stuff in the application. 30-second white-screen timeouts can sometimes easily become 1.5 seconds by giving the database the right nudge. It is also something that needs to be given more active discussion: 6 seconds to 1000 people is over an hour and a half of possible stolen life force. What is avoidable is stolen, whatever the users’ perception and level of patience.

The slow query log and EXPLAIN are very good friends :)

Looking at your query (I can’t see it or your data, etc.), what path do you think it is taking with the LIMIT clause that is slightly different from the unlimited query? What effect does the LEFT JOIN have on the data when joined? How does it act as a rowset multiplier in the temporary table? Is it 1 to 1, or 1 to 10000? Does playing with the memory configurations in the conf file help? 3000 is not a lot of rows, so 6 seconds could be deemed high. Step the limits by 200 and see at what point the temporary table is created and what effect it has; if you are really anal, chart a possible performance curve as the mathematics of the query unfold.
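
Something like this, say (a sketch; SQL_NO_CACHE keeps the query cache out of the timings, and the Created_tmp_disk_tables counter from earlier tells you when the spill happens):

SELECT SQL_NO_CACHE ... LIMIT 0,200;
SELECT SQL_NO_CACHE ... LIMIT 0,400;
-- ... and so on in steps of 200, watching the timings and the
-- Created_tmp_disk_tables counter to find the tipping point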

This page lists the conf params in bold.

Possibly there are a few more, but I haven’t really touched MySQL in years, and that page is the first thing I grabbed off Google and skimmed, so I haven’t double-checked it, and my memory may be shaky on whether the params are still valid (the article is quite old). As you seem genuinely interested in why this might be happening, I thought I’d give you some research material. Building things that scale more flatly, allowing greater use of the available database hardware, is definitely a good but too often ignored skillset in the developer community.
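
If you want to experiment without touching the conf file, the two variables that matter most for in-memory temporary tables can be inspected and set per session (a sketch; the 64M figure is a placeholder, not a recommendation, and I’m going from memory as to how 4.1 scopes these):

SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
SET SESSION tmp_table_size = 64*1024*1024;       -- placeholder value
SET SESSION max_heap_table_size = 64*1024*1024;  -- the smaller of the two is the effective limit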

As I am a developer, I’ll leave a couple of final sentences just to annoy any database people and to remove any perceived belief that I have any respect for the little furry databases.

“A database is just a glorified Excel Spreadsheet isn’t it?”

“Indexes? My l33t coding skillz removes the need for any to be put on”

:)

Thanks pgb101!

I am not an expert in databases (far from it), so I don’t care to dig very deep to find out why MySQL does things differently. For me it is enough to know that this little change had a successful outcome (making the script run faster) without the possibility of any adverse effects. At some point I thought I would have to spend hours reworking the query into a more efficient form (and I am not even sure I would have managed, since my SQL skills are limited), so I am glad I found this quick and dirty solution :)

The only thing I want to do is try this again when I upgrade to the latest version of MySQL in a couple of months (I am still running MySQL 4.1.22). Maybe in MySQL 5 the LIMIT trick won’t be needed.