TABLE SCAN and LIMIT
I have a table with the following structure
user_id (int PRIMARY AUTO INCREMENT)
status (tinyint) - possible values 1, 2 and 3
INDEXES are PRIMARY(user_id) and status(status) and table has nearly 100000 records.
Now most of the queries that we run fetches user records on the basis of their status (1, 2 or 3)
E.g. SELECT * FROM users WHERE status='2' ORDER BY user_id LIMIT 10
Now records associated with every status (1, 2 or 3) are nearly one third of 100000.
Now since we have index on status and user_id, when I run this query with EXPLAIN, it shows me the following data
Now since I have mentioned LIMIT 10 with ORDER BY user_id, why is it scanning all the records with status='2'? When ORDER BY column i.e. user_id (PRIMARY) is already indexed, it should have sorted the first 10 rows and returned the data.
At the moment the SELECTs have become slow and I am afraid that it will get worse with more records.
So what should I do with INDEXES so that it only scans 10 rows or whatever I mention with LIMIT.
If I use USE INDEX (PRIMARY) in the query, it removes "Using filesort" however the number of rows that getting scanned increases to 100000.
Any suggestions on how to optimize this particular table?
What do you really want the ORDER BY clause to do?
You want it to sort all of your users, -and give you then 10 first right?
If so, - then it will need to sort all, nut just 10
try an index on (status,user_id), and drop the index on status alone
Thanks for your comments.
Originally Posted by r937
I did try the same query after putting an index on (status, user_id) and by dropping an index on status alone however the results are just the same.
It's scanning 32222 rows in order to display just 10.
Does it work like this only? I mean soon the number of records in my database would become 5 times of what they are now, so it means this query would be scanning 5 * 32222 records at that time. This would make the process very very slow.
If I remove the ORDER BY statement from the query, still the table is scanning all the records (32222) with status=’2’ in order to display just 10.
Originally Posted by dadane
Why is it so? Is there any way of optimizing this?
possible keys: status_user_id
i am not an optimization specialist but it looks to me like it did use the index
EXPLAIN does not "explain" LIMIT optimizations very well. so it's very likely that only 10 rows are ever examined but EXPLAIN will never say so. see http://www.mysqlperformanceblog.com/...-optimization/
Thank you guys for your comments. I think longneck is correct as EXPLAIN statement may be giving wrong information for rows scanned in this case.
Originally Posted by longneck
This is what they've mentioned in their article:
One more note about ORDER BY ... LIMIT is - it provides scary explain statements and may end up in slow query log as query which does not use indexes, even if it is quite fast: