Why Scan All 38K Rows When I Limit 4?

I’ve been using EXPLAIN to better do my queries and spotted this:


EXPLAIN SELECT films.film_title, trailers.trailer_title 
FROM films,trailers 
WHERE trailers.film_id=films.film_id 
ORDER BY trailers.trailer_id DESC 
LIMIT 4

The result I get is 48K rows must get scanned in the ‘trailers’ table. Why can’t it just do the 4 and stop?

Thanks
Ryan

Are you missing an index on trailer_id?

Try writing


EXPLAIN SELECT films.film_title, trailers.trailer_title 
FROM films,trailers 
WHERE trailers.film_id=films.film_id 
ORDER BY trailers.trailer_id DESC 
[COLOR="Blue"]LIMIT 0, 4 [/COLOR]

Doing so means you are starting at record 0, and grabbing 4 records starting there.
This should pull records 0, 1, 2, and 3.

Because you have

ORDER BY trailers.trailer_id DESC

MySQL can’t know what the 4 highest numbers are without looking through all the numbers.

Yeah. Don’t you know how many iterations it has to go through to sort stuff? Definitely 48,000 worth if you have a lot of records! Good spot there Scallio

1 SIMPLE trailers index film_id PRIMARY 4 NULL 45858
1 SIMPLE films eq_ref PRIMARY PRIMARY 4 trailera_trailers.trailers.film_id 1

trailer_id is a Primary, which automatically makes it an index, correct?

So there is no way to limit the scan if I have a limit and order by the primary column desc?

I know 45K rows isn’t exactly horrid in terms of other explains I’ve seen, but this query is run about 500K times per day, so limiting the scan seems beneficial.

Cheers
Ryan

You can limit the scan with a subquery (as I posted in your other thread). Your best bet is to execute this query in two phases… the first does:

SELECT MAX(t.id) AS maxid FROM trailers AS t;

The second subqueries the trailers table and selects “WHERE id > {$maxid} - 10” to limit number of records hashed against your other table.


SELECT f.film_title, tr.trailer_title 
FROM films AS f
INNER JOIN (
    SELECT * FROM trailers WHERE id > {$maxid} - 10
AS tr ON tr.film_id = f.film_id 
ORDER BY tr.trailer_id DESC 
LIMIT 0, 4

10 should give you enough padding to not worry about deleted records or skipped ids.

Nice. I added it and now have:


SELECT trailers.trailer_title, films.film_title 
FROM films,trailers 
WHERE trailers.trailer_id > $maxtrailerid - 10 
AND trailers.film_id=films.film_id 
ORDER BY trailers.trailer_id DESC 
LIMIT 4

Doing an Explain, that limited the rows scanned to 64.

Ryan

Cheers.

yeah,

Thanks mucho. I’m trying to see if I can apply a similar solution to my other thread. But looks doubtful.

Cheers
Ryan

nice technique :slight_smile:

I had a similar problem after my web host upgraded from mysql 5.0.91 to 5.5.8. I had a subquery which ended in

ORDER BY primary_key DESC LIMIT 1

and mysql 5.5 did the full table scan. The solution was to explicitely define which index should be used with

USE INDEX(primary_key)

. I’ve heard mysql 5.5 has huge performance improvements but in this case its optimizer turns out to be really dumb!

You put it right after SELECT … FROM table. See here. As far as I know its been in mysql for quite some time, at least since 4.1.

How do you add USE to the query? Never done so. Restricted to a version?

Cheers
Ryan

I’m trying it, but getting an error:


EXPLAIN SELECT films.film_title, trailers.trailer_title
FROM films, trailers
USE INDEX (trailers.trailer_id)
WHERE trailers.film_id = films.film_id
ORDER BY trailers.trailer_id DESC
LIMIT 4

Me confused?

Ryan

You cannot use table. name in USE INDEX, only the index name (not column name!). Read the first comment on the documentation page on syntax for multiple table queries.

Ah.

Okay, got it to work. But same number rows scanned. :frowning:

I’m going to see if I can make this work with other queries hopefully.

Cheers
Ryan

Yeah, each query is unique so there’s no guarantee this method will work in your case. But it’s worth trying USE INDEX/FORCE INDEX with various syntaxes. You could try using INNER JOIN - then you could get rid of the WHERE clause - who knows, maybe the optimizer is misled by the WHERE?

Alternatively, you could fetch the ids of last 4 trailers in a separate query, the index should be guaranteed (I hope…!) to be used:


SELECT trailer_id
FROM trailers
ORDER BY trailer_id DESC
LIMIT 4 

If this is successful you can use IN() clause in the second query without the LIMIT. How optimal that is - I don’t know :slight_smile:

Yeah. I have considered cutting to two queries. The only big catch is I use ORDER BY a ton, and that seams to be the culprit in plenty of my cases.

Working on it.

Cheers
Ryan