Thanks. Your first query is how I would’ve done it but I am finding the sub-query is faster. My initial question I have answered myself in that repeating the sub-query, while more verbose, does not seem to affect performance.
I’m switching to a slightly different example though the set up is the same; I am using tables with more data in it.
journalItems has a 1-to-1 relation with pages, which just stores slug, meta title, h1, etc. The listings table stores 1+ categories for each item. There is a compound itemID + categoryID unique index so an item can’t have the same category twice.
SQL_NO_CACHE
is just for testing. SQL_CALC_FOUND_ROWS
is so that I can paginate results without having to do a second COUNT()
query.
Here it is with a sub-query. Average query speed: 0.008s.
SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS `journalItems`.`page`, `pages`.`slug`, `journalItems`.`title`, `journalItems`.`html`, `journalItems`.`dateFiled`, `journalItems`.`image`, `journalItems`.`gallery`, `journalItems`.`documents`, `journalItems`.`links`, `journalItems`.`volume`, `journalItems`.`no`, `journalItems`.`volumeNo`, YEAR(`journalItems`.`dateFiled`) AS `year`,
(SELECT GROUP_CONCAT(`categoryID`) FROM `journalListings` WHERE `itemID` = `journalItems`.`id`) AS `categories`
FROM `journalItems`
JOIN `pages` ON `journalItems`.`page` = `pages`.`id`
WHERE (1 = 1) AND `pages`.`status` = 'published' AND `dateFiled` <= NOW()
HAVING FIND_IN_SET(6, (SELECT GROUP_CONCAT(`categoryID`) FROM `journalListings` WHERE `itemID` = `journalItems`.`id`))
ORDER BY `volumeNo` DESC, `journalItems`.`page` ASC
LIMIT 225, 25;
EXPLAIN gives:
1 PRIMARY journalItems ALL cubed_page NULL NULL NULL 423 Using where; Using filesort
1 PRIMARY pages eq_ref PRIMARY PRIMARY 4 database.journalItems.cubed_page 1 Using where
3 DEPENDENT SUBQUERY journalListings ref itemID itemID 4 database.journalItems.id 1 NULL
2 DEPENDENT SUBQUERY journalListings ref itemID itemID 4 database.journalItems.id 1 NULL
Here’s the other query. The difference is I have removed the sub-query and added a JOIN
and GROUP BY
. Average query speed: 0.06s.
SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS `journalItems`.`page`, `pages`.`slug`, `journalItems`.`title`, `journalItems`.`html`, `journalItems`.`dateFiled`, `journalItems`.`image`, `journalItems`.`gallery`, `journalItems`.`documents`, `journalItems`.`links`, `journalItems`.`volume`, `journalItems`.`no`, `journalItems`.`volumeNo`, YEAR(`journalItems`.`dateFiled`) AS `year`,
GROUP_CONCAT(`journalListings`.`categoryID`) AS `categories`
FROM `journalItems`
JOIN `pages` ON `journalItems`.`page` = `pages`.`id`
JOIN `journalListings` ON `journalListings`.`itemID` = `journalItems`.`id`
WHERE (1 = 1) AND `pages`.`status` = 'published' AND `dateFiled` <= NOW()
GROUP BY `journalItems`.`page`, `pages`.`slug`, `journalItems`.`title`, `journalItems`.`html`, `journalItems`.`dateFiled`, `journalItems`.`image`, `journalItems`.`gallery`, `journalItems`.`documents`, `journalItems`.`links`, `journalItems`.`volume`, `journalItems`.`no`, `journalItems`.`volumeNo`
HAVING FIND_IN_SET(6, GROUP_CONCAT(`journalListings`.`categoryID`))
ORDER BY `volumeNo` DESC, `journalItems`.`page` ASC
LIMIT 225, 25;
EXPLAIN gives:
1 SIMPLE journalListings index itemID categoryID_2 8 NULL 611 Using index; Using temporary; Using filesort
1 SIMPLE journalItems eq_ref PRIMARY,cubed_page PRIMARY 4 database.journalListings.itemID 1 Using where
1 SIMPLE pages eq_ref PRIMARY PRIMARY 4 database.journalItems.cubed_page 1 Using where
I don’t understand why the big speed difference. The second query seems to be slowing down from the GROUP BY
. I will admit I am not very good at interpreting EXPLAIN
s so any help is appreciated. I am happy to just stick with the sub-query approach since it saves having to GROUP BY
all those columns. It would be good to know why there is such a difference in speed. I’m sure it will be something obvious that I have overlooked!
Thanks.