ONLY_FULL_GROUP_BY and query efficiency

From now on I am trying to make sure all my queries work with ONLY_FULL_GROUP_BY enabled.

I have this query, which doesn't work with ONLY_FULL_GROUP_BY:

SELECT employers.id, employers.name,
(SELECT GROUP_CONCAT(categoryID) FROM employersListings WHERE itemID = employers.id) AS categories
FROM employers
GROUP BY employers.id, employers.name
HAVING FIND_IN_SET(2, (categories))

This does work using ONLY_FULL_GROUP_BY:

SELECT employers.id, employers.name,
(SELECT GROUP_CONCAT(categoryID) FROM employersListings WHERE itemID = employers.id) AS categories
FROM employers
GROUP BY employers.id, employers.name
HAVING FIND_IN_SET(2, (SELECT GROUP_CONCAT(categoryID) FROM employersListings WHERE itemID = employers.id))

Now, I know why the first version doesn't work with ONLY_FULL_GROUP_BY (MySQL treats the categories alias as a non-grouping field, so it can't be referenced in the HAVING clause). I'm sure it's just a psychological niggle because the second version repeats code, but I wanted to check: is the second version any less efficient for having two sub-queries, or is it simply that better SQL is more verbose in this case? I'm guessing that in the first version MySQL is actually doing something very similar under the hood.

When you have the DRY principle in your head, it just seems weird doing something like this if I were searching multiple categories:

HAVING FIND_IN_SET(1, (SELECT GROUP_CONCAT(categoryID) FROM employersListings WHERE itemID = employers.id))  
    OR FIND_IN_SET(2, (SELECT GROUP_CONCAT(categoryID) FROM employersListings WHERE itemID = employers.id)) 
    OR FIND_IN_SET(3, (SELECT GROUP_CONCAT(categoryID) FROM employersListings WHERE itemID = employers.id))
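
(I suppose one way to keep it DRY would be to wrap the whole thing in a derived table and filter on the alias in an outer WHERE, something like this, although I haven't tested whether it performs any better:)

SELECT id, name, categories
FROM ( SELECT employers.id, employers.name,
              (SELECT GROUP_CONCAT(categoryID) FROM employersListings WHERE itemID = employers.id) AS categories
       FROM employers
       GROUP BY employers.id, employers.name ) AS derived
WHERE FIND_IN_SET(1, categories)
   OR FIND_IN_SET(2, categories)
   OR FIND_IN_SET(3, categories)

At least that way the sub-query is only written once.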

Does the optimiser just call the sub-query once per grouped record found?

Thank you.

try it like this

not sure why yours “didn’t work”

SELECT employers.id
     , employers.name
     , GROUP_CONCAT(categoryID) AS categories
  FROM employers
 INNER JOIN employersListings
    ON employersListings.itemID = employers.id
 GROUP BY employers.id , employers.name
HAVING FIND_IN_SET(2, categories)

in any case i’m positive this will work…

SELECT *
  FROM ( SELECT employers.id
              , employers.name
              , GROUP_CONCAT(categoryID) AS categories
           FROM employers
           LEFT OUTER JOIN employersListings
             ON employersListings.itemID = employers.id
          GROUP BY employers.id , employers.name ) AS q
 WHERE FIND_IN_SET(2, categories)

Thanks. Your first query is how I would have done it, but I am finding the sub-query version is faster. That answers my initial question: repeating the sub-query, while more verbose, does not seem to affect performance.

I'm switching to a slightly different example, though the set-up is the same; I am just using tables with more data in them.

journalItems has a 1-to-1 relation with pages, which just stores the slug, meta title, h1, etc. The journalListings table stores one or more categories for each item. There is a compound itemID + categoryID unique index, so an item can't have the same category twice.
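
Simplified (exact column types and any other indexes aside), journalListings is essentially:

CREATE TABLE journalListings (
    itemID     INT UNSIGNED NOT NULL,  -- journalItems.id
    categoryID INT UNSIGNED NOT NULL,  -- the category
    UNIQUE KEY (itemID, categoryID)    -- an item can't have the same category twice
);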

SQL_NO_CACHE is just for testing. SQL_CALC_FOUND_ROWS is so that I can paginate results without having to do a second COUNT() query.
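
(If the pattern is unfamiliar: the LIMITed query runs with SQL_CALC_FOUND_ROWS, then the total match count is read straight afterwards. A minimal sketch of the two steps:)

SELECT SQL_CALC_FOUND_ROWS `id`, `title`
FROM `journalItems`
WHERE `dateFiled` <= NOW()
ORDER BY `volumeNo` DESC
LIMIT 225, 25;

-- total number of matching rows, ignoring the LIMIT, used to build the pagination links
SELECT FOUND_ROWS() AS `totalRows`;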

Here it is with a sub-query. Average query speed: 0.008s.

SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS `journalItems`.`page`, `pages`.`slug`, `journalItems`.`title`, `journalItems`.`html`, `journalItems`.`dateFiled`, `journalItems`.`image`, `journalItems`.`gallery`, `journalItems`.`documents`, `journalItems`.`links`, `journalItems`.`volume`, `journalItems`.`no`, `journalItems`.`volumeNo`, YEAR(`journalItems`.`dateFiled`) AS `year`, 
	(SELECT GROUP_CONCAT(`categoryID`) FROM `journalListings` WHERE `itemID` = `journalItems`.`id`) AS `categories` 
FROM `journalItems` 
JOIN `pages` ON `journalItems`.`page` = `pages`.`id` 
WHERE (1 = 1) AND `pages`.`status` = 'published' AND `dateFiled` <= NOW() 
HAVING FIND_IN_SET(6, (SELECT GROUP_CONCAT(`categoryID`) FROM `journalListings` WHERE `itemID` = `journalItems`.`id`)) 
ORDER BY `volumeNo` DESC, `journalItems`.`page` ASC 
LIMIT 225, 25;

EXPLAIN gives:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	journalItems	ALL	cubed_page	NULL	NULL	NULL	423	Using where; Using filesort
1	PRIMARY	pages	eq_ref	PRIMARY	PRIMARY	4	database.journalItems.cubed_page	1	Using where
3	DEPENDENT SUBQUERY	journalListings	ref	itemID	itemID	4	database.journalItems.id	1	NULL
2	DEPENDENT SUBQUERY	journalListings	ref	itemID	itemID	4	database.journalItems.id	1	NULL

Here’s the other query. The difference is I have removed the sub-query and added a JOIN and GROUP BY. Average query speed: 0.06s.

SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS `journalItems`.`page`, `pages`.`slug`, `journalItems`.`title`, `journalItems`.`html`, `journalItems`.`dateFiled`, `journalItems`.`image`, `journalItems`.`gallery`, `journalItems`.`documents`, `journalItems`.`links`, `journalItems`.`volume`, `journalItems`.`no`, `journalItems`.`volumeNo`, YEAR(`journalItems`.`dateFiled`) AS `year`, 
	GROUP_CONCAT(`journalListings`.`categoryID`) AS `categories` 
FROM `journalItems` 
JOIN `pages` ON `journalItems`.`page` = `pages`.`id` 
JOIN `journalListings` ON `journalListings`.`itemID` = `journalItems`.`id` 
WHERE (1 = 1) AND `pages`.`status` = 'published' AND `dateFiled` <= NOW()
GROUP BY `journalItems`.`page`, `pages`.`slug`, `journalItems`.`title`, `journalItems`.`html`, `journalItems`.`dateFiled`, `journalItems`.`image`, `journalItems`.`gallery`, `journalItems`.`documents`, `journalItems`.`links`, `journalItems`.`volume`, `journalItems`.`no`, `journalItems`.`volumeNo`
HAVING FIND_IN_SET(6, GROUP_CONCAT(`journalListings`.`categoryID`)) 
ORDER BY `volumeNo` DESC, `journalItems`.`page` ASC 
LIMIT 225, 25;

EXPLAIN gives:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	journalListings	index	itemID	categoryID_2	8	NULL	611	Using index; Using temporary; Using filesort
1	SIMPLE	journalItems	eq_ref	PRIMARY,cubed_page	PRIMARY	4	database.journalListings.itemID	1	Using where
1	SIMPLE	pages	eq_ref	PRIMARY	PRIMARY	4	database.journalItems.cubed_page	1	Using where

I don't understand why there is such a big speed difference. The second query seems to be slowed down by the GROUP BY. I will admit I am not very good at interpreting EXPLAIN output, so any help is appreciated. I am happy to just stick with the sub-query approach, since it saves having to GROUP BY all those columns, but it would be good to know why there is such a difference in speed. I'm sure it will be something obvious that I have overlooked!

Thanks.

SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS
       journalItems.page
     , pages.slug
     , journalItems.title
     , journalItems.html
     , journalItems.dateFiled
     , journalItems.image
     , journalItems.gallery
     , journalItems.documents
     , journalItems.links
     , journalItems.volume
     , journalItems.no
     , journalItems.volumeNo
     , YEAR(journalItems.dateFiled) AS year
     , qq.categories
  FROM journalItems
 INNER JOIN pages
    ON pages.id = journalItems.page
   AND pages.status = 'published'
 INNER JOIN ( SELECT itemID
                   , GROUP_CONCAT(categoryID) AS categories
                FROM journalListings
               GROUP BY itemID ) AS qq
    ON qq.itemID = journalItems.id
 WHERE (1 = 1)
   AND journalItems.dateFiled <= NOW()
   AND FIND_IN_SET(6, categories)
 ORDER BY volumeNo DESC
     , journalItems.page ASC
 LIMIT 225, 25;

Thanks. Your new one is about twice as fast as my GROUP BY one.

I have found the bottleneck, though: it's the html column, which is MEDIUMTEXT. If I omit html from your most recent query and from my second one, they are both as fast as my first, sub-query one.

I presume when you GROUP BY a large text column there is some significant overhead but I don’t see why it would slow your one down so much.
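
(If I ever did need the JOIN version to be fast with html included, I guess one option would be a deferred join: pick the 25 ids for the current page first without touching html at all, then join back for the wide columns. A rough, untested sketch; the derived-table alias is made up and I've left SQL_CALC_FOUND_ROWS out:)

SELECT `journalItems`.`page`, `pages`.`slug`, `journalItems`.`title`, `journalItems`.`html`,
	`journalItems`.`dateFiled`, `journalItems`.`image`, `journalItems`.`gallery`,
	`journalItems`.`documents`, `journalItems`.`links`, `journalItems`.`volume`,
	`journalItems`.`no`, `journalItems`.`volumeNo`, YEAR(`journalItems`.`dateFiled`) AS `year`,
	(SELECT GROUP_CONCAT(`categoryID`) FROM `journalListings` WHERE `itemID` = `journalItems`.`id`) AS `categories`
FROM (
	-- filter, sort and LIMIT on the ids only, so html never enters the sort
	SELECT `journalItems`.`id`
	FROM `journalItems`
	JOIN `pages` ON `journalItems`.`page` = `pages`.`id`
	WHERE `pages`.`status` = 'published' AND `dateFiled` <= NOW()
		AND FIND_IN_SET(6, (SELECT GROUP_CONCAT(`categoryID`) FROM `journalListings` WHERE `itemID` = `journalItems`.`id`))
	ORDER BY `volumeNo` DESC, `journalItems`.`page` ASC
	LIMIT 225, 25
) AS `picked`
JOIN `journalItems` ON `journalItems`.`id` = `picked`.`id`
JOIN `pages` ON `journalItems`.`page` = `pages`.`id`
ORDER BY `volumeNo` DESC, `journalItems`.`page` ASC;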

I think I'll stick with the sub-query then.

Thanks for your help!

glad i could be of assistance :smiley:
