Counting rows with WHERE and HAVING while using ONLY_FULL_GROUP_BY

I have this query that is trying to count the number of rows from a filtered search.

SELECT COUNT( `informationItems`.`id`) AS `num`,
       (SELECT GROUP_CONCAT(`categoryID`) FROM `informationListings` WHERE `itemID` = `informationItems`.`id`) AS `categories`
FROM `informationItems` 
JOIN `pages` ON `informationItems`.`page` = `pages`.`id` 
WHERE `pages`.`status` = 'published' AND `dateFiled` <= NOW()
HAVING FIND_IN_SET(1, `categories`);

I am trying to make it work under SQL mode ONLY_FULL_GROUP_BY . I get why it fails — because the results set is indeterminate. Is it possible to get this to work under ONLY_FULL_GROUP_BY mode while still using the sub-query (other than removing the COUNT() and then counting the actual number or rows returned)?

try this

SELECT pages.id , COUNT(*) AS num , GROUP_CONCAT(categoryID) AS categories FROM pages INNER JOIN informationItems ON informationItems.page = pages.id INNER JOIN informationListings ON informationListings.itemID = informationItems.id WHERE pages.status = 'published' AND pages.dateFiled <= NOW() GROUP BY pages.id HAVING COUNT(CASE WHEN informationItems.categoryID = 1 THEN 'gotcha' ELSE NULL END ) > 0

Thanks, I’ll give it a go.

What is your opinion on using ANY_VALUE() in 5.7? It looks as though MySQL is the only database that offers this so I am assuming it is frowned upon when it comes to SQL standards.

had not heard of it until now :slight_smile:

My bet is you go the “dependent on GROUP BY” route
https://dev.mysql.com/doc/refman/5.7/en/upgrading-from-previous-series.html

If it is possible to modify an offending query, do so, either so that nondeterministic nonaggregated columns are functionally dependent on GROUP BY columns, or by referring to nonaggregated columns using ANY_VALUE().

actually i won’t need to do anything, as i have been writing ANSI-compatible GROUP BY queries for decades

:trophy:

1 Like

Is this a lot more inefficient that @r937’s version?

SELECT count(*)
FROM (
    {queryGoesHere}
) AS `num`

Will this get a lot slower and use a lot more memory as the data increases or does the optimiser “know” that it is only fetching a number?

The reason I ask is with the code I am refactoring it is much more convenient to do the latter version so I am weighing up how much of it needs to change.

  1. no, not a lot

  2. my version has counts for each id

Great, thank you.

I really do owe you a beer!

I have tried a few different queries on a few different tables (even on a MyISAM table) and it seems to me that SELECT COUNT(*) FROM ({subQuery}) is slower than just the subQuery on its own. On my test system the first query is up to 2.5x slower. Even if I remove JOINS it is pretty much the same result.

While does counting in this way add so much overhead?

You may be able to spot the difference by using EXPLAIN

https://dev.mysql.com/doc/refman/5.7/en/explain.html

The EXPLAIN statement provides information about how MySQL executes statements

Thanks. That appears to be that sub-queries don’t use indexes. I still don’t understand while it is that much slower since there are only a few hundred records in there.

How would you change your query so that is gives a total count rather than for each ID? I am finding that counting when there is a HAVING clause really slows things down. I have only managed to get it to work with a sub-query. If you don’t use a sub-query it seems to ignore the HAVING.

could you give examples of both please

I will do.

But at the weekend I read about SQL_CALC_FOUND_ROWS and this seems to be perfect for what I want. I am doing pagination and am wanting the total count and — this does it without having to do the extra SELECT COUNT(*) FROM {subQuery} query.

A lot of people seem to be down on it and quote a decade-old Percona blog but according to the MySQL docs, and more recent anecdotal evidence on Stack Overflow, SQL_CALC_FOUND_ROWS would usually be faster than two queries, so long as you are using LIMIT.

In my case, adding SQL_CALC_FOUND_ROWS seems to make little-to-no difference to the query speed and is twice as fast as the SELECT COUNT(*) FROM {subQuery} query on its own!

What is your experience of this vs COUNT(*) on later versions (5.6+)?

never needed it, but would’ve used it in a new york minute over a separate count query

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.