Slow subquery mysql


#1

I have this join where a subquery is very slow. Can't figure out why

Explain SELECT `projects`.`id`, 
 (SELECT GROUP_CONCAT(DISTINCT `taxonomy`.`commonname` ORDER BY `taxonomy`.`commonname` SEPARATOR ', ') 
    FROM `project_taxonomy`, `taxonomy`
    WHERE `projects`.`id` = `project_taxonomy`.`project_id` AND `project_taxonomy`.`taxonomy_id` = `taxonomy`.`uniqueidentifier` ) AS 'Species',
    `projects`.`startdate`, 
`projects`.`enddate`,  
`projects`.`created_at`, 
`projects`.`updated_at` 
FROM `projects` 
INNER JOIN `institutions` ON `projects`.`institution_id` = `institutions`.`id` 
LEFT JOIN `institutions` as `m_institutions` ON `projects`.`managing_organisation_id` = `m_institutions`.`id` 
LEFT JOIN `projectcontributions` as `contributions` ON `projects`.`id` = `contributions`.`project_id` 
GROUP BY `projects`.`id`, `contributions`.`id` 
ORDER BY `projects`.`id` ASC, `contributions`.`year` ASC

The explain:

1	PRIMARY	projects	ALL	projects_institutions_id_foreign,institution_id	NULL	NULL	NULL	2278	Using temporary; Using filesort
1	PRIMARY	institutions	eq_ref	PRIMARY	PRIMARY	4	conservationdb4.projects.institution_id	1	Using index
1	PRIMARY	m_institutions	eq_ref	PRIMARY	PRIMARY	4	conservationdb4.projects.managing_organisation_id	1	Using index
1	PRIMARY	contributions	ref	projectcontributions_project_id_foreign	projectcontributions_project_id_foreign	4	conservationdb4.projects.id	3	NULL
2	DEPENDENT SUBQUERY	project_taxonomy	ref	project_taxonomy_project_id_taxonomy_id_genus_unique,taxonomy_id	project_taxonomy_project_id_taxonomy_id_genus_unique	4	func	1	Using index
2	DEPENDENT SUBQUERY	taxonomy	ALL	uniqueidentifier	NULL	NULL	NULL	41341	Range checked for each record (index map: 0x2)

This is the table structure for taxonomy:

CREATE TABLE `taxonomy` (
  `id` int(10) UNSIGNED NOT NULL,
  `class` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `order` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `family` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `genus` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `scientificname` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `commonname` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `iucn` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `cites` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `range` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `uniqueidentifier` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `asofdate` varchar(255) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

--
-- Indexes for dumped tables
--

--
-- Indexes for table `taxonomy`
--
ALTER TABLE `taxonomy`
  ADD PRIMARY KEY (`id`),
  ADD UNIQUE KEY `uniqueidentifier` (`uniqueidentifier`),
  ADD KEY `scientificname` (`scientificname`),
  ADD KEY `commonname` (`commonname`);

So even though I have an unique key on uniqueidentifier, when running the subquery it still wants to look at each record.


#2

Ok the problem seems to be that the uniqueidentifier varchar(255) in the taxonomy table is of a different type then the taxonomy_id int(10) in the project_taxonomy table!


#3

i just thought i would point out that your query can be significantly improved by removing

  • the INNER JOIN to institutions
  • the LEFT JOIN to institutions as m_institutions
  • the LEFT JOIN to projectcontributions

you are not using any of the columns in those tables!!!

(i guess if you wanted to make sure that every project returned actually has an institution, you can keep the INNER JOIN)


#4

also, in case you're interested, you might try rewriting your SELECT clause correlated subquery as a FROM clause derived table

SELECT projects.startdate
     , projects.enddate
     , projects.created_at
     , projects.updated_at 
     , subquery.species
  FROM projects 
LEFT OUTER
  JOIN ( SELECT project_taxonomy.project_id
              , GROUP_CONCAT(DISTINCT taxonomy.commonname 
                             ORDER BY taxonomy.commonname 
                             SEPARATOR ', ') AS species 
           FROM project_taxonomy
         INNER
           JOIN taxonomy
             ON taxonomy.uniqueidentifier = project_taxonomy.taxonomy_id
         GROUP
             BY project_taxonomy.project_id    
       ) AS subquery
    ON subquery.project_id = projects.id

#5

Thanks for your replies!

I do use these:
- The INNER JOIN to institutions
- the LEFT JOIN to institutions as m_institutions
- the LEFT JOIN to projectcontributions
but left some of the fields out of the original post to make it more readable. The complete query is:

SELECT 
    `projects`.`id`, 
    `projects`.`title`, 
    `m_institutions`.`name` as 'managinginstitutionname', 
    `institutions`.`name`, 
    `contributions`.`year`, 
    `c_institution`.`name` as 'contributername', 
    `institutionyears`.`visitors` AS 'visitor_number', 
    `institutionyears`.`budget` AS 'operational_budget', 
    CASE `contributions`.`supporttype` 
        WHEN 1 THEN 'Financial' 
        WHEN 2 THEN 'Manhours' 
        END as 'supporttype' , 
    CASE `contributions`.`fromoperationalbudget` 
        WHEN 0 THEN 'no' 
        WHEN 1 THEN 'yes' 
        ELSE 'unknown' END as 'fromoperationalbudget', 
    `contributions`.`amount`, 
    `contributions`.`currency`,   
        `contributions`.`manhours`,
    `contributions`.`details`, 
    `projects`.`description`, 
(SELECT GROUP_CONCAT(DISTINCT `categories`.`category` ORDER BY `categories`.`category` SEPARATOR ', ') 
    FROM `project_categories`,`categories`
    WHERE `projects`.`id` = `project_categories`.`project_id` AND `project_categories`.`category_id` = `categories`.`id`) AS 'Eaza_conservation_categories' ,
(SELECT GROUP_CONCAT(DISTINCT `p_institutions`.`name` ORDER BY `p_institutions`.`name` SEPARATOR ', ' ) 
    FROM `project_institutions`, `institutions` as `p_institutions`
    WHERE `projects`.`id` = `project_institutions`.`project_id` AND `p_institutions`.`id` = `project_institutions`.`institution_id` ) AS 'Participating_institutions',
(SELECT GROUP_CONCAT(DISTINCT `taxonomy`.`commonname` ORDER BY `taxonomy`.`commonname` SEPARATOR ', ') 
    FROM `project_taxonomy`, `taxonomy`
    WHERE `projects`.`id` = `project_taxonomy`.`project_id` AND `project_taxonomy`.`taxonomy_id` = `taxonomy`.`id` ) AS 'Species',
(SELECT GROUP_CONCAT(DISTINCT `project_genus`.`genus` ORDER BY `project_genus`.`genus` SEPARATOR ', ') 
    FROM `project_genus`
    WHERE `projects`.`id` = `project_genus`.`project_id`) AS 'Genus',
(SELECT GROUP_CONCAT(DISTINCT `tags`.`tag` ORDER BY `tags`.`tag` SEPARATOR ', ') 
    FROM `project_tags`, `tags`
    WHERE `projects`.`id` = `project_tags`.`project_id` AND `project_tags`.`tag_id` = `tags`.`id`) AS 'TAGs',
(SELECT GROUP_CONCAT(DISTINCT `animalgroups`.`animalgroup` ORDER BY `animalgroups`.`animalgroup` SEPARATOR ', ') 
    FROM `project_animalgroups`, `animalgroups`
    WHERE `projects`.`id` = `project_animalgroups`.`project_id` AND `project_animalgroups`.`animalgroup_id` = `animalgroups`.`id` ) AS 'Animalgroups',
(SELECT GROUP_CONCAT(DISTINCT `habitats`.`habitat` ORDER BY `habitats`.`habitat` SEPARATOR ', ') 
    FROM `project_habitats`, `habitats`
    WHERE `projects`.`id` = `project_habitats`.`project_id` AND `project_habitats`.`habitat_id` = `habitats`.`id` ) AS 'Habitats',
(SELECT GROUP_CONCAT(DISTINCT `countries`.`country` ORDER BY `countries`.`country` SEPARATOR ', ') 
    FROM `project_countries`, `countries`
    WHERE `projects`.`id` = `project_countries`.`project_id` AND `project_countries`.`country` = `countries`.`code`) AS 'Countries',
(SELECT GROUP_CONCAT(DISTINCT `locations`.`location` ORDER BY `locations`.`location` SEPARATOR ', ') 
    FROM `project_locations`, `locations`
    WHERE `projects`.`id` = `project_locations`.`project_id` AND `project_locations`.`location_id` = `locations`.`id` ) AS 'Locations',
(SELECT GROUP_CONCAT(DISTINCT threats.`threat` ORDER BY threats.`threat` SEPARATOR ', ') 
    FROM `project_threats`, `threats`
    WHERE `projects`.`id` = `project_threats`.`project_id` AND `project_threats`.`threat_id` = `threats`.`id`  ) AS 'Threats',
`projects`.`startdate`, 
`projects`.`enddate`,  
`projects`.`created_at`, 
`projects`.`updated_at` 
FROM `projects` 
INNER JOIN `institutions` ON `projects`.`institution_id` = `institutions`.`id` 
LEFT JOIN `institutions` as `m_institutions` ON `projects`.`managing_organisation_id` = `m_institutions`.`id` 
LEFT JOIN `projectcontributions` as `contributions` ON `projects`.`id` = `contributions`.`project_id` 
LEFT JOIN `institutions` as `c_institution` ON `c_institution`.`id` = `contributions`.`institution_id` 
LEFT JOIN `institutionyears` ON `institutionyears`.`institution_id` = `c_institution`.`id` 
LEFT JOIN `project_institutions` ON `projects`.`id` = `project_institutions`.`project_id` 
LEFT JOIN `institutions` as `p_institutions` ON `p_institutions`.`id` = `project_institutions`.`institution_id` 
GROUP BY `projects`.`id`, `contributions`.`id` 
ORDER BY `projects`.`id` ASC, `contributions`.`year` ASC

About your second reply: I'll try that and see how it imporves the query. Was not familiar with that syntax


#6

another key point is, you do not need GROUP BY in the outer query

that'll speed things up too


#7

oh, wait, i see the problem now...

you have all those joins in the FROM clause that are cross joining with each other

that's why you thought you needed the GROUP BY, to suppress the multiples and multiples of rows returned

meanwhile each of your correlated subqueries in the SELECT clause is accessing its own copy of those tables


#8

just another example where attempting to simplify the problem, ended up wasting time and hiding the actual problem

:slight_smile:


#9

Hi r937, thanks for your replies. Unfortunately, I'm afraid I don't understand what you're implying.

If I have this query:

SELECT 
    `projects`.`id`, 
    `projects`.`title`, 
    `m_institutions`.`name` as 'managinginstitutionname', 
    `institutions`.`name`, 
    `contributions`.`year`, 
    `c_institution`.`name` as 'contributername', 
    `institutionyears`.`visitors` AS 'visitor_number', 
    `institutionyears`.`budget` AS 'operational_budget', 
    CASE `contributions`.`supporttype` 
        WHEN 1 THEN 'Financial' 
        WHEN 2 THEN 'Manhours' 
        END as 'supporttype' , 
    CASE `contributions`.`fromoperationalbudget` 
        WHEN 0 THEN 'no' 
        WHEN 1 THEN 'yes' 
        ELSE 'unknown' END as 'fromoperationalbudget', 
    `contributions`.`amount`, 
    `contributions`.`currency`,   
        `contributions`.`manhours`,
    `contributions`.`details`, 
    `projects`.`description`, 
(SELECT GROUP_CONCAT(DISTINCT `categories`.`category` ORDER BY `categories`.`category` SEPARATOR ', ') 
    FROM `project_categories`,`categories`
    WHERE `projects`.`id` = `project_categories`.`project_id` AND `project_categories`.`category_id` = `categories`.`id`) AS 'Eaza_conservation_categories' ,
(SELECT GROUP_CONCAT(DISTINCT `p_institutions`.`name` ORDER BY `p_institutions`.`name` SEPARATOR ', ' ) 
    FROM `project_institutions`, `institutions` as `p_institutions`
    WHERE `projects`.`id` = `project_institutions`.`project_id` AND `p_institutions`.`id` = `project_institutions`.`institution_id` ) AS 'Participating_institutions',
(SELECT GROUP_CONCAT(DISTINCT `taxonomy`.`commonname` ORDER BY `taxonomy`.`commonname` SEPARATOR ', ') 
    FROM `project_taxonomy`, `taxonomy`
    WHERE `projects`.`id` = `project_taxonomy`.`project_id` AND `project_taxonomy`.`taxonomy_id` = `taxonomy`.`id` ) AS 'Species',
(SELECT GROUP_CONCAT(DISTINCT `tags`.`tag` ORDER BY `tags`.`tag` SEPARATOR ', ') 
    FROM `project_tags`, `tags`
    WHERE `projects`.`id` = `project_tags`.`project_id` AND `project_tags`.`tag_id` = `tags`.`id`) AS 'TAGs',
(SELECT GROUP_CONCAT(DISTINCT `animalgroups`.`animalgroup` ORDER BY `animalgroups`.`animalgroup` SEPARATOR ', ') 
    FROM `project_animalgroups`, `animalgroups`
    WHERE `projects`.`id` = `project_animalgroups`.`project_id` AND `project_animalgroups`.`animalgroup_id` = `animalgroups`.`id` ) AS 'Animalgroups',
(SELECT GROUP_CONCAT(DISTINCT `habitats`.`habitat` ORDER BY `habitats`.`habitat` SEPARATOR ', ') 
    FROM `project_habitats`, `habitats`
    WHERE `projects`.`id` = `project_habitats`.`project_id` AND `project_habitats`.`habitat_id` = `habitats`.`id` ) AS 'Habitats',
(SELECT GROUP_CONCAT(DISTINCT `countries`.`country` ORDER BY `countries`.`country` SEPARATOR ', ') 
    FROM `project_countries`, `countries`
    WHERE `projects`.`id` = `project_countries`.`project_id` AND `project_countries`.`country` = `countries`.`code`) AS 'Countries',
(SELECT GROUP_CONCAT(DISTINCT `locations`.`location` ORDER BY `locations`.`location` SEPARATOR ', ') 
    FROM `project_locations`, `locations`
    WHERE `projects`.`id` = `project_locations`.`project_id` AND `project_locations`.`location_id` = `locations`.`id` ) AS 'Locations',
(SELECT GROUP_CONCAT(DISTINCT threats.`threat` ORDER BY threats.`threat` SEPARATOR ', ') 
    FROM `project_threats`, `threats`
    WHERE `projects`.`id` = `project_threats`.`project_id` AND `project_threats`.`threat_id` = `threats`.`id`  ) AS 'Threats',
`projects`.`startdate`, 
`projects`.`enddate`,  
`projects`.`created_at`, 
`projects`.`updated_at` 
FROM `projects` 
INNER JOIN `institutions` ON `projects`.`institution_id` = `institutions`.`id` 
LEFT JOIN `institutions` as `m_institutions` ON `projects`.`managing_organisation_id` = `m_institutions`.`id` 
LEFT JOIN `projectcontributions` as `contributions` ON `projects`.`id` = `contributions`.`project_id` 
LEFT JOIN `institutions` as `c_institution` ON `c_institution`.`id` = `contributions`.`institution_id` 
LEFT JOIN `institutionyears` ON `institutionyears`.`institution_id` = `c_institution`.`id` 
LEFT JOIN `project_institutions` ON `projects`.`id` = `project_institutions`.`project_id` 
LEFT JOIN `institutions` as `p_institutions` ON `p_institutions`.`id` = `project_institutions`.`institution_id` 
GROUP BY `projects`.`id`, `contributions`.`id` 
ORDER BY `projects`.`id` ASC, `contributions`.`year` ASC

I can not leave out the GROUP BY, or else I'll get 13.000 rows returned instead of 8000.

What am I doing wrong? What is the actual problem?


#10

I also tested your example query:

SELECT 
    `projects`.`id`, 
    `projects`.`title`, 
    `m_institutions`.`name` as 'managinginstitutionname', 
    `institutions`.`name`, 
    `contributions`.`year`, 
    `c_institution`.`name` as 'contributername', 
    `institutionyears`.`visitors` AS 'visitor_number', 
    `institutionyears`.`budget` AS 'operational_budget', 
    CASE `contributions`.`supporttype` 
        WHEN 1 THEN 'Financial' 
        WHEN 2 THEN 'Manhours' 
        END as 'supporttype' , 
    CASE `contributions`.`fromoperationalbudget` 
        WHEN 0 THEN 'no' 
        WHEN 1 THEN 'yes' 
        ELSE 'unknown' END as 'fromoperationalbudget', 
    `contributions`.`amount`, 
    `contributions`.`currency`,   
        `contributions`.`manhours`,
    `contributions`.`details`, 
    `projects`.`description`, 
(SELECT GROUP_CONCAT(DISTINCT `categories`.`category` ORDER BY `categories`.`category` SEPARATOR ', ') 
    FROM `project_categories`,`categories`
    WHERE `projects`.`id` = `project_categories`.`project_id` AND `project_categories`.`category_id` = `categories`.`id`) AS 'Eaza_conservation_categories' ,
(SELECT GROUP_CONCAT(DISTINCT `p_institutions`.`name` ORDER BY `p_institutions`.`name` SEPARATOR ', ' ) 
    FROM `project_institutions`, `institutions` as `p_institutions`
    WHERE `projects`.`id` = `project_institutions`.`project_id` AND `p_institutions`.`id` = `project_institutions`.`institution_id` ) AS 'Participating_institutions',
`subquery`.`species`,
(SELECT GROUP_CONCAT(DISTINCT `tags`.`tag` ORDER BY `tags`.`tag` SEPARATOR ', ') 
    FROM `project_tags`, `tags`
    WHERE `projects`.`id` = `project_tags`.`project_id` AND `project_tags`.`tag_id` = `tags`.`id`) AS 'TAGs',
(SELECT GROUP_CONCAT(DISTINCT `animalgroups`.`animalgroup` ORDER BY `animalgroups`.`animalgroup` SEPARATOR ', ') 
    FROM `project_animalgroups`, `animalgroups`
    WHERE `projects`.`id` = `project_animalgroups`.`project_id` AND `project_animalgroups`.`animalgroup_id` = `animalgroups`.`id` ) AS 'Animalgroups',
(SELECT GROUP_CONCAT(DISTINCT `habitats`.`habitat` ORDER BY `habitats`.`habitat` SEPARATOR ', ') 
    FROM `project_habitats`, `habitats`
    WHERE `projects`.`id` = `project_habitats`.`project_id` AND `project_habitats`.`habitat_id` = `habitats`.`id` ) AS 'Habitats',
(SELECT GROUP_CONCAT(DISTINCT `countries`.`country` ORDER BY `countries`.`country` SEPARATOR ', ') 
    FROM `project_countries`, `countries`
    WHERE `projects`.`id` = `project_countries`.`project_id` AND `project_countries`.`country` = `countries`.`code`) AS 'Countries',
(SELECT GROUP_CONCAT(DISTINCT `locations`.`location` ORDER BY `locations`.`location` SEPARATOR ', ') 
    FROM `project_locations`, `locations`
    WHERE `projects`.`id` = `project_locations`.`project_id` AND `project_locations`.`location_id` = `locations`.`id` ) AS 'Locations',
(SELECT GROUP_CONCAT(DISTINCT threats.`threat` ORDER BY threats.`threat` SEPARATOR ', ') 
    FROM `project_threats`, `threats`
    WHERE `projects`.`id` = `project_threats`.`project_id` AND `project_threats`.`threat_id` = `threats`.`id`  ) AS 'Threats',
`projects`.`startdate`, 
`projects`.`enddate`,  
`projects`.`created_at`, 
`projects`.`updated_at` 
FROM `projects` 
INNER JOIN `institutions` ON `projects`.`institution_id` = `institutions`.`id` 
LEFT JOIN `institutions` as `m_institutions` ON `projects`.`managing_organisation_id` = `m_institutions`.`id` 
LEFT JOIN `projectcontributions` as `contributions` ON `projects`.`id` = `contributions`.`project_id` 
LEFT JOIN `institutions` as `c_institution` ON `c_institution`.`id` = `contributions`.`institution_id` 
LEFT JOIN `institutionyears` ON `institutionyears`.`institution_id` = `c_institution`.`id` 
LEFT JOIN `project_institutions` ON `projects`.`id` = `project_institutions`.`project_id` 
LEFT JOIN `institutions` as `p_institutions` ON `p_institutions`.`id` = `project_institutions`.`institution_id` 
LEFT OUTER
  JOIN ( SELECT `project_taxonomy`.`project_id`
              , GROUP_CONCAT(DISTINCT `taxonomy`.`commonname`
                             ORDER BY `taxonomy`.`commonname` 
                             SEPARATOR ', ') AS species 
           FROM `project_taxonomy`
         INNER
           JOIN `taxonomy`
             ON `taxonomy`.`id` = `project_taxonomy`.`taxonomy_id`
         GROUP
             BY `project_taxonomy`.`project_id`    
       ) AS subquery
    ON `subquery`.`project_id` = `projects`.`id`

GROUP BY `projects`.`id`, `contributions`.`id` 
ORDER BY `projects`.`id` ASC, `contributions`.`year` ASC

Which gives me the same amount of rows returned (8217) in 2,09 sec. Compared to

SELECT 
    `projects`.`id`, 
    `projects`.`title`, 
    `m_institutions`.`name` as 'managinginstitutionname', 
    `institutions`.`name`, 
    `contributions`.`year`, 
    `c_institution`.`name` as 'contributername', 
    `institutionyears`.`visitors` AS 'visitor_number', 
    `institutionyears`.`budget` AS 'operational_budget', 
    CASE `contributions`.`supporttype` 
        WHEN 1 THEN 'Financial' 
        WHEN 2 THEN 'Manhours' 
        END as 'supporttype' , 
    CASE `contributions`.`fromoperationalbudget` 
        WHEN 0 THEN 'no' 
        WHEN 1 THEN 'yes' 
        ELSE 'unknown' END as 'fromoperationalbudget', 
    `contributions`.`amount`, 
    `contributions`.`currency`,   
        `contributions`.`manhours`,
    `contributions`.`details`, 
    `projects`.`description`, 
(SELECT GROUP_CONCAT(DISTINCT `categories`.`category` ORDER BY `categories`.`category` SEPARATOR ', ') 
    FROM `project_categories`,`categories`
    WHERE `projects`.`id` = `project_categories`.`project_id` AND `project_categories`.`category_id` = `categories`.`id`) AS 'Eaza_conservation_categories' ,
(SELECT GROUP_CONCAT(DISTINCT `p_institutions`.`name` ORDER BY `p_institutions`.`name` SEPARATOR ', ' ) 
    FROM `project_institutions`, `institutions` as `p_institutions`
    WHERE `projects`.`id` = `project_institutions`.`project_id` AND `p_institutions`.`id` = `project_institutions`.`institution_id` ) AS 'Participating_institutions',
(SELECT GROUP_CONCAT(DISTINCT `taxonomy`.`commonname` ORDER BY `taxonomy`.`commonname` SEPARATOR ', ') 
    FROM `project_taxonomy`, `taxonomy`
    WHERE `projects`.`id` = `project_taxonomy`.`project_id` AND `project_taxonomy`.`taxonomy_id` = `taxonomy`.`id` ) AS 'Species',
(SELECT GROUP_CONCAT(DISTINCT `tags`.`tag` ORDER BY `tags`.`tag` SEPARATOR ', ') 
    FROM `project_tags`, `tags`
    WHERE `projects`.`id` = `project_tags`.`project_id` AND `project_tags`.`tag_id` = `tags`.`id`) AS 'TAGs',
(SELECT GROUP_CONCAT(DISTINCT `animalgroups`.`animalgroup` ORDER BY `animalgroups`.`animalgroup` SEPARATOR ', ') 
    FROM `project_animalgroups`, `animalgroups`
    WHERE `projects`.`id` = `project_animalgroups`.`project_id` AND `project_animalgroups`.`animalgroup_id` = `animalgroups`.`id` ) AS 'Animalgroups',
(SELECT GROUP_CONCAT(DISTINCT `habitats`.`habitat` ORDER BY `habitats`.`habitat` SEPARATOR ', ') 
    FROM `project_habitats`, `habitats`
    WHERE `projects`.`id` = `project_habitats`.`project_id` AND `project_habitats`.`habitat_id` = `habitats`.`id` ) AS 'Habitats',
(SELECT GROUP_CONCAT(DISTINCT `countries`.`country` ORDER BY `countries`.`country` SEPARATOR ', ') 
    FROM `project_countries`, `countries`
    WHERE `projects`.`id` = `project_countries`.`project_id` AND `project_countries`.`country` = `countries`.`code`) AS 'Countries',
(SELECT GROUP_CONCAT(DISTINCT `locations`.`location` ORDER BY `locations`.`location` SEPARATOR ', ') 
    FROM `project_locations`, `locations`
    WHERE `projects`.`id` = `project_locations`.`project_id` AND `project_locations`.`location_id` = `locations`.`id` ) AS 'Locations',
(SELECT GROUP_CONCAT(DISTINCT threats.`threat` ORDER BY threats.`threat` SEPARATOR ', ') 
    FROM `project_threats`, `threats`
    WHERE `projects`.`id` = `project_threats`.`project_id` AND `project_threats`.`threat_id` = `threats`.`id`  ) AS 'Threats',
`projects`.`startdate`, 
`projects`.`enddate`,  
`projects`.`created_at`, 
`projects`.`updated_at` 
FROM `projects` 
INNER JOIN `institutions` ON `projects`.`institution_id` = `institutions`.`id` 
LEFT JOIN `institutions` as `m_institutions` ON `projects`.`managing_organisation_id` = `m_institutions`.`id` 
LEFT JOIN `projectcontributions` as `contributions` ON `projects`.`id` = `contributions`.`project_id` 
LEFT JOIN `institutions` as `c_institution` ON `c_institution`.`id` = `contributions`.`institution_id` 
LEFT JOIN `institutionyears` ON `institutionyears`.`institution_id` = `c_institution`.`id` 
LEFT JOIN `project_institutions` ON `projects`.`id` = `project_institutions`.`project_id` 
LEFT JOIN `institutions` as `p_institutions` ON `p_institutions`.`id` = `project_institutions`.`institution_id` 
GROUP BY `projects`.`id`, `contributions`.`id` 
ORDER BY `projects`.`id` ASC, `contributions`.`year` ASC

Which also returns 8217 rows, in 2,09 sec


#11

do these two things --

  1. remove all the joins from the FROM clause, i.e. leave just FROM projects

  2. take out the GROUP BY clause


#12

cross join effects -- all attributes for each project are cross-joined

research this later


#13

But if I do that my query doesn't run. I'll get errors like "Unknown column 'm_institutions.name' in 'field list'". So in order to select the field 'm_institutions.name' I'll need to join

LEFT JOIN institutions as m_institutions ON projects.managing_organisation_id=m_institutions.id

and for the field institutions.name, I'll need the join

INNER JOIN institutions ON projects.institution_id=institutions.id

etc.


#14

do me a favour, please

(i shouldn't be the one doing all the work here)

which tables are you accessing to show actual data for a project? e.g. contributions... and for each of these, is it possible to have multiple rows per project

secondly, which tables have just the "group concat" relationship that you want to list the attributes of? e.g. countries


#15

Sorry, not trying to let you do all the work. I'll try to note in the query below why I need everything:

SELECT 
    `projects`.`id`, 
    `projects`.`title`, 
-- 1. join on institutions to get the managing institution
    `m_institutions`.`name` as 'managinginstitutionname', 
-- 2. join institution to get the institution who created the project
    `institutions`.`name`, 
-- 3. get the contributions being made to each project
    `contributions`.`year`, 
-- 4. and the institution who made the contribution
    `c_institution`.`name` as 'contributername', 
-- 5. and some more info about the contributing institution
    `institutionyears`.`visitors` AS 'visitor_number', 
    `institutionyears`.`budget` AS 'operational_budget', 
-- 3b. more info on  the contributions being made to each project
    CASE `contributions`.`supporttype` 
        WHEN 1 THEN 'Financial' 
        WHEN 2 THEN 'Manhours' 
        END as 'supporttype' , 
    CASE `contributions`.`fromoperationalbudget` 
        WHEN 0 THEN 'no' 
        WHEN 1 THEN 'yes' 
        ELSE 'unknown' END as 'fromoperationalbudget', 
    `contributions`.`amount`, 
    `contributions`.`currency`,   
        `contributions`.`manhours`,
    `contributions`.`details`, 
    `projects`.`description`, 
-- All the categories for each project 
(SELECT GROUP_CONCAT(DISTINCT `categories`.`category` ORDER BY `categories`.`category` SEPARATOR ', ') 
    FROM `project_categories`,`categories`
    WHERE `projects`.`id` = `project_categories`.`project_id` AND `project_categories`.`category_id` = `categories`.`id`) AS 'Eaza_conservation_categories' ,
-- All the participating institutions for each project 
(SELECT GROUP_CONCAT(DISTINCT `p_institutions`.`name` ORDER BY `p_institutions`.`name` SEPARATOR ', ' ) 
    FROM `project_institutions`, `institutions` as `p_institutions`
    WHERE `projects`.`id` = `project_institutions`.`project_id` AND `p_institutions`.`id` = `project_institutions`.`institution_id` ) AS 'Participating_institutions',
-- All the species for each project 
(SELECT GROUP_CONCAT(DISTINCT `taxonomy`.`commonname` ORDER BY `taxonomy`.`commonname` SEPARATOR ', ') 
    FROM `project_taxonomy`, `taxonomy`
    WHERE `projects`.`id` = `project_taxonomy`.`project_id` AND `project_taxonomy`.`taxonomy_id` = `taxonomy`.`id` ) AS 'Species',
-- All the tags for each project 
(SELECT GROUP_CONCAT(DISTINCT `tags`.`tag` ORDER BY `tags`.`tag` SEPARATOR ', ') 
    FROM `project_tags`, `tags`
    WHERE `projects`.`id` = `project_tags`.`project_id` AND `project_tags`.`tag_id` = `tags`.`id`) AS 'TAGs',
(SELECT GROUP_CONCAT(DISTINCT `animalgroups`.`animalgroup` ORDER BY `animalgroups`.`animalgroup` SEPARATOR ', ') 
    FROM `project_animalgroups`, `animalgroups`
    WHERE `projects`.`id` = `project_animalgroups`.`project_id` AND `project_animalgroups`.`animalgroup_id` = `animalgroups`.`id` ) AS 'Animalgroups',
-- All the animalgroups for each project 
(SELECT GROUP_CONCAT(DISTINCT `habitats`.`habitat` ORDER BY `habitats`.`habitat` SEPARATOR ', ') 
    FROM `project_habitats`, `habitats`
    WHERE `projects`.`id` = `project_habitats`.`project_id` AND `project_habitats`.`habitat_id` = `habitats`.`id` ) AS 'Habitats',
-- All the habitats for each project 
(SELECT GROUP_CONCAT(DISTINCT `countries`.`country` ORDER BY `countries`.`country` SEPARATOR ', ') 
    FROM `project_countries`, `countries`
    WHERE `projects`.`id` = `project_countries`.`project_id` AND `project_countries`.`country` = `countries`.`code`) AS 'Countries',
-- All the countries for each project 
(SELECT GROUP_CONCAT(DISTINCT `locations`.`location` ORDER BY `locations`.`location` SEPARATOR ', ') 
    FROM `project_locations`, `locations`
    WHERE `projects`.`id` = `project_locations`.`project_id` AND `project_locations`.`location_id` = `locations`.`id` ) AS 'Locations',
-- All the locations for each project 
(SELECT GROUP_CONCAT(DISTINCT threats.`threat` ORDER BY threats.`threat` SEPARATOR ', ') 
    FROM `project_threats`, `threats`
    WHERE `projects`.`id` = `project_threats`.`project_id` AND `project_threats`.`threat_id` = `threats`.`id`  ) AS 'Threats',
-- All the threats for each project 
`projects`.`startdate`, 
`projects`.`enddate`,  
`projects`.`created_at`, 
`projects`.`updated_at` 
FROM `projects` 
-- 2. join institution to get the institution who created the project
INNER JOIN `institutions` ON `projects`.`institution_id` = `institutions`.`id` 
-- 1. join on institutions to get the managing institution
LEFT JOIN `institutions` as `m_institutions` ON `projects`.`managing_organisation_id` = `m_institutions`.`id` 
-- 3. get the contributions being made to each project
LEFT JOIN `projectcontributions` as `contributions` ON `projects`.`id` = `contributions`.`project_id` 
-- 4. and the institution who made the contribution
LEFT JOIN `institutions` as `c_institution` ON `c_institution`.`id` = `contributions`.`institution_id` 
-- 5. and some more info about the contributing institution
LEFT JOIN `institutionyears` ON `institutionyears`.`institution_id` = `c_institution`.`id` 
-- 6. OK this JOIN  one I'm not sure about if I need it
LEFT JOIN `project_institutions` ON `projects`.`id` = `project_institutions`.`project_id` 
LEFT JOIN `institutions` as `p_institutions` ON `p_institutions`.`id` = `project_institutions`.`institution_id` 

GROUP BY `projects`.`id`, `contributions`.`id` 
ORDER BY `projects`.`id` ASC, `contributions`.`year` ASC

So it seems to me (but I'm not the db expert) that I need almost all of the joins. Not sure about the last ones:

LEFT JOIN `project_institutions` ON `projects`.`id` = `project_institutions`.`project_id` 
LEFT JOIN `institutions` as `p_institutions` ON `p_institutions`.`id` = `project_institutions`.`institution_id`

Maybe these are a leftover from a previous iteration of the query.


#16

all the tables that you are referencing in a GROUP_CONCAT subquery, should not be in the FROM clause

tables that do belong in the FROM clause are ones where you're showing some sort of single attribute

contributions concerns me... how many contributions can a single project have?

what this comes down to is distinguishing between relationships where a single project can have at most one related row, and those relationships where a single project can have multiple related rows

if you understand this distinction, please reply by identifying those tables where there is at most one related row per project


#17

OK I'll try:

-- Each project can only have one managing institution
    `m_institutions`.`name` as 'managinginstitutionname',
-- Each project can only have one institution which created the project
    `institutions`.`name`,

all others are one to many or many to many relations.

So in the table projects, are 2 foreign keys managing_organisation_id and institution_id pointing to the institutions table. Those are one to many.

Everything else is one project to many (like contributions) or many to many.


#18

okay, there are multiple contributions per project, and you want to show these multiple contributions on separate rows, with several (10 by my count) attributes for each contribution...

so why would you want to repeat all the other stuff in the query on every one of those those multiple rows?

i have no idea how you plan to display everything in the front end, but the front end code to handle this must be pretty gruesome

please consider splitting your query into two -- not only will the queries be simpler, but the front end most probably will be too

first query gets one row per project, with project information

SELECT projects.id
     , projects.title
     , projects.description
     , projects.startdate
     , projects.enddate
     , projects.created_at
     , projects.updated_at 
     , m_institutions.name AS managinginstitutionname
     , institutions.name AS institutionname
     , s1.categories
     , s2.participating_institutions
     , s3.species
     , s4.tags
     , s5.animalgroups
     , s6.habitats
     , s7.countries
     , s8.locations
     , s9.threats
  FROM projects
INNER
  JOIN institutions 
    ON institutions.id = projects.institution_id 
LEFT 
  JOIN institutions AS m_institutions 
    ON m_institutions.id = projects.managing_organisation_id   
LEFT OUTER
  JOIN ( SELECT project_categories.project_id
              , GROUP_CONCAT( categories.category 
                     ORDER BY categories.category SEPARATOR ', ' ) AS categories
           FROM project_categories
         INNER
           JOIN categories
             ON categories.id = project_categories.category_id 
         GROUP
             BY project_categories.project_id    
       ) AS s1
    ON s1.project_id = projects.id
LEFT OUTER
  JOIN ( SELECT project_institutions.project_id
              , GROUP_CONCAT( p_institutions.name 
                     ORDER BY p_institutions.name SEPARATOR ', ' ) AS participating_institutions
           FROM project_institutions
         INNER
           JOIN institutions AS p_institutions
             ON p_institutions.id = project_institutions.institution_id
         GROUP
             BY project_institutions.project_id    
       ) AS s2
    ON s2.project_id = projects.id
LEFT OUTER
  JOIN ( SELECT project_taxonomy.project_id
              , GROUP_CONCAT( taxonomy.commonname 
                     ORDER BY taxonomy.commonname SEPARATOR ', ' ) AS species
           FROM project_taxonomy
         INNER
           JOIN taxonomy
             ON taxonomy.id = project_taxonomy.taxonomy_id
         GROUP
             BY project_taxonomy.project_id    
       ) AS s3
    ON s3.project_id = projects.id
LEFT OUTER
  JOIN ( SELECT project_tags.project_id
              , GROUP_CONCAT( tags.tag 
                     ORDER BY tags.tag SEPARATOR ', ' ) AS tags
           FROM project_tags
         INNER
           JOIN tags
             ON tags.id = project_tags.tag_id
         GROUP
             BY project_tags.project_id    
       ) AS s4
    ON s4.project_id = projects.id
LEFT OUTER
  JOIN ( SELECT project_animalgroups.project_id
              , GROUP_CONCAT( animalgroups.animalgroup 
                     ORDER BY animalgroups.animalgroup SEPARATOR ', ' ) AS animalgroups
           FROM project_animalgroups
         INNER
           JOIN animalgroups
             ON animalgroups.id = project_animalgroups.animalgroup_id
         GROUP
             BY project_animalgroups.project_id    
       ) AS s5
    ON s5.project_id = projects.id
LEFT OUTER
  JOIN ( SELECT project_habitats.project_id
              , GROUP_CONCAT( habitats.habitat 
                     ORDER BY habitats.habitat SEPARATOR ', ' ) AS habitats
           FROM project_habitats
         INNER
           JOIN habitats
             ON habitats.id = project_habitats.habitat_id
         GROUP
             BY project_habitats.project_id    
       ) AS s6
    ON s6.project_id = projects.id
LEFT OUTER
  JOIN ( SELECT project_countries.project_id
              , GROUP_CONCAT( countries.country 
                     ORDER BY countries.country SEPARATOR ', ' ) AS countries
           FROM project_countries
         INNER
           JOIN countries
             ON countries.code = project_countries.country
         GROUP
             BY project_countries.project_id    
       ) AS s7
    ON s7.project_id = projects.id    
LEFT OUTER
  JOIN ( SELECT project_locations.project_id
              , GROUP_CONCAT( locations.location 
                     ORDER BY locations.location SEPARATOR ', ' ) AS locations
           FROM project_locations
         INNER
           JOIN locations
             ON locations.id = project_locations.location_id
         GROUP
             BY project_locations.project_id    
       ) AS s8
    ON s8.project_id = projects.id    
LEFT OUTER
  JOIN ( SELECT project_threats.project_id
              , GROUP_CONCAT( threats.threat 
                     ORDER BY threats.threat SEPARATOR ', ' ) AS threats
           FROM project_threats
         INNER
           JOIN threats
             ON threats.id = project_threats.threat_id
         GROUP
             BY project_threats.project_id    
       ) AS s9
    ON s9.project_id = projects.id    
ORDER 
    BY projects.id ASC

second query gets multiple contributions rows per project

SELECT projects.id
     , contributions.year
     , c_institution.name AS contributername
     , institutionyears.visitors AS visitor_number
     , institutionyears.budget AS operational_budget
     , CASE contributions.supporttype 
            WHEN 1 THEN 'Financial' 
            WHEN 2 THEN 'Manhours' 
        END AS supporttype 
     , CASE contributions.fromoperationalbudget 
            WHEN 0 THEN 'no' 
            WHEN 1 THEN 'yes' 
            ELSE 'unknown'
        END AS fromoperationalbudget
     , contributions.amount
     , contributions.currency
     , contributions.manhours
     , contributions.details
  FROM projects
LEFT 
  JOIN projectcontributions AS contributions 
    ON contributions.project_id = projects.id 
LEFT 
  JOIN institutions AS c_institution 
    ON c_institution.id = contributions.institution_id 
LEFT 
  JOIN institutionyears 
    ON institutionyears.institution_id = c_institution.id 
ORDER 
    BY projects.id ASC
     , contributions.year ASC

#19

Hi Rudy,

that's right, there are multiple contributions per project. In the results I want each of the contributions per project on a seperate row. And each project has many attributes, which I also want in the result rows.

The reason I have everything in one huge query is that it's needed for an export of the database. The export is used to open in excel or something, to be able to manually look at things, do calculations on the contributions, etc. Not sure what the client does with everything. They just want to play with the data I quess. If they needed something specific it would be better if I wrote a specific query in the application.

So it's not displayed anywhere on the front end of a website, which would indeed be horrible.

Your idea to split them up looks nice and might be more logical. If the client wants to know something about the attributes of each project, they can use the results of the first query. If they need to know something about the contributions, they should look at the results of the second query.

They perform much better also (90ms and 1ms).

Really appreciate your input. Teaches me a lot!


#20

If I may ask, is there a short answer to the question why the "FROM clause derived table" is better than a "SELECT clause correlated subquery"? I tried finding the answer, but keep getting posts about correlated subqueries in other situations.

The reason I wrote those SELECT subqueries in the first place, was because I started with "normal" joins which ended up being horribly slow. The SELECT subqueries made things a lot faster. But apparently your FROM clause derived table is even better. I wonder why.

If it's not easy to explain, no problem, I'll try to do more research.