Ok, I realize I’m being very general here and the statement may not be true, but in the database we are working with, UNIONS seems to be much faster than LEFT JOINs when using junction tables.
Okay, I just realized that what’s causing the difference is that when using JOINS I am using ORs which I think are what’s causing the speed difference.
Does anyone know of a better way to write the following code without using UNIONs?:
SELECT SQL_CALC_FOUND_ROWS DISTINCT dvdpedia.dogTag, dvdpedia.title, dvdpedia.starring, dvdpedia.originalTitle, dvdpedia.theatrical, dvdpedia.media, localeLanguage, MATCH(title) AGAINST('jones' IN BOOLEAN MODE) AS relevance
, director
FROM dvdpedia
LEFT OUTER JOIN dvd_director2title ON dvdpedia.dogTag = dvd_director2title.dogTag
LEFT OUTER JOIN dvd_director ON dvd_director.id = dvd_director2title.pId
WHERE MATCH(title) AGAINST('jones' IN BOOLEAN MODE)
or MATCH(director) AGAINST('jones' IN BOOLEAN MODE)
or MATCH(starring) AGAINST('jones' IN BOOLEAN MODE)
GROUP BY dogTag
ORDER BY localeLanguage = 'en' DESC, localeLanguage, relevance DESC
title, director, and starring and not the types of data fields that are conducive to fulltext indexing
i doubt any of these three is actually longer than 50 characters, so you should be probably be using LIKE
secondly, you write LEFT OUTER JOIN but then you also have a condition on the right table director column
i’m having trouble understanding why you have a many-to-many realtionship table for dvds to directors, and ~not~ one for dvds to “starring” (if i think this is what it looks like, the people starring in the dvd)
finally, i think your use of DISTINCT is redundant and may also be contributing to poor performance
My understanding is that regardless of the length of the field, a FULLTEXT search is usually faster than a LIKE search, is this not so? Also the advantage of a FULLTEXT search on those fields is that then we can order by Relevance.
secondly, you write LEFT OUTER JOIN but then you also have a condition on the right table director column
Sorry I don’t understand what you mean here, the LEFT OUTER JOIN is so that we can get listings of movies without directors in case they are searching for titles (this being a keyword search, we don’t know if they’re searching for titles, directors or actors).
i’m having trouble understanding why you have a many-to-many realtionship table for dvds to directors, and ~not~ one for dvds to “starring” (if i think this is what it looks like, the people starring in the dvd)
I do have one for credits as well, it’s just that the 6 main actors go into that starring field to make thing easier. Then I have another query that only searches the actors table which has the many-to-many relationship.
finally, i think your use of DISTINCT is redundant and may also be contributing to poor performance
Aha, hadn’t caught that, great, at least one thing I can optimize
While we are talking about optimization, is there a way to find out if my joins are not using the indexes properly? I have tried using EXPLAIN but that doesn’t seem to help. I’ve also tried turning on the slow query logs and parsing those with mysqldumpslow but although this helps track the time the queries take, it doesn’t tell me if the joins are being used properly or where I could be optimizing the queries.
Let me clarify, EXPLAIN loses me when it reports something like:
id select_type table type possible_keys key
1 PRIMARY <derived2> All
I have no idea how to interpret or optimize this, even where in my query it’s referring to.
EXPLAIN
SELECT dvdpedia.*
, dvd_rated.rated
, d.director
, w.writer
, p.producer
, DATE_FORMAT(dvdpedia.theatrical, '%Y-%m-%d') AS 'theatrical'
, DATE_FORMAT(dvd_locale.releaseDate, '%Y-%m-%d') AS 'releaseDate'
, dvd_aspectRatio.aspectRatio
, l.languages
, dvd_locale.region
, dvd_locale.locale
, dvd_locale.price
, dvd_locale.videoFormat
, dvd_images.upc
, dvd_pictureFormat.pictureFormat
, dvd_studio.studio
, dvd_sound.sound
, dvd_genre.genre
, dvd_series.series
, s.subtitles
, c.country
FROM dvdpedia
LEFT OUTER JOIN dvd_locale ON dvdpedia.dogTag = dvd_locale.dogTag AND dvd_locale.locale = 'UK'
LEFT OUTER JOIN dvd_studio ON dvdpedia.studioId = dvd_studio.id
LEFT OUTER JOIN dvd_pictureFormat ON dvdpedia.pictureFormatId = dvd_pictureFormat.id
LEFT OUTER JOIN dvd_images ON dvd_images.dogTag = dvdpedia.dogTag
LEFT OUTER JOIN dvd_rated ON dvd_rated.id = dvd_locale.ratedId
LEFT OUTER JOIN dvd_aspectRatio ON dvdpedia.aspectRatioId = dvd_aspectRatio.id
LEFT OUTER JOIN dvd_sound ON dvdpedia.soundId = dvd_sound.id
LEFT OUTER JOIN dvd_genre ON dvdpedia.genreId = dvd_genre.id
LEFT OUTER JOIN dvd_series ON dvdpedia.seriesId = dvd_series.id
LEFT OUTER JOIN ( SELECT dvd_writer2title.dogTag
, GROUP_CONCAT(writer SEPARATOR ', ') AS writer
FROM dvd_writer2title
INNER JOIN dvd_writer
ON dvd_writer.id = dvd_writer2title.pId
GROUP BY dvd_writer2title.dogTag ) AS w
ON w.dogTag = dvdpedia.dogTag
LEFT OUTER JOIN ( SELECT dvd_director2title.dogTag,
GROUP_CONCAT(director SEPARATOR ', ') AS director
FROM dvd_director
INNER JOIN dvd_director2title
ON dvd_director.id = dvd_director2title.pid
GROUP BY dvd_director2title.dogTag ) AS d
ON d.dogTag = dvdpedia.dogTag
LEFT OUTER JOIN ( SELECT dvd_producer2title.dogTag,
GROUP_CONCAT(producer SEPARATOR ', ') AS producer
FROM dvd_producer
INNER JOIN dvd_producer2title
ON dvd_producer.id = dvd_producer2title.pid
GROUP BY dvd_producer2title.dogTag ) AS p
ON p.dogTag = dvdpedia.dogTag
LEFT OUTER JOIN (SELECT dvd_languages2title.dogTag, GROUP_CONCAT(languages.languages SEPARATOR ', ' ) AS languages
FROM languages
INNER JOIN dvd_languages2title
ON dvd_languages2title.pId = languages.id
GROUP BY dvd_languages2title.dogTag) AS l
ON l.dogTag = dvdpedia.dogTag
LEFT OUTER JOIN (SELECT dvd_subtitles2title.dogTag, GROUP_CONCAT(languages.languages SEPARATOR ', ' ) AS subtitles
FROM languages
INNER JOIN dvd_subtitles2title
ON dvd_subtitles2title.pId = languages.id
GROUP BY dvd_subtitles2title.dogTag) AS s
ON s.dogTag = dvdpedia.dogTag
LEFT OUTER JOIN (SELECT dvd_country2title.dogTag, GROUP_CONCAT(country SEPARATOR ', ') AS country
FROM country
INNER JOIN dvd_country2title
ON country.id = dvd_country2title.pId
GROUP BY dvd_country2title.dogTag) AS c
ON c.dogTag = dvdpedia.dogTag
WHERE dvdpedia.dogTag = '40';
Well it’s not issues as such, we are just trying to optimize as much as we can and of course when we see things in EXPLAIN that do not use indexes they seem ripe candidates for optimization. My suspicion is that it’s these queries here:
LEFT OUTER JOIN ( SELECT dvd_writer2title.dogTag
, GROUP_CONCAT(writer SEPARATOR ', ') AS writer
FROM dvd_writer2title
INNER JOIN dvd_writer
ON dvd_writer.id = dvd_writer2title.pId
GROUP BY dvd_writer2title.dogTag ) AS w
ON w.dogTag = dvdpedia.dogTag
causing the no-index situation. We are rewriting them like this now:
, GROUP_CONCAT(DISTINCT writer SEPARATOR ', ') AS writer
Then the simple JOIN by itself like the others. This produces a much better EXPLAIN output with all SIMPLE joins and only one empty value in “possible_keys” but the type is “index” there so I think we should be fine no?
And I have to say that it really bugs me that a UNION, which is faster, spits out a join of type ALL which is evil according to the docs and yet the equivalent JOIN, which is slower, does not, the join types are all good:
+----+--------------+------------+----------+------------------------+------------------------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+----------+------------------------+------------------------+---------+------+------+-----------------------------+
| 1 | PRIMARY | dvdpedia | fulltext | title_fulltext | title_fulltext | 0 | | 1 | Using where; Using filesort |
| 2 | UNION | dvdpedia | fulltext | originalTitle_fulltext | originalTitle_fulltext | 0 | | 1 | Using where; Using filesort |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | Using filesort |
+----+--------------+------------+----------+------------------------+------------------------+---------+------+------+-----------------------------+
you mean the concept? it’s really rather simple – each SELECT retrieves result rows separately from all other SELECTs, and all the result rows are fed into the same result table
consequently each SELECT must return exactly the same number of columns, and the columns have to be “union compatible” i.e. similar datatypes – string with strings, numbers with numbers, dates with dates
i don’t know of any resources which explain the EXPLAIN for unions, though