SitePoint Sponsor

User Tag List

Results 1 to 14 of 14
  1. #1
    Barefoot on the Moon! silver trophy Force Flow's Avatar
    Join Date
    Jul 2003
    Location
    Northeastern USA
    Posts
    4,615
    Mentioned
    56 Post(s)
    Tagged
    1 Thread(s)

    Optimize a MySQL Query

    I was working on MySQL query that I finally got working, but although it returns the correct results, it seems fairly inefficient. I was wondering if there might be a better way to do this.

    Here is the sample query:

    Code:
    SELECT DISTINCT `contact`.*
    FROM `contact`
    JOIN `contactSessions` ON `contactSessions`.`CUID`=`contact`.`CUID`
    WHERE `contactSessions`.`SID`=51
    
    UNION
    
    SELECT `contact`.*
    FROM `contact`
    WHERE
    `contact`.`CUID` IN
    (
        SELECT DISTINCT `contactRelationship`.`superiorUID`
        FROM `contactRelationship`
        LEFT JOIN `contact` ON `contactRelationship`.`subordinateUID`=`contact`.`CUID`
        WHERE `contactRelationship`.`subordinateUID` IN 
            (
                SELECT DISTINCT `contact`.`CUID`
                FROM `contact`
                JOIN `contactSessions` ON `contactSessions`.`CUID`=`contact`.`CUID`
                WHERE `contactSessions`.`SID`=51
            )
    )
    The two `contactSessions`.`SID` WHERE clauses will always be identical. The SID number is the variable value of the whole query.

    Basically, what I'm trying to do is select any contacts with a specific SID. Then, also select any parents/superiors when children/subordinates are found with a specific SID. There wouldn't necessarily be a record for the specific SID and the CUID in the `contactSessions` table for the parent/superior.

    So, here's a longer explanation of the query:

    What the first part of the query does is select all the `contact` records for any contacts that appear in the `contactSessions` table with a specific SID.

    The second part (after the UNION) selects all the `contact` CUIDs for any CUIDs that appear in the `contactSessions` table with a specific SID. Then, when the CUID appears as a child (subordinateUID) in `contactRelationship` table, the parent's CUIDs (superiorUIDs) are returned. Then, the results from the `contact` table are returned when the superiorUID matches the CUID.


    Below are the CREATE statements for the tables and sample data.

    Code:
    -- -----------------------------------------------------
    -- Table `Contact`
    -- -----------------------------------------------------
    CREATE TABLE IF NOT EXISTS `Contact` (
      `CUID` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      `firstName` VARCHAR(255) NULL,
      `lastName` VARCHAR(255) NULL,
      PRIMARY KEY (`CUID`),
      UNIQUE INDEX `UID_UNIQUE` (`CUID` ASC))
    ENGINE = InnoDB;
    
    
    -- -----------------------------------------------------
    -- Table `ContactRelationship`
    -- -----------------------------------------------------
    CREATE TABLE IF NOT EXISTS `ContactRelationship` (
      `superiorUID` BIGINT UNSIGNED NOT NULL,
      `subordinateUID` BIGINT UNSIGNED NOT NULL,
      PRIMARY KEY (`superiorUID`, `subordinateUID`),
      INDEX `fk_ContactRelationship_Contact_idx` (`superiorUID` ASC),
      INDEX `fk_ContactRelationship_Contact1_idx` (`subordinateUID` ASC),
      CONSTRAINT `fk_ContactRelationship_Contact`
        FOREIGN KEY (`superiorUID`)
        REFERENCES `Contact` (`CUID`)
        ON DELETE CASCADE
        ON UPDATE NO ACTION,
      CONSTRAINT `fk_ContactRelationship_Contact1`
        FOREIGN KEY (`subordinateUID`)
        REFERENCES `Contact` (`CUID`)
        ON DELETE CASCADE
        ON UPDATE NO ACTION)
    ENGINE = InnoDB;
    
    
    
    
    -- -----------------------------------------------------
    -- Table `ContactSessions`
    -- -----------------------------------------------------
    CREATE TABLE IF NOT EXISTS `ContactSessions` (
      `CUID` BIGINT UNSIGNED NOT NULL,
      `SID` INT UNSIGNED NOT NULL,
      INDEX `fk_ContactSubscriptions_Contact1_idx` (`CUID` ASC),
      PRIMARY KEY (`CUID`, `SID`),
      INDEX `fk_ContactSessions_Sessions1_idx` (`SID` ASC),
      CONSTRAINT `fk_ContactSubscriptions_Contact1`
        FOREIGN KEY (`CUID`)
        REFERENCES `Contact` (`CUID`)
        ON DELETE CASCADE
        ON UPDATE NO ACTION)
    ENGINE = InnoDB;
    
    
    INSERT INTO `contact` (`CUID`, `firstName`, `lastName`) VALUES (1, 'John', 'Doe');
    INSERT INTO `contact` (`CUID`, `firstName`, `lastName`) VALUES (2, 'Jane', 'Doe');
    INSERT INTO `contact` (`CUID`, `firstName`, `lastName`) VALUES (3, 'Bob', 'Smith');
    INSERT INTO `contact` (`CUID`, `firstName`, `lastName`) VALUES (4, 'Bill', 'Johnson');
    INSERT INTO `contact` (`CUID`, `firstName`, `lastName`) VALUES (5, 'Little Kelly', 'Johnson');
    INSERT INTO `contact` (`CUID`, `firstName`, `lastName`) VALUES (6, 'Little Jimmy', 'Doe');
    INSERT INTO `contact` (`CUID`, `firstName`, `lastName`) VALUES (7, 'Little Jenny', 'Doe');
    INSERT INTO `contact` (`CUID`, `firstName`, `lastName`) VALUES (8, 'Little Lizzy', 'Doe');
    
    
    INSERT INTO `ContactRelationship` (`superiorUID`, `subordinateUID`) VALUES (1,6);
    INSERT INTO `ContactRelationship` (`superiorUID`, `subordinateUID`) VALUES (1,7);
    INSERT INTO `ContactRelationship` (`superiorUID`, `subordinateUID`) VALUES (1,8);
    INSERT INTO `ContactRelationship` (`superiorUID`, `subordinateUID`) VALUES (2,6);
    INSERT INTO `ContactRelationship` (`superiorUID`, `subordinateUID`) VALUES (2,7);
    INSERT INTO `ContactRelationship` (`superiorUID`, `subordinateUID`) VALUES (2,8);
    INSERT INTO `ContactRelationship` (`superiorUID`, `subordinateUID`) VALUES (4,5);
    
    
    INSERT INTO `ContactSessions` (`CUID`, `SID`) VALUES (8,50);
    INSERT INTO `ContactSessions` (`CUID`, `SID`) VALUES (8,51);
    INSERT INTO `ContactSessions` (`CUID`, `SID`) VALUES (8,52);
    INSERT INTO `ContactSessions` (`CUID`, `SID`) VALUES (8,53);
    INSERT INTO `ContactSessions` (`CUID`, `SID`) VALUES (6,51);
    INSERT INTO `ContactSessions` (`CUID`, `SID`) VALUES (6,52);
    INSERT INTO `ContactSessions` (`CUID`, `SID`) VALUES (5,51);
    Visit The Blog | Follow On Twitter
    301tool 1.1.5 - URL redirector & shortener (PHP/MySQL)
    Can be hosted on and utilize your own domain

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,327
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    efficiency and complexity are only vaguely connected

    a complex query such as yours can still be quite efficient

    have you run and analyzed the EXPLAIN?

    p.s. for the love of all that's holy, don't use those horrid backticks in your query sql -- bad enough that people generate their dump scripts with phpmyadmin's "show those horrid backticks" option turned on
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    Barefoot on the Moon! silver trophy Force Flow's Avatar
    Join Date
    Jul 2003
    Location
    Northeastern USA
    Posts
    4,615
    Mentioned
    56 Post(s)
    Tagged
    1 Thread(s)
    Well, I'm not sure that the EXPLAIN actually tells me all that much.

    query explain.jpg

    The thing is that I use an identical query twice. Once before the union and again in a subquery. I thought there might be a way to store and re-use the results without having to query the same thing twice.
    Visit The Blog | Follow On Twitter
    301tool 1.1.5 - URL redirector & shortener (PHP/MySQL)
    Can be hosted on and utilize your own domain

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,327
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    well, you have three instances of "using temporary" which doesn't look good

    if i had a few hours, i'd get to know your data a little better and see if i could find another way

    one thing that i'd try right away is to mremove the DISTINCT from your subqueries and see if that helps

    the reason is, the following two code snippets are logically equivalent --
    Code:
    WHERE foo IN ( 9, 3, 7 )
    Code:
    WHERE foo IN ( 9, 3, 3, 9, 7, 7, 3 )
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    Barefoot on the Moon! silver trophy Force Flow's Avatar
    Join Date
    Jul 2003
    Location
    Northeastern USA
    Posts
    4,615
    Mentioned
    56 Post(s)
    Tagged
    1 Thread(s)
    It looks like my sample data set is too small to determine if there is a significant performance difference. There seems to be only a .0001 sec difference between leaving out DISTINCT and leaving it in.
    Visit The Blog | Follow On Twitter
    301tool 1.1.5 - URL redirector & shortener (PHP/MySQL)
    Can be hosted on and utilize your own domain

  6. #6
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    WHERE col IN(SELECT ...) subqueries are generally poorly optimized by MySQL, maybe rewriting them as joins would help?

    What version of MySQL are you using? In 5.6 there were some optimizations done to these types of subqueries so you might try upgrading if possible.

  7. #7
    Barefoot on the Moon! silver trophy Force Flow's Avatar
    Join Date
    Jul 2003
    Location
    Northeastern USA
    Posts
    4,615
    Mentioned
    56 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by Lemon Juice View Post
    WHERE col IN(SELECT ...) subqueries are generally poorly optimized by MySQL, maybe rewriting them as joins would help?
    I'm all for any suggestions if you see a way where that could be accomplished. With the JOINs I tried, there would always be results missing.

    What version of MySQL are you using? In 5.6 there were some optimizations done to these types of subqueries so you might try upgrading if possible.
    The live server is running 5.5
    Visit The Blog | Follow On Twitter
    301tool 1.1.5 - URL redirector & shortener (PHP/MySQL)
    Can be hosted on and utilize your own domain

  8. #8
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Force Flow View Post
    I'm all for any suggestions if you see a way where that could be accomplished. With the JOINs I tried, there would always be results missing.
    It's difficult for me to rewrite your query without possibly spending a lot of time on it since I'm not that proficient in it...

    But there's another idea worth considering - how many rows are usually returned by your subqueries? If these are not large numbers of IDs (I mean not more than 1000 or so) then I'm sure you can gain a lot in terms of performance if you split this query into 3 - first execute the innermost (last) SELECT, grab the data and execute the next SELECT higher up injecting the data from the first query into the IN() part as literal values. Then do the same with the next subquery higher up. This is how I optimized this type of subqueries in the past using PHP and it worked pretty well. If you want to do it in SQL alone then you could select your IDs into a temporary table and then execute the parent SELECT with a simple join with the temp table - this should also work fast and possibly could be wrapped up in a stored procedure.

    Edit: I can see that the last idea with the temp table gets quite close to rewriting this query using joins so perhaps this could be worth experimenting, here is just an example of rewriting the last subquery:
    Code:
    SELECT DISTINCT `contact`.*
    FROM `contact`
    JOIN `contactSessions` ON `contactSessions`.`CUID`=`contact`.`CUID`
    WHERE `contactSessions`.`SID`=51
    
    UNION
    
    SELECT `contact`.*
    FROM `contact`
    WHERE
    `contact`.`CUID` IN
    (
        SELECT DISTINCT `contactRelationship`.`superiorUID`
        FROM `contactRelationship`
        LEFT JOIN `contact` ON `contactRelationship`.`subordinateUID`=`contact`.`CUID`
    
        INNER JOIN
            (
                SELECT DISTINCT `contact`.`CUID`
                FROM `contact`
                JOIN `contactSessions` ON `contactSessions`.`CUID`=`contact`.`CUID`
                WHERE `contactSessions`.`SID`=51
            ) AS dt
        ON dt.CUID=contactRelationship.subordinateUID
    )
    If this works then it's just a matter of applying the same idea to the other subquery. It's untested but I'd give it a try.

  9. #9
    Barefoot on the Moon! silver trophy Force Flow's Avatar
    Join Date
    Jul 2003
    Location
    Northeastern USA
    Posts
    4,615
    Mentioned
    56 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by Lemon Juice View Post
    But there's another idea worth considering - how many rows are usually returned by your subqueries? If these are not large numbers of IDs (I mean not more than 1000 or so)
    There would probably be anywhere between 500 and 4,000 rows in the contact table. I would guess that the other two tables would have slightly fewer rows than the contact table.

    I'll give your query a try and see what I get. Thanks
    Visit The Blog | Follow On Twitter
    301tool 1.1.5 - URL redirector & shortener (PHP/MySQL)
    Can be hosted on and utilize your own domain

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,327
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    okay, i took a closer look at simplifying the subqueries

    let's start at the innermost subquery

    first of all, you want to return contact.CUID... but this is going to equal contactSessions.CUID, which you get by searching for contactSessions.SID... so you do not need the join to the contact table!

    now, moving to the next outer subquery, you want contactSessions.CUID to be equal to contactRelationship.subordinateUID... and once again, you do not need the join to the contact table, because once you've retrieved the contactRelationship row, you have contactRelationship.superiorUID

    so let's rewrite the second of your two UNION queries like this --
    Code:
    SELECT contact.*
      FROM contactSessions
    INNER
      JOIN contactRelationship
        ON contactRelationship.subordinateUID = contactSessions.CUID
    INNER
      JOIN contact
        ON contact.CUID = contactRelationship.superiorUID    
     WHERE contactSessions.SID = 51
    notice that the FROM clause starts with the most restricted table, the one for which you supply a WHERE condition value

    mysql is smart enough to re-arrange tables in a multi-table join, but i am a firm believer in writing a query in the manner in which i expect it to be executed

    now we combine with the first of your two UNION queries, eliminating the DISTINCT because that's what UNION (as opposed to UNION ALL) will do for you --
    Code:
    SELECT contact.*
      FROM contactSessions
    INNER
      JOIN contact
        ON contact.CUID = contactSessions.CUID  
     WHERE contactSessions.SID = 51
    
    UNION
    
    SELECT contact.*
      FROM contactSessions
    INNER
      JOIN contactRelationship
        ON contactRelationship.subordinateUID = contactSessions.CUID
    INNER
      JOIN contact
        ON contact.CUID = contactRelationship.superiorUID    
     WHERE contactSessions.SID = 51
    run that, and let's have a look at the EXPLAIN
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    Barefoot on the Moon! silver trophy Force Flow's Avatar
    Join Date
    Jul 2003
    Location
    Northeastern USA
    Posts
    4,615
    Mentioned
    56 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by r937 View Post
    first of all, you want to return contact.CUID... but this is going to equal contactSessions.CUID, which you get by searching for contactSessions.SID... so you do not need the join to the contact table!
    That didn't even occur to me to use a different table in the FROM clause and join the contact table at the end instead. It looks like that was pretty much the key to the whole thing.

    run that, and let's have a look at the EXPLAIN
    Here it is:
    query explain v2.gif

    Thanks Rudy Looks like the processing time for the query was cut in half.
    Visit The Blog | Follow On Twitter
    301tool 1.1.5 - URL redirector & shortener (PHP/MySQL)
    Can be hosted on and utilize your own domain

  12. #12
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,327
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Force Flow View Post
    That didn't even occur to me to use a different table in the FROM clause and join the contact table at the end instead. It looks like that was pretty much the key to the whole thing.
    well, no

    like i tried to explain, you can use inner joins on multiple tables in pretty much any sequence, and mysql will figure out the most efficient access path, invariably starting with the table that has the highest restriction, i.e. will return the fewest rows, thus ensuring that only the minimum number of rows have to be joined (rather than joining tables completely and only then applying the WHERE condition, in effect retrieving a lot and then throwing a whole bunch of joined rows away)

    the improvement came about from two things -- converting subqueries to joins (which is actually how mysql executes subqueries, but, for some reason, it doesn't always do it efficiently), and, more importantly, not joining to additional tables if you don't have to
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  13. #13
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    I am impressed with the solution Rudy has come to, I knew something like that could be done, I just couldn't wrap my head around it to be able to transform the query to such an optimized state

    Yes, the sequence of inner joins doesn't matter but I've experienced on a few occasions a situation where a left join performed faster than an inner join - I don't know why, especially that I would have expected inner joins to be faster. Therefore, in cases where the inner join were equal to the left join (that is no NULLs in the result) simply switching to the left join improved performance. But I suspect this can depend on many variables so I'm not sure if left joins are always faster...

  14. #14
    Barefoot on the Moon! silver trophy Force Flow's Avatar
    Join Date
    Jul 2003
    Location
    Northeastern USA
    Posts
    4,615
    Mentioned
    56 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by r937 View Post
    the improvement came about from two things -- converting subqueries to joins (which is actually how mysql executes subqueries, but, for some reason, it doesn't always do it efficiently), and, more importantly, not joining to additional tables if you don't have to
    Right, the logic of the query basically got flipped around. Sorry, that's what I was trying to say earlier.

    Quote Originally Posted by Lemon Juice View Post
    I am impressed with the solution Rudy has come to, I knew something like that could be done, I just couldn't wrap my head around it to be able to transform the query to such an optimized state
    Indeed, same here.

    Thanks again Rudy
    Visit The Blog | Follow On Twitter
    301tool 1.1.5 - URL redirector & shortener (PHP/MySQL)
    Can be hosted on and utilize your own domain


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •