SitePoint Sponsor

User Tag List

Results 1 to 22 of 22
  1. #1
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Search is painfully slow

    Hi Guys!

    I'm building a search on one of my sites and I have the following MySQL query. It's currently taking 9.5 sec in phpMyadmin to execute the code, but actually takes even longer when run on the actual site. Do I have the right database structure of tables etc.

    PHP Code:
    SELECT users . * , ui.industryjt.job_type
        FROM users
    LEFT OUTER JOIN 
    (
        
    SELECT user_idjob_type
            FROM users_job_types
            LEFT JOIN job_types ON job_types
    .id users_job_types.job_type_id
        GROUP BY user_id
    ) AS jt ON jt.user_id users.id
    LEFT OUTER JOIN 
    (
        
    SELECT user_idindustry
            FROM users_industries
            LEFT JOIN industries ON industries
    .id users_industries.industry_id
        GROUP BY user_id
    ) AS ui ON ui.user_id users.id
    WHERE users
    .status '1'
    AND users.cv_hide '0'
    ORDER BY cv_date DESC 
    Below are the create statements:
    PHP Code:
    CREATE TABLE IF NOT EXISTS `users` (
      `
    idint(11NOT NULL AUTO_INCREMENT,
      `
    first_namevarchar(100CHARACTER SET utf8 NOT NULL,
      `
    cv_datedatetime DEFAULT NULL,
      `
    cv_hideenum('0','1'CHARACTER SET utf8 NOT NULL DEFAULT '0',
      `
    statusenum('0','1'CHARACTER SET utf8 DEFAULT '1',
      
    PRIMARY KEY (`id`)
    ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1292692 ;

    CREATE TABLE IF NOT EXISTS `job_types` (
      `
    idsmallint(6NOT NULL AUTO_INCREMENT,
      `
    job_typevarchar(100NOT NULL,
      `
    job_type_descriptiontext,
      `
    job_type_urlvarchar(255NOT NULL,
      
    PRIMARY KEY (`id`)
    ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1006 ;

    CREATE TABLE IF NOT EXISTS `users_job_types` (
      `
    user_idint(11NOT NULL,
      `
    job_type_idint(11NOT NULL,
      
    KEY `user_id` (`user_id`),
      
    KEY `job_type_id` (`job_type_id`)
    ENGINE=MyISAM DEFAULT CHARSET=latin1;

    CREATE TABLE IF NOT EXISTS `users_industries` (
      `
    user_idint(11NOT NULL,
      `
    industry_idint(11NOT NULL,
      
    KEY `user_id` (`user_id`),
      
    KEY `industry_id` (`industry_id`)
    ENGINE=MyISAM DEFAULT CHARSET=latin1;

    CREATE TABLE IF NOT EXISTS `industries` (
      `
    idsmallint(6NOT NULL AUTO_INCREMENT,
      `
    industryvarchar(100NOT NULL,
      `
    industry_descriptiontext,
      `
    industry_urlvarchar(255NOT NULL,
      
    PRIMARY KEY (`id`)
    ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1052 
    Please let me know if you know any way this can be speeded up.

    Thanks.

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    your junction tables should be redesigned to enable the use of covering indexes -- the primary key plus another index with the same two columns in the reverse order
    Code:
    CREATE TABLE users_job_types 
    ( user_id INTEGER NOT NULL
    , job_type_id INTEGER NOT NULL
    , PRIMARY KEY ( user_id , job_type_id )
    , INDEX job_type_user ( job_type_id , user_id )
    );
    
    CREATE TABLE users_industries 
    ( user_id INTEGER NOT NULL
    , industry_id INTEGER NOT NULL
    , PRIMARY KEY ( user_id , industry_id )
    , INDEX industry_user ( industry_id , user_id )
    );
    this way, the actual rows of these two tables are never accessed, just the indexes

    as for your query, how come all you want is a single arbitrary job type and a single arbitrary industry for each user?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    your junction tables should be redesigned to enable the use of covering indexes -- the primary key plus another index with the same two columns in the reverse order
    Code:
    CREATE TABLE users_job_types 
    ( user_id INTEGER NOT NULL
    , job_type_id INTEGER NOT NULL
    , PRIMARY KEY ( user_id , job_type_id )
    , INDEX job_type_user ( job_type_id , user_id )
    );
    
    CREATE TABLE users_industries 
    ( user_id INTEGER NOT NULL
    , industry_id INTEGER NOT NULL
    , PRIMARY KEY ( user_id , industry_id )
    , INDEX industry_user ( industry_id , user_id )
    );
    this way, the actual rows of these two tables are never accessed, just the indexes

    as for your query, how come all you want is a single arbitrary job type and a single arbitrary industry for each user?
    Hi r937,

    Firstly, thank you for your post. Your comments make absolute complete sense.

    I've added the new indexes / primary keys to the table. Now with the query itself, the aim is to return users who match a certain job type or industry. I've stripped that part of the sql out for now, so this in essence is the query that will be run when a user does a blanket search (i.e. they have no search criteria). Ideally, this query should return all users. I'm joining the tables as I need to display the industry and job type on the search results page. Does this make sense?

  4. #4
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    as for your query, how come all you want is a single arbitrary job type and a single arbitrary industry for each user?
    Ps, the results should return all industries / job types for each user (not just one)

  5. #5
    SitePoint Wizard guelphdad's Avatar
    Join Date
    Oct 2003
    Location
    St. Catharines, ON Canada
    Posts
    1,708
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Code:
    SELECT user_id, job_type
            FROM users_job_types
            LEFT JOIN job_types ON job_types.id = users_job_types.job_type_id
        GROUP BY user_id
    


    This part of your query alone will return a single job type per user_id and it is arbitrary which one it picks. If there are five rows in the table for you and your id=3 and two rows in the table for me and my id=4 it will return any one of your rows and any one of my rows.

  6. #6
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by guelphdad View Post
    This part of your query alone will return a single job type per user_id and it is arbitrary which one it picks. If there are five rows in the table for you and your id=3 and two rows in the table for me and my id=4 it will return any one of your rows and any one of my rows.
    Yes, this is really not ideal, I guess i'll have to do some type of sub query?

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Zaggs View Post
    I guess i'll have to do some type of sub query?
    you're already doing a subquery

    in fact, you're doing two of them, and you will get cross join efects if you return more than one row per user with them

    didn't you already use GROUP_CONCAT in this query, or another very similar?

    because that's what's needed here
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    you're already doing a subquery

    in fact, you're doing two of them, and you will get cross join efects if you return more than one row per user with them

    didn't you already use GROUP_CONCAT in this query, or another very similar?

    because that's what's needed here
    Yes, I used GROUP_CONCAT on a similar query - not this one.

    I guess I could do this then, but it still works slow;

    PHP Code:
    SELECT users . * , ui.industryjt.job_type
    FROM users
    INNER JOIN 
    (

    SELECT user_idgroup_concatjob_type
    SEPARATOR 
    ' ' ) AS job_type
    FROM users_job_types
    LEFT JOIN job_types ON job_types
    .id users_job_types.job_type_id
    GROUP BY user_id
    ) AS jt ON jt.user_id users.id
    INNER JOIN 
    (

    SELECT user_idgroup_concatindustry
    SEPARATOR 
    ' ' ) AS industry
    FROM users_industries
    LEFT JOIN industries ON industries
    .id users_industries.industry_id
    GROUP BY user_id
    ) AS ui ON ui.user_id users.id
    WHERE users
    .status '1'
    AND users.cv_hide '0'
    ORDER BY cv_date DESC 
    1) How do I speed it up using my new indexes that I added?
    2) How can I return all users even if they don't have any industries or job types?

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    1. do an EXPLAIN and let's analyze it

    2. use LEFT OUTER JOINs instead of INNER JOINs

    edit: and use INNER JOINs inside your subqueries instead of LEFT OUTER JOINs
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    1. do an EXPLAIN and let's analyze it

    2. use LEFT OUTER JOINs instead of INNER JOINs

    edit: and use INNER JOINs inside your subqueries instead of LEFT OUTER JOINs
    Here is an explain of the new query

    animal_query.png

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    i'm not a performance expert but perhaps the ORDER BY in the outer query is contributing to the slowness

    this is just a guess, but the query might speed up really nicely if you pull in all the job type and industries separately ahead of time, and do the GROUP_CONCAT on the ids only (i.e. use just the junction tables in the subqueries), expanding the ids to names in your application language -- but this would be a last resort
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    i'm not a performance expert but perhaps the ORDER BY in the outer query is contributing to the slowness

    this is just a guess, but the query might speed up really nicely if you pull in all the job type and industries separately ahead of time, and do the GROUP_CONCAT on the ids only (i.e. use just the junction tables in the subqueries), expanding the ids to names in your application language -- but this would be a last resort
    Hi,

    I tried removing the "order by" in the outer query but it didn't speed things up at all.

    When you say pull in the job type and industries seperately, what exactly do you mean? Create separate queries?

  13. #13
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Zaggs View Post
    When you say pull in the job type and industries seperately, what exactly do you mean? Create separate queries?
    yup

    then your main query would look like this --
    Code:
    SELECT users.* 
         , jt.job_type_ids 
         , ui.industry_ids
      FROM users 
    LEFT OUTER 
      JOIN ( SELECT user_id
                  , GROUP_CONCAT(job_type_id) AS job_type_ids 
               FROM users_job_types 
             GROUP 
                 BY user_id ) AS jt 
        ON jt.user_id = users.id 
    LEFT OUTER 
      JOIN ( SELECT user_id
                  , GROUP_CONCAT(industry_id) AS industry_ids 
               FROM users_industries 
             GROUP 
                 BY user_id ) AS ui 
        ON ui.user_id = users.id 
     WHERE users.status = '1' 
       AND users.cv_hide = '0' 
    ORDER 
        BY cv_date DESC
    hopefully that runs faster, and you would then swap out the ids using the pre-fetched job types and industries using your application language
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  14. #14
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    yup

    then your main query would look like this --
    Code:
    SELECT users.* 
         , jt.job_type_ids 
         , ui.industry_ids
      FROM users 
    LEFT OUTER 
      JOIN ( SELECT user_id
                  , GROUP_CONCAT(job_type_id) AS job_type_ids 
               FROM users_job_types 
             GROUP 
                 BY user_id ) AS jt 
        ON jt.user_id = users.id 
    LEFT OUTER 
      JOIN ( SELECT user_id
                  , GROUP_CONCAT(industry_id) AS industry_ids 
               FROM users_industries 
             GROUP 
                 BY user_id ) AS ui 
        ON ui.user_id = users.id 
     WHERE users.status = '1' 
       AND users.cv_hide = '0' 
    ORDER 
        BY cv_date DESC
    hopefully that runs faster, and you would then swap out the ids using the pre-fetched job types and industries using your application language
    Hi Paul,

    That query is taking a while too...Showing rows 0 - 29 (7,404 total, Query took 17.0853 sec)

  15. #15
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    who's this paul guy?

    can you please reconfirm the indexes on the junction tables by doing a SHOW CREATE TABLE for them
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  16. #16
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    who's this paul guy?

    can you please reconfirm the indexes on the junction tables by doing a SHOW CREATE TABLE for them
    Ooops, sorry it's been a long day. Think I was typing an email to a guy named Paul prior to this being posted.

    PHP Code:
    CREATE TABLE IF NOT EXISTS `users_industries` (
      `
    user_idint(11NOT NULL,
      `
    industry_idint(11NOT NULL,
      
    PRIMARY KEY (`user_id`,`industry_id`),
      
    KEY `industry_user` (`industry_id`,`user_id`)
    ENGINE=MyISAM DEFAULT CHARSET=utf8;

    CREATE TABLE IF NOT EXISTS `users_job_types` (
      `
    user_idint(11NOT NULL,
      `
    job_type_idint(11NOT NULL,
      
    PRIMARY KEY (`user_id`,`job_type_id`),
      
    KEY `job_type_user` (`job_type_id`,`user_id`)
    ENGINE=MyISAM DEFAULT CHARSET=utf8

  17. #17
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    let's step back from this for a moment, maybe there's another approach

    under what circumstances do you want to produce a list of all users?

    why does that list need to show each user's job types and industries?

    how often will this query need to run? and have you thought about caching?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  18. #18
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    let's step back from this for a moment, maybe there's another approach

    under what circumstances do you want to produce a list of all users?
    We use PHP to build an SQL query depending on what the users search criteria is. For example, if they select a job type, then the query will narrow by job type. The query I posted just happens to be a blanket search (i.e. the user didn't have any search criteria and therefore all results should be returned).

    Quote Originally Posted by r937 View Post
    why does that list need to show each user's job types and industries?
    Because we need to see the Job Type and Industry next to the user on the search results page.

    Quote Originally Posted by r937 View Post
    how often will this query need to run? and have you thought about caching?
    It will run every time someone does a search, so probably quite a lot.

    Does this help? Thanks.

  19. #19
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Still looking for help on this issue. Can anyone help?

    Thanks.

  20. #20
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,187
    Mentioned
    17 Post(s)
    Tagged
    4 Thread(s)
    Perhaps try eliminating the subqueries.

    Code SQL:
    SELECT 
           u.id
          ,GROUP_CONCAT(DISTINCT jt.job_type SEPARATOR ' ')
          ,GROUP_CONCAT(DISTINCT i.industry SEPARATOR ' ')
      FROM
          users u
      LEFT OUTER
      JOIN  
          users_job_types ujt
        ON
          u.id = ujt.user_id
      LEFT OUTER
      JOIN
          job_types jt
        ON
          ujt.job_type_id = jt.id
      LEFT OUTER
      JOIN
          users_industries ui
        ON
          u.id = ui.user_id
      LEFT OUTER
      JOIN
          industries i
        ON
          ui.industry_id = i.id
     WHERE
          u.`status` = 1
       AND
          u.cv_hide = 0
     GROUP
        BY
          u.id
     ORDER
        BY
          u.cv_date DESC

    The explain for that should be using all the indexes.
    The only code I hate more than my own is everyone else's.

  21. #21
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,187
    Mentioned
    17 Post(s)
    Tagged
    4 Thread(s)
    Looking at it more I think removing the file sorts is the main problem. There is no need to group or concat the data until the outermost query. I would be skeptical and have to test though whether you would be getting a significant performance boost using subqueries w/ inner joins or just join all the tables with left outer. Particularly when it comes to MySQL.

    Code SQL:
    SELECT 
           u.*
          ,GROUP_CONCAT(DISTINCT ujt.job_type SEPARATOR ' ')
          ,GROUP_CONCAT(DISTINCT ui.industry SEPARATOR ' ')
      FROM
          users u
      LEFT OUTER
      JOIN 
         (SELECT
                ujt.user_id
               ,jt.job_type
            FROM
               users_job_types ujt
           INNER
            JOIN
               job_types jt
              ON
               ujt.job_type_id = jt.id) ujt
        ON
          u.id = ujt.user_id
      LEFT OUTER
      JOIN
          (SELECT
                ui.user_id
                ,i.industry
             FROM
                users_industries ui
            INNER
             JOIN
                industries i
               ON
                ui.industry_id = i.id) ui
        ON
          u.id = ui.user_id
     WHERE
          u.STATUS = 1
       AND
          u.cv_hide = 0
     GROUP
        BY
          u.id
     ORDER
        BY
          u.cv_date DESC
    The only code I hate more than my own is everyone else's.

  22. #22
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by oddz View Post
    Perhaps try eliminating the subqueries.
    that will produce an inefficient proliferation of cross join intermediate rows, which you conveniently hide by using DISTINCT in the GROUP_CONCAT

    i don't see that working any better
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •