SitePoint Sponsor

User Tag List

Results 1 to 5 of 5
  1. #1
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Very slow MySQL query

    Hi Guys!

    I have a MySQL query which takes quite a while to load. This is very frustrating for users when they do a search as it can take up to 10 seconds for the page to load. When I execute the function in phpMyAdmin it says the query took 1.34 seconds but in fact takes about 10 seconds for that page to load too.

    I also run an EXPLAIN on the query and took a screenshot of the result. Any help to speed this query up would be greatly appreciated.

    https://dl.dropbox.com/u/55641167/mi...2008.24.20.png

    Code:
    SELECT jobs.job_title, jobs.job_url, jobs.job_salary, jobs.job_ote, jobs.job_perks, jobs.job_description, jobs.date_posted, companies.company_name, companies.company_url, companies.logo_thumbnail, count( applications.id ) AS applications, count( featured_jobs.id ) AS featured, l.display_name, l.postcode, l.display_county
    FROM jobs
    INNER JOIN companies ON companies.id = jobs.company_id
    INNER JOIN jobs_industries ON jobs_industries.job_id = jobs.id
    INNER JOIN jobs_job_types ON jobs_job_types.job_id = jobs.id
    INNER JOIN (
    
    SELECT job_id, if( count( job_id ) >1, 'Locations throughout the UK', display_name ) AS display_name, postcode, display_county, location_id
    FROM jobs_locations
    GROUP BY job_id
    ) AS l ON l.job_id = jobs.id
    LEFT OUTER JOIN applications ON applications.job_id = jobs.id
    LEFT OUTER JOIN featured_jobs ON featured_jobs.job_id = jobs.id
    WHERE jobs.status = 'on'
    GROUP BY jobs.id
    ORDER BY featured DESC , date_posted DESC

  2. #2
    SitePoint Guru
    Join Date
    Nov 2004
    Location
    England
    Posts
    698
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Generically, the places to focus on an Explain result are where:
    • There are no possible keys to use
    • There are a large number of rows returned
    • Temporary tables and filesort are used
    • Derived results from subqueries


    Your indexes are stored in an efficient fashion, usually in order, whereas your data could be in any random order. Imagine a large book; you would need to read the whole thing to know that you've definitely found the information that you're looking for, whereas if you use the index you will be able to go straight to the term you need and won't need to look any further than the pages that it tells you to. So, where there's no possible key available, you can consider adding one. Personally I consider it good practice to have an index on any field that is used in a join, though in reality it may depend on the size of the table etc for whether there's any real-world benefit.

    When you get 100 rows returned, and you then go further down and see another 100 rows returned (especially as a result of outer joins) you will multiply the number of original rows by the number of new rows. So 100x100 = 10,000 rows. When you've got 20,000 x 6 you will end up with 120,000 rows of data for the server to work through. Looking at your result, and if I'm not mistaken, you are returning a total of 2,387,775,606 rows. This is a HUGE amount of data that the server won't be able to keep in memory (depending on your configuration) so it will write it to a temporary table on disc and perform a filesort. Disc based operations are MUCH more costly than memory based, so it becomes much slower immediately. Large numbers of results, grouping and ordering generally always result in "Using temporary; Using filesort".

    Lastly, and probably the primary reason for your particular problem (apologies for going the long way around) and the reason that you have so many rows returned as above, is that subqueries are just painfully slow in MySQL (usually). I don't believe that the jobs_locations join is doing you any favours at all, because as above, every time it appears to be returning 20,000 rows again. Is there any reason why you've used a subquery? Could you optimise it into a simple join? I would be focussing my attention there, personally

  3. #3
    SitePoint Wizard Zaggs's Avatar
    Join Date
    Feb 2005
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Antnee View Post
    Generically, the places to focus on an Explain result are where:
    • There are no possible keys to use
    • There are a large number of rows returned
    • Temporary tables and filesort are used
    • Derived results from subqueries


    Your indexes are stored in an efficient fashion, usually in order, whereas your data could be in any random order. Imagine a large book; you would need to read the whole thing to know that you've definitely found the information that you're looking for, whereas if you use the index you will be able to go straight to the term you need and won't need to look any further than the pages that it tells you to. So, where there's no possible key available, you can consider adding one. Personally I consider it good practice to have an index on any field that is used in a join, though in reality it may depend on the size of the table etc for whether there's any real-world benefit.

    When you get 100 rows returned, and you then go further down and see another 100 rows returned (especially as a result of outer joins) you will multiply the number of original rows by the number of new rows. So 100x100 = 10,000 rows. When you've got 20,000 x 6 you will end up with 120,000 rows of data for the server to work through. Looking at your result, and if I'm not mistaken, you are returning a total of 2,387,775,606 rows. This is a HUGE amount of data that the server won't be able to keep in memory (depending on your configuration) so it will write it to a temporary table on disc and perform a filesort. Disc based operations are MUCH more costly than memory based, so it becomes much slower immediately. Large numbers of results, grouping and ordering generally always result in "Using temporary; Using filesort".

    Lastly, and probably the primary reason for your particular problem (apologies for going the long way around) and the reason that you have so many rows returned as above, is that subqueries are just painfully slow in MySQL (usually). I don't believe that the jobs_locations join is doing you any favours at all, because as above, every time it appears to be returning 20,000 rows again. Is there any reason why you've used a subquery? Could you optimise it into a simple join? I would be focussing my attention there, personally
    Hi there!

    Thanks for your detailed response. It all makes complete sense. I did try and write a simple join for this query but it didn't seem to work (can't remember why now though) that's why I changed it to use a sub query.

    I have also been told that creating a temporary table, selecting the results and then deleting the temporary table may be a better choice, but I don't see why this would speed the query up?

    I am totally lost at the moment

  4. #4
    SitePoint Guru
    Join Date
    Nov 2004
    Location
    England
    Posts
    698
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    A colleague was once given the same advice. I'd heard it plenty of times so suggested we both tackle his slow query; he did it through a manual temporary table creation etc, and I did it the simple way around by just optimising the query as best as I could. At the end of the day my solution was faster, because MySQL said to itself "I've got a lot of info here - too much to process in RAM - so I'm going to create a temporary table (Using temporary), put the data in there, sort/group (Using filesort) and then return the results".

    I'll be honest with you; with the database actually in front of me I was always excellent at optimising queries, but just looking at the query itself and an EXPLAIN result is not so easy for me, so I can't give you an actual solution. I WOULD however 100% suggest that you focus on that join. Another solution (if you're using PHP) would be to run that subquery first, store the results in a PHP array, and then look it up as the results come out of the main query. When we switched from InnoDB to NDB Cluster we had to break up loads of queries that worked fine in to multiple queries due to the way that NDB works. I spent years thinking I needed to get EVERYTHING out with one query, when it was often counter-intuitive. Just don't nest queries inside PHP loops (ie for each row, run another query).

    Alternatively, get that subquery to only return one result (implement a WHERE clause in it). It'll still be slower than it could be, but it'll be a lot faster than it is

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,263
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Antnee View Post
    ...and if I'm not mistaken, you are returning a total of 2,387,775,606 rows.
    no, i think you read that wrong if you simply multiply 19949 * 6 * 19949

    quick disclaimer: i'm no EXPLAIN expert either

    however, you're on the right track in multiplying row counts together

    i call this problem cross join effects and it results when you combine several one-to-many relationships in the joins

    let's analyze the joins --
    Code:
      FROM jobs
    INNER 
      JOIN companies 
        ON companies.id = jobs.company_id
    if you look at the join columns, you can assume that each job has exactly one company

    that's because jobs.company_id is a foreign key, referencing the companies.id primary key

    it's a one-to-many relationship, from companies to jobs, but seen from the point of view of the jobs table, each job has one company

    however,
    Code:
    INNER 
      JOIN jobs_industries 
        ON jobs_industries.job_id = jobs.id
    this relationship is reversed -- it's one-to-many from jobs to industries

    so each job can have multiple industries
    Code:
    INNER 
      JOIN jobs_job_types 
        ON jobs_job_types.job_id = jobs.id
    another one-to-many relationship -- each job can have multiple job types

    so if a given job has 4 industries and 3 job types, there will be 12 intermediate rows produced by these joins

    in other words, cross join effects

    now look at the subquery for job locations

    this is another one-to-many relationship (each job can have multiple locations), however, the subquery has GROUP BY job_id so in effect this subquery will only ever join one additional row to the jobs row

    i actually helped write this subquery last december, and i notice my advice to use CASE and MIN(display_name) was not adopted... sigh

    carrying on with the one-to-many analysis, you can see that applications and featured-jobs are also contributing to the cross join effects

    what to do?

    any relationship that has multiple rows per job should be re-written into a subquery with a GROUP BY on the job
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •