SitePoint Sponsor

User Tag List

Results 1 to 8 of 8
  1. #1
    SitePoint Wizard
    Join Date
    May 2003
    Location
    Berlin, Germany
    Posts
    1,829
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Problem with slow query / indexing strategies in Mysql

    Hey guys,

    I have a problem with a query - it takes over 30 seconds to execute and I have to fix that.

    My table has 4 million records, 18 fields (one of which is account_id) and this is the query I'd like to do:

    Code:
    SELECT id, ok, created, account_id, error FROM the_table WHERE account_id = "some_string" ORDER BY created desc LIMIT 0, 850;
    There are 120 000 records that match the query according to explain. Here is the output of explain:

    Code:
    explain SELECT id, ok, created, account_id, error FROM the_table WHERE account_id = "some_string" ORDER BY created desc LIMIT 0, 850;
    
    1	SIMPLE	the_table	ref	account_id,account_id-ok-created,account_id-created	account_id-created	33	const	119900	Using where
    As you see I have indices over account_id and account_id-created. The latter is used as "key", but it's still taking way too long to execute. I have tried adding a date range to the mix (with a long date interval), but it doesn't help:

    Code:
    explain SELECT id, ok, created, account_id, error FROM the_table WHERE account_id = "some_string" and created >= '2012-06-10 20:00:00' ORDER BY created desc LIMIT 0, 850;
    
    1	SIMPLE	the_table	ref	account_id,created,account_id-ok-created,account_id-created	account_id-created	33	const	119900	Using where
    What can I do? Any help is greatly appreciated.

    Kind regards

  2. #2
    SitePoint Enthusiast
    Join Date
    Jul 2007
    Location
    San Sebastian, Spain
    Posts
    93
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    What I would first of all do is to analyze the table to make sure that the statistics are accurate.

    Also if you have an index on account_id it appears to be picked up while processing. However, the index value will then need to read back to the data block to pick up the rest of the data. This will happen for 120,000 entries and this will make things slower. There is another option which is to create an index not only on account_id but also on id, ok, created and error. What this does is that all the data it needs exist in the index so it will not need to read the actual data block containing your data. This should give you some improvements in performance in your SELECT statement but may add overhead in INSERT, UPDATE OR DELETE's.

  3. #3
    SitePoint Wizard
    Join Date
    May 2003
    Location
    Berlin, Germany
    Posts
    1,829
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hey rcashell,

    thanks for your reply.

    I analyzed the table, the stats are accurate.

    I am fine with any optimizations that adds overhead for Inserts/Updates and Deletes. This is a read-only-slave and it's fine if the latency between master and the slave is a bit higher (it's 0.25seconds right now).

    Not sure I understand why an index over id, ok, created and error would help. I always thought it's the where clause + order by that takes advantage of indeces. I will give it a shot though.

    There is no other optimization? For example converting account_id (which is a 32 char uuid) into integers?

  4. #4
    SitePoint Wizard
    Join Date
    May 2003
    Location
    Berlin, Germany
    Posts
    1,829
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I have created a backup of the table with only the 10000 most recent rows. Then added the index over id, ok, created, error like you suggested.

    Then ran explain again:

    Code:
    explain SELECT id, ok, created, error FROM the_table WHERE account_id = "some_string" ORDER BY created desc LIMIT 0, 850;
    
    1	SIMPLE	the_table	ref	account_id,account_id-ok-created	account_id	33	const	1	Using where; Using filesort
    Should the result be different?

  5. #5
    SitePoint Enthusiast
    Join Date
    Jul 2007
    Location
    San Sebastian, Spain
    Posts
    93
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Let me explain, the index should be over account_id, id, ok, created and error. The way indexes work in MySQL is that once the index identifies a row, it returns a pointer to the data block and record. So for each record that is read using the index you have, it then needs to read the data block in order to get all the fields returned in the SELECT statement. However, if you create an index on account_id, id, ok, created and error (all the fields that you use in the WHERE clause and all the fields you return in your SELECT statement) then there is no need for it to get the read the data block as it has all the information that is needed.

  6. #6
    SitePoint Wizard
    Join Date
    May 2003
    Location
    Berlin, Germany
    Posts
    1,829
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Okay, thanks for the explanation.

    I have done that and this is the explain output now (the new index is account_id_2):

    Code:
    1	SIMPLE	the_table	ref	account_id,account_id-ok-created,account_id_2	account_id	33	const	1	Using where; Using filesort
    I am going to add the index on the original table now to see if it speeds up row retrieval.

  7. #7
    SitePoint Wizard
    Join Date
    May 2003
    Location
    Berlin, Germany
    Posts
    1,829
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Okay now locally it retrieves the rows in 800ms.

    Going to try that on the production system tomorrow and then let you know if it worked.

  8. #8
    SitePoint Wizard
    Join Date
    May 2003
    Location
    Berlin, Germany
    Posts
    1,829
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Got it to load within 4 seconds now on the production box, which is acceptable.

    Thanks a lot for the help!


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •