SitePoint Sponsor

User Tag List

Results 1 to 10 of 10

Hybrid View

  1. #1
    Making a better wheel silver trophy DR_LaRRY_PEpPeR's Avatar
    Join Date
    Jul 2001
    Location
    Missouri
    Posts
    3,428
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    why doesn't MySQL use the index for sorting?

    this sucks. i just realized that MySQL isn't using the index for the ORDER BY when i really wish it would.

    examples: these use the index:

    Code:
    mysql> EXPLAIN SELECT * FROM users ORDER BY userid LIMIT 10;
    +---------+-------+---------------+---------+---------+------+------+-------+
    | table   | type  | possible_keys | key     | key_len | ref  | rows | Extra |
    +---------+-------+---------------+---------+---------+------+------+-------+
    | users   | index | NULL          | PRIMARY |       2 | NULL |  527 |       |
    +---------+-------+---------------+---------+---------+------+------+-------+
    
    mysql> EXPLAIN SELECT * FROM users ORDER BY userid DESC LIMIT 10;
    +---------+-------+---------------+---------+---------+------+------+-------+
    | table   | type  | possible_keys | key     | key_len | ref  | rows | Extra |
    +---------+-------+---------------+---------+---------+------+------+-------+
    | users   | index | NULL          | PRIMARY |       2 | NULL |  527 |       |
    +---------+-------+---------------+---------+---------+------+------+-------+
    
    mysql> EXPLAIN SELECT * FROM users WHERE userid > 100 ORDER BY userid LIMIT 10;
    +---------+-------+---------------+---------+---------+------+------+------------+
    | table   | type  | possible_keys | key     | key_len | ref  | rows | Extra      |
    +---------+-------+---------------+---------+---------+------+------+------------+
    | users   | range | PRIMARY       | PRIMARY |       2 | NULL |  527 | where used |
    +---------+-------+---------------+---------+---------+------+------+------------+
    
    mysql> EXPLAIN SELECT * FROM users WHERE userid < 100 ORDER BY userid LIMIT 10;
    +---------+-------+---------------+---------+---------+------+------+------------+
    | table   | type  | possible_keys | key     | key_len | ref  | rows | Extra      |
    +---------+-------+---------------+---------+---------+------+------+------------+
    | users   | range | PRIMARY       | PRIMARY |       2 | NULL |   78 | where used |
    +---------+-------+---------------+---------+---------+------+------+------------+
    great. these don't however:

    Code:
    mysql> EXPLAIN SELECT * FROM users WHERE userid < 100 ORDER BY userid DESC LIMIT 10;
    +---------+-------+---------------+---------+---------+------+------+----------------------------+
    | table   | type  | possible_keys | key     | key_len | ref  | rows | Extra                      |
    +---------+-------+---------------+---------+---------+------+------+----------------------------+
    | users   | range | PRIMARY       | PRIMARY |       2 | NULL |   78 | where used; Using filesort |
    +---------+-------+---------------+---------+---------+------+------+----------------------------+
    
    mysql> EXPLAIN SELECT * FROM users WHERE userid > 100 ORDER BY userid DESC LIMIT 10;
    +---------+-------+---------------+---------+---------+------+------+----------------------------+
    | table   | type  | possible_keys | key     | key_len | ref  | rows | Extra                      |
    +---------+-------+---------------+---------+---------+------+------+----------------------------+
    | users   | range | PRIMARY       | PRIMARY |       2 | NULL |  527 | where used; Using filesort |
    +---------+-------+---------------+---------+---------+------+------+----------------------------+
    Using filesort. i could almost understand the last one (which is the kind i want to use the index), but i still don't get why it won't use the index for `ORDER BY key DESC' when you have a range and it does for just `ORDER BY key'.

    i guess there's not really an answer, but i wanted to see if anybody had anything to say. would it work in other databases?


    oh BTW, kind of a related question. when you want to sort in the order rows were INSERTed, is it OK to use the AUTO_INCREMENT PRIMARY KEY since it should always be "in order?" or is that bad and you should ORDER BY a timestamp, etc.? on a message board, for example, the latest posts should always have the highest ID, so is it OK to ORDER BY that? seems all right to me, since "older" deleted IDs shouldn't be reused. let me know.
    - Matt ** Ignore old signature for now... **
    Dr.BB - Highly optimized to be 2-3x faster than the "Big 3."
    "Do not enclose numeric values in quotes -- that is very non-standard and will only work on MySQL." - MattR

  2. #2
    Making a better wheel silver trophy DR_LaRRY_PEpPeR's Avatar
    Join Date
    Jul 2001
    Location
    Missouri
    Posts
    3,428
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    just looking in the changelog and found this for 4.0.0:

    ORDER BY ... DESC can now use keys.
    although, as my second query above shows, it does appear to currently use the index for a simple `ORDER BY key DESC' without a WHERE.

    and for 4.0.2:

    Use index for ORDER BY in queries of type: SELECT * FROM t WHERE key_part1=1 ORDER BY key_part1 DESC,key_part2 DESC

    so that appears to be the answer. why didn't they implement this in 3.23?





    still want opinions on sorting by an AUTO_INCREMENT column, though.

  3. #3
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The sorting works in other DBs just fine. Order by autoincrement is fine as long as you can't edit the datestamp.

  4. #4
    Making a better wheel silver trophy DR_LaRRY_PEpPeR's Avatar
    Join Date
    Jul 2001
    Location
    Missouri
    Posts
    3,428
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    i was hoping you'd reply.

    Originally posted by MattR
    Order by autoincrement is fine as long as you can't edit the datestamp.
    excellent, thanks.

  5. #5
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    LOL glad I could help. What Sybase and MS SQL server do depend on whether or not you declare your index as:
    CREATE INDEX user_userid ON user( userid )
    or
    CREATE INDEX user_userid ON user( userid DESC )

    For an ORDER BY DESC:
    case 1)
    The index is actually read in reverse. This isn't as efficient as the index being ordered decending and then being read sequentially forward.

    case 2)
    Index is decending, read from start to finish.

    Sybase and MS SQL Server have the concept of a 'clustered' index (Oracle has an analagous structure called a grouped table). Basically it sorts your data rows by a particular key (or keys) so that the rows do not have to be sorted in memory.

    It also improves heavily on BETWEEN clauses or grouping.

    For instance, if you were designing a forum system in Syb/MS SQL/Oracle you would do something like this:
    Code:
    CREATE TABLE thread(
      threadid INT IDENTITY, -- ident is same as auto_increment
      forumid  INT NOT NULL REFERENCES forum( forumid ),
      lastpost INT NOT NULL
    )
    
    CREATE CLUSTERED INDEX thread_forumid_lastpost ON ( forumid, lastpost DESC )
    
    CREATE TABLE post(
      postid INT IDENTITY,
      posttext, etc.
    )
    
    CREATE CLUSTERED INDEX post_threadid_postid ON post( threadid, postid )
    So the thread rows are actually physically sorted by forumid, lastpost desc. Meaning it will arrage threads on disk in the same physical location as the default vBulletin thread view (shows you the threads in the forum with the last post highest). It saves a significant amount of I/O to have the threads pre-sorted.

    If you are really interested in index binary tree layout you also save a level of binary tree since the clustered index leaf nodes are the data rows! So your indexes are 1 level smaller which is more efficient to search and store in memory.

    The reason why I included postid in the post_threadid_postid index was that without it, there would have to be a sort (because new rows are not necessarily added to the end of the table) to ORDER BY postid ASC.

  6. #6
    Making a better wheel silver trophy DR_LaRRY_PEpPeR's Avatar
    Join Date
    Jul 2001
    Location
    Missouri
    Posts
    3,428
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    cool. too bad MySQL doesn't support those clustered indexes. just curious, doesn't that make updates to the table a bit slower since the whole data file has to be rearranged instead of just the index? even so, i guess it's offset by the faster read speed.

  7. #7
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Remember that in the b+ tree you can have many data elements in a node. So the only time the tree needs to adjust to an insert (an update on a clustered key is somewhat like a delete followed by an insert) is when the leaf node is full or the row is too big to fit on the remaining space.

    If it is full, then yes there is a small performance hit as the single leaf node is split in two (called page splitting). Pointers are readjusted and life is good. So there are small drawbacks if you insert a lot and have wide rows (or really low rows per page -- which is a configurable option for an index).

    I can't recall exactly how much I/O is required to do a page split; it of course depends on how large your leaf nodes are and how many rows there are on a page. The bigger the node and the more rows the more costly it will be to split, although if the rows are tiny (2K or less) it can read many of them in a big chunk and be less costly than you'd think. In short, the cost depends on your situation -- the motto here is know your data.

    Typically you make primary keys things which do not change often (username, postid, threadid, etc.), although it does happen.
    Last edited by MattR; Jun 1, 2002 at 17:22.

  8. #8
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If you want to know a little bit about Sybase-centric (and MS SQL as well; as of MS 7 they haven't changed the way pages split and I think 2000 is the same as well) splitting works you can go here:
    Docs

  9. #9
    Making a better wheel silver trophy DR_LaRRY_PEpPeR's Avatar
    Join Date
    Jul 2001
    Location
    Missouri
    Posts
    3,428
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    thanks for the info once again. i appreciate it.

  10. #10
    SitePoint Addict
    Join Date
    Mar 2001
    Posts
    249
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Does this help with sequential sorting on disk?


    http://dev.mysql.com/doc/mysql/en/Index_preloading.html


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •