SitePoint Sponsor

User Tag List

Results 1 to 12 of 12
  1. #1
    SitePoint Enthusiast
    Join Date
    Dec 2009
    Posts
    39
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    multiple column index

    As per MySQL manual:

    If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

    MySQL cannot use an index if the columns do not form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:

    SELECT * FROM tbl_name WHERE col1=val1;
    SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

    SELECT * FROM tbl_name WHERE col2=val2;
    SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;

    If an index exists on (col1, col2, col3), only the first two queries use the index. The third and fourth queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).
    Now, my question is how I should define the indexes if I want to execute the following queries efficiently:

    Code:
    SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
    SELECT * FROM tbl_name WHERE col1=val1 AND col3=val3;
    SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2 AND col3=val3;
    An index on (col1, col2, col3) doesn't work with the 2nd query. If instead I would define an index on (col1, col2) and (col1, col3) would that help the 3rd query?

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,248
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by rblon View Post
    Now, my question is how I should define the indexes if I want to execute the following queries efficiently:
    declare on index on (col1,col2,col3), which will be used by the first and third queries

    then declare an index on (col1,col3) which will be used by the second

    Quote Originally Posted by rblon View Post
    If instead I would define an index on (col1, col2) and (col1, col3) would that help the 3rd query?
    nope, mysql will use only one index per table
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Enthusiast
    Join Date
    Dec 2009
    Posts
    39
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    ok but what if the situation is somewhat more complicated... Eg there is also a col4 and col5 which is sometimes used in the SELECT statement.

    The only thing the queries have in common is that they alway start with

    Code:
    SELECT * FROM tbl_name WHERE col1=val1 AND ...
    In the "single column case" (lets imagine col1 doesn't exist) you would just put indexes on col2, col3, col4, and col5, respectively. However, in the multiple column case it seems you cannot extend this reasoning by putting indexes on (col1, col2), (col1, col3), (col1, col4), (col1, col5). But creating an index for all combinations doesn't seem a good idea either.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,248
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    i believe you understand the situation, yes

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Enthusiast
    Join Date
    Dec 2009
    Posts
    39
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    not yet actually.

    Quote Originally Posted by r937 View Post
    nope, mysql will use only one index per table
    I didn't know that. But then it seems, that what I called the "single column case" can be extended.

    Because in the "single column case" [-> col1 doesn't exist, index on (col2), (col3), (col4), (col5)], the following statement only uses one index:

    Code:
    SELECT * FROM tbl_name WHERE col2=val2 AND col5=val5
    which is the same in the "multiple column case" [-> index on (col1, col2), (col1, col3), (col1, col4), (col1, col5)]

    Code:
    SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2 AND col5=val5

  6. #6
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,014
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    rblon, do you need all the fields of the table in the results set? If not, specify the fields in the SELECT clause. It will save resources (memory) as you want be grabbing data that is not needed.

    For example
    Say each field is approx 1MB (per record)and you have 20 fields in the table, by doing a SELECT star you'll be grabbing 20MB approx (1MB x 20 fields) per record which matchs. But say you only needed 4 fields, by specifying them in the SELECT clause instead of using SELECT * you'll only be grabbing 4MB approx (1MB x 4 fields) per record which matches.

    It might also be more efficient to specify all the fields wanted, even if you want them all, which is something that I'll have to test at some point.
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  7. #7
    SitePoint Enthusiast
    Join Date
    Dec 2009
    Posts
    39
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    SpacePhoenix, I am aware of that. I was just extending an example from the MySQL manual about indexing.

    Quote Originally Posted by SpacePhoenix View Post
    It might also be more efficient to specify all the fields wanted, even if you want them all, which is something that I'll have to test at some point.
    Personally, in that case I would also specify the field names. It makes the program more readable.

  8. #8
    SitePoint Enthusiast
    Join Date
    Dec 2009
    Posts
    39
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    once more, regarding this statement:
    Quote Originally Posted by r937 View Post
    nope, mysql will use only one index per table
    According to the manual:
    If separate single-column indexes exist on col1 and col2, the optimizer will attempt to use the Index Merge optimization (see Section 7.2.6, “Index Merge Optimization”)
    That suggests MySQL will try to use more than one index per table. But is merging also attempted for multiple column indexes?

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,248
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by rblon View Post
    But is merging also attempted for multiple column indexes?
    i think what you should do is move away from the col1, col2, col3 hypothetical scenario, and actually run some tests using EXPLAIN to see what it actually does
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    SitePoint Enthusiast
    Join Date
    Dec 2009
    Posts
    39
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Ok, here we go... I have used the following test program to make two tables (similar to the example we have been discussing):

    Code:
    <?php
    // Initialize database connection
    $con = ...
    
    // Create tables
    $sql = "CREATE TABLE single (
      col2 INTEGER,
      col3 INTEGER,
      rowid INTEGER,
      INDEX (col2),
      INDEX (col3)
    ) ENGINE=MyISAM";
    
    mysql_query($sql, $con);
    
    $sql = "CREATE TABLE multiple (
      col1 INTEGER,
      col2 INTEGER,
      col3 INTEGER,
      rowid INTEGER,
      INDEX (col1, col2),
      INDEX (col1, col3)
    ) ENGINE=MyISAM";
    
    mysql_query($sql, $con);
    
    /*
     * Table single will have 100,001 rows
     * col2 is a random number between 1 and 100
     * col3 is a random number between 1 and 20
     * Table multiple is a (double) copy of single, 
     * with col1 of 1 and 2, respectively
    */
    
    $seed = 100;
    srand($seed);
    
    for ($i = 0; $i <= 100000; $i++) {
      $col2 = rand(1, 100);
      $col3 = rand(1, 19);  // Edit: should have been 20, but 19 used anyway
    
      $sql = "INSERT INTO single 
                (col2, col3, rowid) 
              VALUES 
                ($col2, $col3, $i)";
    
      mysql_query($sql, $con);
    
      $sql = "INSERT INTO multiple 
                (col1, col2, col3, rowid)
              VALUES
                (1, $col2, $col3, $i),
                (2, $col2, $col3, $i)";
    
      mysql_query($sql, $con);
    }
    ?>
    Now, the following statements each return (the same) 59 rows

    Code:
    SELECT rowid FROM single WHERE col2=71 AND col3=5;
    SELECT rowid FROM multiple WHERE col1=1 AND col2=71 AND col3=5;
    I attach a screenshot of the EXPLAIN for both statements.

    I see that the first query uses only key. This is col2, which makes sense as that returns the smallest number of rows. However, the EXPLAIN output says 1252 rows, while "SELECT rowid FROM single WHERE col2=71" only returns 1009 rows. Also, I thought for the subquery (col3=5) some kind of index merging would happen, but that doesn't seem to be the case. So I guess, MySQL is going through all 1252 (or 1009) rows.

    The second query does use index merging but I am not sure how to further interpret the EXPLAIN output. More specificially, how to read "possible_keys: col1,col1_2"?

    Any help would be much appreciated.
    Attached Images Attached Images

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,248
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    you're definitely on the right track to becoming a mysql internals expert, a path i myself have chosen not to go down, so i don't think i can help you further

    all i know is that "possible keys" means the indexes that were available to choose from
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    SitePoint Enthusiast
    Join Date
    Dec 2009
    Posts
    39
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    hmm, that's a shame, but fair enough...


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •