SitePoint Sponsor

User Tag List

Page 2 of 2 FirstFirst 12
Results 26 to 34 of 34
  1. #26
    SitePoint Evangelist
    Join Date
    Sep 2006
    Posts
    428
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    This is very interesting to me.... I wasn't aware that MySQL could only utilize 1 key per select.... hmmm....

    It's going to take a bit to wrap my head around integrating this into my query as my remaining OR's are created dynamically so rather than just tacking some OR's on to the query I would need it to generate another SELECT statement and a UNION.

    For now, I am very satisfied with my performance gains just from switching from many OR's to one additional JOIN.

  2. #27
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,603
    Mentioned
    24 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by ScallioXTX View Post
    So rewriting IN to series of OR's makes sense. Well, at least to me it does
    A series of OR conditions can be completely independent with different fields being tested for each.

    An IN guarantees that it is one field being tested for all the values and therefore there is at least the possibility that the actual code being run could be more efficient due to one of the values to be compared not needing to be reloaded for each.

    Whether a database actually is able to or even tries to make use of the fact that an IN could be executed more efficiently than a series of ORs by not constantly reloading the field being tested depends on the database and various other factors.

    Generically a series of OR conditions is not equivalent to an IN since a=b OR c=d does not have an equivalent IN condition. The IN clearly specifies information that requires closer examination to extract from a series of ORs.

    So the two can either be implemented the same way by databases (which it appears is usually the case) or the database could take advantage of the higher specificity of the IN statement to implement that variant in a more efficient manner. Of course a series of OR condifitions testing the same field should be implemented the same way as an IN but the potimiser has to actually look more closely at all the code to determine whether the OR conditions are or are not the equivalent of an IN before being able to implement it that way.
    Stephen J Chapman

    javascriptexample.net, Book Reviews, follow me on Twitter
    HTML Help, CSS Help, JavaScript Help, PHP/mySQL Help, blog
    <input name="html5" type="text" required pattern="^$">

  3. #28
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,895
    Mentioned
    138 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by felgall View Post
    An IN guarantees that it is one field being tested for all the values and therefore there is at least the possibility that the actual code being run could be more efficient due to one of the values to be compared not needing to be reloaded for each.
    Interesting, I hadn't thought of that. That invalidates my previous argument that DBMSs are most likely to "rewrite IN to a series of OR's" (Dan, you're right, this is not technalically the correct way to put it, but you understand my reasoning, right?).

    Quote Originally Posted by felgall View Post
    Generically a series of OR conditions is not equivalent to an IN since a=b OR c=d does not have an equivalent IN condition.
    They are not equivalent, but I think we can all argee that every IN can be written as a series of OR's, but not every series of OR's can be written as an IN.
    So it's a "one-way relation", or injection if you will.

    In the next couple of days I will dust off the book from a Database course I once took and see if that has anything to tell us about optimizing IN and/or OR. I'll keep you posted
    Rémon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  4. #29
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,603
    Mentioned
    24 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by ScallioXTX View Post
    not every series of OR's can be written as an IN.
    Which means that IF the optimiser can generate more efficient code for a comparison using IN where it doesn't need to swap out the field value being compared then there are two possibilities of what it could do with the OR.

    1. It could not check if it is the equivalent of an IN in which case the code can't retain any field values between comparisons - which would take longer
    2. It could spend extra time up front to see if the OR is an equivalent of IN.

    Either way the OR would then take longer than the IN.

    The IN and OR can be processed the same way and therefore take the same time only if the IN is considered to be the equivalent of OR and the fact that one side of all the comparisons is the same is ignored in terms of how the processing is done.

    That's why I said right back in my first reply that IN might be faster than OR or they might take exactly the same time depending on just how the code is handled by the optimiser.

    In every case where you have a one way relationship like this between two versions of a database query there is at least a theoretical possibility of the more specific one being faster than the other.
    Stephen J Chapman

    javascriptexample.net, Book Reviews, follow me on Twitter
    HTML Help, CSS Help, JavaScript Help, PHP/mySQL Help, blog
    <input name="html5" type="text" required pattern="^$">

  5. #30
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Since we're all just pulling things out of thin air now, is anyone interested in digging into the code behind one of the storage engines and finding out what's going on?

    When's the last time ya did an amortized analysis of the costs of operations on a b-tree?

  6. #31
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Dan Grossman View Post
    Since we're all just pulling things out of thin air now...
    excellent comment, dan old buddy

    i hate it when threads go there...
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #32
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,895
    Mentioned
    138 Post(s)
    Tagged
    2 Thread(s)
    Okay, so I've been reading several chapters in the book I got for a databases course at the university, but I couln't find any reference to IN in that book whatsoever.

    So I went to the MySQL documentation and found the following:
    expr IN (value, ...)
    Returns 1 if expr is equal to any of the values in the IN list, else returns 0. If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search. This means IN is very quick if the IN value list consists entirely of constants.
    You can find this here.

    I did some experimental testing on a toy table, created by the SQL below, to investigate this.

    Code SQL:
    CREATE TABLE `test` (
     `id` INT(10) DEFAULT NULL,
     `some_char` CHAR(1) DEFAULT NULL,
     KEY `id_index` (`id`),
     KEY `some_char_index` (`some_char`)
    ) ENGINE=MyISAM
     
    INSERT INTO test(id,some_char) VALUES
    (1, 'j'),
    (2, 'i'),
    (3, 'h'),
    (4, 'g'),
    (5, 'f'),
    (6, 'e'),
    (7, 'd'),
    (8, 'c'),
    (9, 'b'),
    (10, 'a')

    (I first tried this table with a primary key on `id`, but then MySQL seems to sort results by `id`, which defies the purpose of what I'm about to show).

    Now consider the following query
    Code SQL:
    SELECT id,some_char FROM test WHERE some_char IN ('b','a');

    The result is
    Code:
        id  some_char
    ------  ---------
        10  a
         9  b
    And the EXPLAIN for this query is:
    Code:
        id  select_type  table   type    possible_keys    key              key_len  ref       rows  Extra      
    ------  -----------  ------  ------  ---------------  ---------------  -------  ------  ------  -----------
         1  SIMPLE       test    range   some_char_index  some_char_index  2        (NULL)       2  Using where
    So, MySQL indeed seems to have sorted the values of the IN values list ('a' is first in the result, 'b' second, in my IN values list I asked them other way around).

    However, in two seperate cases MySQL was not sorting the values.

    First case
    The query
    Code sql:
    SELECT id,some_char FROM test WHERE some_char IN ('b','a','c');

    Returns the result
    Code:
        id  some_char
    ------  ---------
         8  c        
         9  b        
        10  a
    And the EXPLAIN of this query:
    Code:
        id  select_type  table   type    possible_keys    key     key_len  ref       rows  Extra      
    ------  -----------  ------  ------  ---------------  ------  -------  ------  ------  -----------
         1  SIMPLE       test    ALL     some_char_index  (NULL)  (NULL)   (NULL)      10  Using where
    So instead of using the index, MySQL seems have to fallen back to a full table scan.
    Interestingly, when we perform the following query on the table

    Code sql:
    INSERT INTO test(id,some_char) VALUES (11,'z'), (12,'z'), (13,'z')

    To give the table a 13 tuples, and perform the same query, we get the result
    Code:
        id  some_char
    ------  ---------
        10  a        
         9  b        
         8  c
    and the EXPLAIN of this query now is:
    Code:
        id  select_type  table   type    possible_keys    key              key_len  ref       rows  Extra      
    ------  -----------  ------  ------  ---------------  ---------------  -------  ------  ------  -----------
         1  SIMPLE       test    range   some_char_index  some_char_index  2        (NULL)       3  Using where
    So now MySQL is using the index again.
    (For 11 and 12 tuples MySQL also uses a tablescan, for >= 13 tuples it using the index again.)

    So MySQL appears to be considering whether sorting the values of the IN values list would be faster than just doing a tablescan.

    Second case
    When we drop the index `some_char_index` and perform the same queries (still with 13 tuples) we get

    Query:
    Code sql:
    SELECT id,some_char FROM test WHERE some_char IN ('b','a');

    Result:
    Code:
        id  some_char
    ------  ---------
         9  b        
        10  a
    Explain:
    Code:
        id  select_type  table   type    possible_keys  key     key_len  ref       rows  Extra      
    ------  -----------  ------  ------  -------------  ------  -------  ------  ------  -----------
         1  SIMPLE       test    ALL     (NULL)         (NULL)  (NULL)   (NULL)      13  Using where
    And for the other query.

    Query:
    Code sql:
    SELECT id,some_char FROM test WHERE some_char IN ('b','a','c');

    Result:
    Code:
        id  some_char
    ------  ---------
         8  c        
         9  b        
        10  a
    Explain:
    Code:
        id  select_type  table   type    possible_keys  key     key_len  ref       rows  Extra      
    ------  -----------  ------  ------  -------------  ------  -------  ------  ------  -----------
         1  SIMPLE       test    ALL     (NULL)         (NULL)  (NULL)   (NULL)      13  Using where

    So, when there is no index on the some_char column, MySQL does not sort the values of the IN values list and performs a tablescan.



    Now to get back to the original problem, when I rewite all IN's to OR's (so some_char IN ('a','b') becomes some_field='a' OR some_field='b') the results are EXACTLY the same.
    This would suggest (but doesn't prove) that MySQL is rewriting the queries using IN and the queries using OR's to the same plan.

    For InnoDB the results are basically the same, but instead of 13 tuples you need 17 tuples to make MySQL use the index when looking for 3 different values.

    DISCLAIMER: All these results are experimental and should be viewed as such. I'm not proving anything, just showing the results of some experiments. Everyone can draw their own conclusions.

  8. #33
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    regarding the use of indexes...

    mysql is smart enough to figure out that it's probably faster to do a table scan for a table with only a dozen rows because it can grab them all with only one physical read off the disk instead of two physical reads (index plus data)

    so if you're going to do any testing on whether indexes are used or not, make sure your table has at least a few thousand rows

    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #34
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,895
    Mentioned
    138 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by r937 View Post
    regarding the use of indexes...

    mysql is smart enough to figure out that it's probably faster to do a table scan for a table with only a dozen rows because it can grab them all with only one physical read off the disk instead of two physical reads (index plus data)

    so if you're going to do any testing on whether indexes are used or not, make sure your table has at least a few thousand rows

    While I totally agree with this principle, my "results" show something else.
    Namely requesting IN('b','a') for the table with 10 rows does use the index.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •