SitePoint Sponsor

User Tag List

Page 1 of 2 12 LastLast
Results 1 to 25 of 34
  1. #1
    SitePoint Evangelist
    Join Date
    Sep 2006
    Posts
    428
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    IN -or- OR (that is the question)

    Hi Everybody!

    I was cruising through my slow query log earlier and one of my larger queries occurs in the log file occasionally. So I'm trying to work to optimize it a bit. One part of the query that I was pretty sure was going to be bad when I wrote it is a long OR statement.

    This is preceded by a couple of AND statements. There could possibly be several hundred of these 'OR's
    Code:
    ( sitedone_dealer.inventory.id = '0'  OR sitedone_dealer.inventory.id = '36'  OR sitedone_dealer.inventory.id = '58'  OR 
    sitedone_dealer.inventory.id = '59'  OR sitedone_dealer.inventory.id = '60'  OR sitedone_dealer.inventory.id = '61'  OR 
    sitedone_dealer.inventory.id = '62'  OR sitedone_dealer.inventory.id = '63'  OR sitedone_dealer.inventory.id = '64'  OR 
    sitedone_dealer.inventory.id = '65'  OR sitedone_dealer.inventory.id = '66'  OR sitedone_dealer.inventory.id = '67'  OR 
    sitedone_dealer.inventory.id = '68'  OR sitedone_dealer.inventory.id = '69'  OR sitedone_dealer.inventory.id = '70'  OR 
    sitedone_dealer.inventory.id = '71'  OR sitedone_dealer.inventory.id = '72'  OR sitedone_dealer.inventory.id = '73'  OR 
    sitedone_dealer.inventory.id = '74'  OR sitedone_dealer.inventory.id = '75'  OR sitedone_dealer.inventory.id = '76'  OR 
    sitedone_dealer.inventory.id = '77'  OR sitedone_dealer.inventory.id = '78'  OR sitedone_dealer.inventory.id = '79'  OR 
    sitedone_dealer.inventory.id = '80'  OR sitedone_dealer.inventory.id = '81'  OR sitedone_dealer.inventory.id = '82'  OR sitedone_dealer.inventory.id = '83'  OR sitedone_dealer.inventory.id = '84'  OR sitedone_dealer.inventory.id = '85'  OR sitedone_dealer.inventory.id = '86'  OR sitedone_dealer.inventory.id = '87'  OR sitedone_dealer.inventory.id = '88'  OR sitedone_dealer.inventory.id = '89'  OR sitedone_dealer.inventory.id = '90'  OR sitedone_dealer.inventory.id = '91'  OR sitedone_dealer.inventory.id = '92'  OR sitedone_dealer.inventory.id = '94'  OR sitedone_dealer.inventory.id = '95'  OR sitedone_dealer.inventory.id = '96'  OR sitedone_dealer.inventory.id = '97'  OR sitedone_dealer.inventory.id = '98'  OR sitedone_dealer.inventory.id = '99'  OR sitedone_dealer.inventory.id = '100'  OR sitedone_dealer.inventory.id = '102'  OR sitedone_dealer.inventory.id = '103'  OR sitedone_dealer.inventory.id = '104'  OR sitedone_dealer.inventory.id = '105'  OR sitedone_dealer.inventory.id = '106'  OR sitedone_dealer.inventory.id = '107'  OR sitedone_dealer.inventory.id = '108'  OR sitedone_dealer.inventory.id = '109'  OR sitedone_dealer.inventory.id = '110'  OR sitedone_dealer.inventory.id = '111'  OR sitedone_dealer.inventory.id = '164'  OR sitedone_dealer.inventory.id = '167'  OR sitedone_dealer.inventory.id = '198'  OR sitedone_dealer.inventory.id = '199'  OR sitedone_dealer.inventory.id = '200'  OR sitedone_dealer.inventory.id = '202'  OR sitedone_dealer.inventory.id = '203'  OR sitedone_dealer.inventory.id = '204'  OR sitedone_dealer.inventory.id = '205'  OR sitedone_dealer.inventory.id = '206'  OR sitedone_dealer.inventory.id = '207'  OR sitedone_dealer.inventory.id = '208'  OR sitedone_dealer.inventory.id = '209'  OR sitedone_dealer.inventory.id = '210'  OR sitedone_dealer.inventory.id = '211'  OR sitedone_dealer.inventory.id = '212'  OR sitedone_dealer.inventory.id = '213'  OR sitedone_dealer.inventory.id = '214'  OR sitedone_dealer.inventory.id = '215'  OR sitedone_dealer.inventory.id = '216'  OR sitedone_dealer.inventory.id = '217'  OR sitedone_dealer.inventory.id = '218'  )
    Would this portion of the query be better served by something like this?

    Code:
    OR sitedone_dealer.inventory.id IN(1,4,6,87,34,23,676,9877,45,23,....etc)

    I'm open to any suggestion or advice.

    Thanks in advance for any guidance!

  2. #2
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,862
    Mentioned
    25 Post(s)
    Tagged
    1 Thread(s)
    Using IN for that could be more efficient an certainly couldn't be less efficient. The use of IN lets the database know that all those comparisons are on the one field far more clearly than using all the OR statements does and so the dtabase may be able to come up with a more efficient way of accessing the data.

    It would also make the code easier to read regardless of whether it makes the text more efficient.
    Stephen J Chapman

    javascriptexample.net, Book Reviews, follow me on Twitter
    HTML Help, CSS Help, JavaScript Help, PHP/mySQL Help, blog
    <input name="html5" type="text" required pattern="^$">

  3. #3
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    The query optimizer should end up producing the same query from either one. I would bet the IN() gets rewritten to a bunch of OR's when it's translated to the intermediate query language the individual storage engine gets passed. Some quick testing and checking EXPLAIN estimates seems to verify there's no speed difference.

    So there's no technical difference between the two, do whichever is easier to code or more readable for you.

  4. #4
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,862
    Mentioned
    25 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by Dan Grossman View Post
    The query optimizer should end up producing the same query from either one.
    I agree. It would be a rather stupid optimiser that didn't. The point I was attempting to make was that if there were to be a difference between the two that the difference would have to give the advantage to the IN version as that is more specific - you are comparing one field to a collection of values rather than just doing a collection of comparisons.

    The IN version should produce shorter code which should therefore be easier to read.

    Of course if you are dynamically generating the query then the OR version may be simpler to code.
    Stephen J Chapman

    javascriptexample.net, Book Reviews, follow me on Twitter
    HTML Help, CSS Help, JavaScript Help, PHP/mySQL Help, blog
    <input name="html5" type="text" required pattern="^$">

  5. #5
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,095
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    I disagree. At least in PHP the IN queries are easier to code.

    Consider
    PHP Code:
    $ids = array(1,2,3);
    $where 'somefield=' implode(' OR somefield='$ids);
    $query 'SELECT something FROM some_table WHERE '.$where
    versus
    PHP Code:
    $ids = array(1,2,3);
    $where 'somefield IN (' implode(','$ids) . ')';
    $query 'SELECT something FROM some_table WHERE '.$where
    The difference is in the $where =, where I find the second one more easy to read.

    Another advantage of IN over OR is that the query is shorter, so you are less likely to run into the maximum query length.
    Although this is slightly far-fetched it's still true
    Rémon - Hosting Advisor

    SitePoint forums will switch to Discourse soon! Make sure you're ready for it!

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  6. #6
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,506
    Mentioned
    163 Post(s)
    Tagged
    4 Thread(s)
    Another advantage of IN over OR is that it can be added to the WHERE with an AND, so you don't have to worry about adding ( and ) around the conditions

  7. #7
    SitePoint Evangelist
    Join Date
    Sep 2006
    Posts
    428
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Dang, this is my third attempt at responding. The site keeps saying my security token is missing?

    Anyways, the query is already coded and working with the OR statement so if there's no performance gain I won't change it. I guess I need to EXPLAIN the entire query and try to find out where the bottleneck is.

    Many Thanks for the input!

  8. #8
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,862
    Mentioned
    25 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by ScallioXTX View Post
    I disagree. At least in PHP the IN queries are easier to code.
    That depends on how your code is written to generate the query - I said that using OR may fit better with the way you are generating the code in some instances.

    In this case where the OP already has the code working there is no reason for changing it just for the sake of converting to use IN.
    Stephen J Chapman

    javascriptexample.net, Book Reviews, follow me on Twitter
    HTML Help, CSS Help, JavaScript Help, PHP/mySQL Help, blog
    <input name="html5" type="text" required pattern="^$">

  9. #9
    SitePoint Author silver trophybronze trophy
    wwb_99's Avatar
    Join Date
    May 2003
    Location
    Washington, DC
    Posts
    10,649
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Dan Grossman View Post
    The query optimizer should end up producing the same query from either one. I would bet the IN() gets rewritten to a bunch of OR's when it's translated to the intermediate query language the individual storage engine gets passed. Some quick testing and checking EXPLAIN estimates seems to verify there's no speed difference.

    So there's no technical difference between the two, do whichever is easier to code or more readable for you.
    Can't speak to MySql but for MSSQL, IN() gets rewritten as a temp table and a join in general, which is hella fast.

  10. #10
    SitePoint Member Loomy's Avatar
    Join Date
    Sep 2004
    Location
    Bergen, Norway
    Posts
    20
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Could it be worth experimenting with using BETWEEN where the id are +1 from the previous? (and keep doing ORs or IN() on those that aren't)

    Instead of doing this:
    Code:
    id = 1 OR id = 2 OR id = 3 OR id = 4 OR id = 5 OR id = 6 OR id = 7 OR id = 8
    do this:
    Code:
    id BETWEEN 1 AND 8
    Might be worth a try, although I suspect the query optimizer will end up doing the exact same query as with ORs and IN().

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,327
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by wwb_99 View Post
    Can't speak to MySql but for MSSQL, IN() gets rewritten as a temp table and a join in general, which is hella fast.
    that's awesome

    but what does MSSQL do for the series of OR conditions?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    SitePoint Evangelist
    Join Date
    Jun 2005
    Posts
    436
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Are those ID's hard coded? If they're coming from another query, you could use a Subquery or Join to speed up to code.

  13. #13
    SitePoint Wizard
    Join Date
    Jul 2003
    Location
    Kent
    Posts
    1,921
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    It seems a bit strange to me to have written a query like that. Where do these IDs come from? If they just happen to be all the existing IDs in the database, then I think you're doing it wrong. If they are from a known subset with a common feature, again you're doing it wrong.

    I too wonder if a join is the correct answer.

  14. #14
    SitePoint Author silver trophybronze trophy
    wwb_99's Avatar
    Join Date
    May 2003
    Location
    Washington, DC
    Posts
    10,649
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    that's awesome

    but what does MSSQL do for the series of OR conditions?
    Logically, in is the same as a big list (X or Y or Z . . )

  15. #15
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,327
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    so, logically, it creates a temp table in that case too?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  16. #16
    SitePoint Author silver trophybronze trophy
    wwb_99's Avatar
    Join Date
    May 2003
    Location
    Washington, DC
    Posts
    10,649
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Can't really say, not horribly familiar with the internals of the MySql query parsing and optimization. Or if MySql even has in-memory temp tables these days.

  17. #17
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,327
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    no, i was asking about msqql, you said mssql creates a temp table for an IN list, and i was asking whether mssql creates a temp table for a series of OR conditions as well

    i realize that this is a mysql thread but the mssql example is relevant in that it would show how different sql can lead to different execution paths...
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  18. #18
    SitePoint Author silver trophybronze trophy
    wwb_99's Avatar
    Join Date
    May 2003
    Location
    Washington, DC
    Posts
    10,649
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Gotcha. Answer is that I think it optimizes both clauses the same under the hood at the end of the day.

  19. #19
    SitePoint Enthusiast MickoZ's Avatar
    Join Date
    Jul 2004
    Location
    Canada, Qc
    Posts
    42
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    This would all depend on the internal of MySQL and that could change from version to version too in theory. That is a reason that I sometime write code a way I feel the "right way" (which can be subjective!), unless there is a real need to optimize for a specific version or a real need to optimize over the reading quality for example. The "right way" might get a good chance overtime to be optimized as well if not already.

    I would personally be more interested in the comparison of performance with EXISTS and IN with a subquery on the same table with an equivalent condition (at least I have noticed difference with some DBMS in the past even if it was the same condition). But that is another topic.

    However the "IN" version is also shorter in length. Therefore less data to send to the MySQL server. If it was an heavily used query, it could improve the performance for that part of the process.

    Another thought, you could as well test it yourself with a lot of your OR condition. Make sure when you run your query to use SQL_NO_CACHE, else they might both run at the same speed because the result has been cached.

    You can also analyze the query with DESCRIBE.

    I personally tried a simple version on my side and both the "multiple OR" and the "IN" versions had the same execution plan with a "range" search type. So there is a good chance they will be optimized the same way by the MySQL server.

    You might let us know what you find. ;-)

  20. #20
    SitePoint Evangelist
    Join Date
    Sep 2006
    Posts
    428
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    A JOIN may very well be the most proper solution. I'll look into this further and let you guys know the result.

    The OR is composed of a SELECT all from a table that holds only the id's used in the OR statement. The id's in this table relate to a separate database full of products. Each dealer only has access to a certain number of products in the main product database.

    I'll have another look at the query and see if a JOIN is the more proper solution.

  21. #21
    SitePoint Evangelist
    Join Date
    Sep 2006
    Posts
    428
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    WOW. Some pretty impressive gains. Re-looking at how my query was designed was definitely the proper action. A little background, this query is actually just 1/3 of a larger more complex query. I modified this to utilize a JOIN rather than a SELECT (to get the list of id's) and an OR to find the id's in another table. Here are the numbers (WOW!).

    *note: I used a 'worst case' scenario for testing so the performance gains will be less in most of my production environments. My tests utilized a set of approx 800 OR's which would be a dealer with ALL products listed on their site (which would be a very rare occurrence). Also these queries were run with no cache.

    Query with multiple OR's: 0.3606sec
    Same Query with JOIN: 0.0232sec

    That's a savings of .3374 secs per query!! Not to mention server resource savings.

    Thanks to everyone (especially Dr John) for helping me work through this and optimize this query!!

  22. #22
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If you use IN, and the field for the IN is your primary key, your select will take ~0.0003 sec.
    After your joins, if done on keys, you should get your result in ~0.003 sec

    Do a DESCRIBE on your query, and change it to get less joins in there, and I'm sure you can make it at least 10 times faster.

    Example:
    Code:
    # table has indexes on type1 and type2
    
    # Slower
    SELECT id FROM table WHERE type1 = 1 OR type2 IN (1,2,3,4);
    
    # Faster
    SELECT id FROM table WHERE type1 = 1
    UNION
    SELECT id FROM table WHERE type2 IN (1,2,3,4);
    Reason:
    When you use the OR, MySQL will only use one of the two indexes. So when your using UNION, it does the SELECT twice, both times on indexes.

    Play around with DESCRIBE to optimize queries.

    Ps:
    - Sometimes, the queries that slow down your server are not the slow ones, but the "fast enough" ones that get run 100 times per page... I would look for some of those to optimize.

  23. #23
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,327
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Vali View Post
    Example:
    Code:
    # table has indexes on type1 and type2
    
    # Slower
    SELECT id FROM table WHERE type1 = 1 OR type2 IN (1,2,3,4);
    
    # Faster
    SELECT id FROM table WHERE type1 = 1
    UNION
    SELECT id FROM table WHERE type2 IN (1,2,3,4);
    Reason:
    When you use the OR, MySQL will only use one of the two indexes. So when your using UNION, it does the SELECT twice, both times on indexes.
    this is a fabulous example, may i steal it?



    by the way, not all databases work this way, some can do index merges because they can actually utilize more than one index (shocking, eh)

    sadly, i'm finding it harder and harder to remember all these little nuances for multiple database systems...

    so my strategy is to write SQL that is easy to write and more importantly easy to understand

    thus, i would write the OR syntax rather than the UNION syntax, if i even remembered about the UNION syntax

    (and like SQL, i don't use hacks in CSS, either)

    but you are right, little nuances like this are extremely important for "hardening" an application that will see heavy use
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  24. #24
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,095
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    To continue the discussion from a few posts back, I've been thinking about it quite a lot the last few days, and I think that "under the hood" DBMSs rewrite IN to a series of OR's.

    At a low level, the DBMS has to check whether the specified field has some value, and since it impossible to check if the value of a field is in a given collection of values at once, the DBMS would need to check it for each of the possibilities separately.

    For those of you who might argue the DBMS could use something like PHP's in_array() to check if the specified value has one of the values in a given collection I have to disappoint you, "under the hood" PHP is also checking the values one by one.

    Since DBMSs rewrite the query once and then execute it, it would be weird if they left the IN in the query, since it will have to be executed as a series of OR's.
    So rewriting IN to series of OR's makes sense. Well, at least to me it does
    Rémon - Hosting Advisor

    SitePoint forums will switch to Discourse soon! Make sure you're ready for it!

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  25. #25
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    MySQL doesn't actually execute any query in any manner.

    The MySQL server has a query optimizer and planner which rewrites the query to a simplified query language, and that query is passed on to the storage engine used by the table.

    The storage engine is what actually implements all the information retrieval and modification operations. To talk about the retrieval algorithm at such a low level you have to dig into the storage engine. MyISAM does it totally different than InnoDB, and there are at least 20 engines you can choose from. Each can implement the lower level query operations differently.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •