SitePoint Sponsor

User Tag List

Results 1 to 22 of 22
  1. #1
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Help: "not equals !=" is very slow

    Greetings,

    I have a script where I would like to show 5 random items from a particular seller, and 5 random items from other sellers:

    PHP Code:
    $sql "SELECT * FROM items WHERE seller = 'johndoe' LIMIT 5";
    $sql "SELECT * FROM items WHERE seller != 'johndoe' LIMIT 5"
    The 'seller' column is indexed and I have over 300,000 items.
    The first query runs through super quick (0.01 seconds).
    The second query takes 10 seconds.

    I see that the Not Equals or "!=" does not work on an index. Is there a way to speed this up somehow?

    Thanks
    Kind regards

  2. #2
    SitePoint Mentor silver trophy
    Rubble's Avatar
    Join Date
    Dec 2005
    Location
    Cambridge, England
    Posts
    2,398
    Mentioned
    81 Post(s)
    Tagged
    3 Thread(s)
    I think you have a logic fault as your code is not going to show 5 random items as it will stop after the first 5 items it finds.

    Do you just have the one table with 300,00 items and all the details in the one table?

  3. #3
    Just Blow It bronze trophy
    DaveMaxwell's Avatar
    Join Date
    Nov 1999
    Location
    Mechanicsburg, PA
    Posts
    7,264
    Mentioned
    115 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by peppy View Post
    Greetings,

    I have a script where I would like to show 5 random items from a particular seller, and 5 random items from other sellers:

    PHP Code:
    $sql "SELECT * FROM items WHERE seller = 'johndoe' LIMIT 5";
    $sql "SELECT * FROM items WHERE seller != 'johndoe' LIMIT 5"
    The 'seller' column is indexed and I have over 300,000 items.
    The first query runs through super quick (0.01 seconds).
    The second query takes 10 seconds.

    I see that the Not Equals or "!=" does not work on an index. Is there a way to speed this up somehow?

    Thanks
    Kind regards
    The reason for the performance difference is the quantity of result sets in the two queries - if there are 500 johndoes there, the first query will process those 500 records to get the first five, and the second will be working with 299,500 records.

    I don't think your 2nd query will do exactly what you want - without any sort of sort order in it, it should return the same five records each time.
    Dave Maxwell - Manage Your Site Team Leader
    My favorite YouTube Video! | Star Wars, Dr Suess Style
    Learn how to be ready for The Forums' Move to Discourse

  4. #4
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The actual queries are much larger and complex, which includes the randomization part. The only difference between the two queries is the "!=" part on the seller column, which is the difference between 0.01 seconds and 10 seconds.

    I just provided the smallest example above for simplicity.

  5. #5
    SitePoint Addict bronze trophy mawburn's Avatar
    Join Date
    Apr 2014
    Posts
    205
    Mentioned
    6 Post(s)
    Tagged
    0 Thread(s)
    As Rubble said, this won't find random items. Finding random items in SQL is very hard, because you first have to pull all the rows in, assign each row a unique number, then pick out randomly numbered rows, and check . It's much more than just a simple select statement because this is going way out of bounds of what SQL is really designed for. Even if you have a numeric ID column, it's still not going to be easy to do in SQL.

    You're probably better off returning all results to a php array, then randomly pulling the array elements by number.

    As far as efficiency of the query, you could try NOT EXISTS:
    Code:
    $sql = "SELECT DISTINCT * FROM items WHERE seller = 'johndoe'"; 
    $sql = "SELECT DISTINCT * FROM items WHERE NOT EXISTS(" . $sql . ");";

  6. #6
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Is there a fast way to join the table to itself where the seller is not equal to "johndoe" and provide a new table without "johndoe" within it? or will that be just as slow?

  7. #7
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    What if I use the "NOT EXISTS", how do I add conditions to it? for example:

    PHP Code:
    $sql "SELECT * FROM items WHERE NOT EXISTS (SELECT * FROM items WHERE seller = 'johndoe') AND category = '21' 
    In words, I would like to select items from other sellers, but they must be contained within specific categories.

  8. #8
    SitePoint Addict bronze trophy mawburn's Avatar
    Join Date
    Apr 2014
    Posts
    205
    Mentioned
    6 Post(s)
    Tagged
    0 Thread(s)
    That is syntactically correct.

  9. #9
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Whenever I do the query, I end up with 0 rows returned. There are multiple sellers with products within this category though.

    EDIT:
    I used the "EXPLAIN" on this query and got this:

    id select_type table type possible_keys key key_len ref rows Extra
    1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
    2 SUBQUERY items ref seller seller 92 180206 Using where; Using index

    This says the Primary query has: "Impossible WHERE" . I'm not sure why...

  10. #10
    SitePoint Addict bronze trophy mawburn's Avatar
    Join Date
    Apr 2014
    Posts
    205
    Mentioned
    6 Post(s)
    Tagged
    0 Thread(s)
    Whoops. My fault. That wasn't correct. I'll admit that I have only used NOT EXISTS a few times.

    http://sqlfiddle.com/#!2/6d449/1

    There's 3 options for you above. NOT IN, LEFT JOIN, and NOT EXISTS.

    Code:
    CREATE TABLE fish (
        id INT(11) NOT NULL AUTO_INCREMENT,
        name VARCHAR(20) NOT NULL,
        length INT(11) NOT NULL,
        weight INT(11) NOT NULL,
    
        PRIMARY KEY pk (id)
    ) ENGINE=InnoDb;
    
    INSERT INTO fish VALUES (1, 'catfish', 7, 22);
    INSERT INTO fish VALUES (2, 'pike', 5, 2);
    INSERT INTO fish VALUES (3, 'bass', 8, 22);
    INSERT INTO fish VALUES (4, 'tuna', 24, 200);
    INSERT INTO fish VALUES (5, 'shark', 60, 956);
    INSERT INTO fish VALUES (6, 'catfish', 9, 22);
    Code:
    SELECT f.* FROM fish f
    WHERE id NOT IN (
                      SELECT id FROM fish f2
                      WHERE f2.name='catfish'
                     )
          AND length > 8;
    
    
    	
    SELECT f.* FROM fish f 
    LEFT JOIN fish f2 ON f2.id = f.id AND f2.name = 'catfish'
    WHERE f2.id IS NULL
          AND f.length > 8;
    
    
    
    SELECT f.* FROM fish f
    WHERE NOT EXISTS (
                      SELECT null FROM fish f2 
                      WHERE f2.id = f.id AND f2.name = 'catfish'
                     )
          AND f.length > 8;

  11. #11
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks for the providing the options.

    Unfortunately, I've tried all three methods and it's still taking just as long to perform the query. Even if I strip out any random generators and use LIMIT, it still takes just as long. You would think MySQL would have something for dealing with this.

  12. #12
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Modified Question:

    I modified the slow equation that takes 10 seconds and needs improvement:

    PHP Code:
    $sql "SELECT * FROM items WHERE seller != 'johndoe' AND category = '21' LIMIT 100"
    There has to be a faster query: Select any 100 items listed in category '21' from any seller except "johndoe"... The table contains 300,000 items total.

  13. #13
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,161
    Mentioned
    190 Post(s)
    Tagged
    2 Thread(s)
    I may be wrong, but I have a feeling that the problems you are having may be a result of less than optimal database design.

    I'm far from a "normalization" expert, and because I had little experience crafting complex queries, I set up complex tables.
    They were OK as long as the amount of data was minimal, but they didn't scale well.

    Short of breaking your table into smaller cross-referenced relational tables, you could try explicity specifying the fields you're interested in getting instead of using the * wildcard. That might help some.

  14. #14
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Unfortunately, this doesn't make too much of a difference on my end. I've simply selected the "id" column and this takes 10 seconds or so.

  15. #15
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,161
    Mentioned
    190 Post(s)
    Tagged
    2 Thread(s)
    Can you use seller_id = and seller_id != (presumed to be an Int) instead of searching for the text (presumed to be Varchar) ?
    AFAIK text queries are more resource intensive than int queries.

  16. #16
    SitePoint Mentor silver trophy
    Rubble's Avatar
    Join Date
    Dec 2005
    Location
    Cambridge, England
    Posts
    2,398
    Mentioned
    81 Post(s)
    Tagged
    3 Thread(s)
    You could run the search code say every night at midnight and generate an array just with the row ID - this would negate any time taken - the array will be written to a file on the server.
    When the user visits the page 5 random values are taken from the two arrays, the data is fetched from the database and displayed.

  17. #17
    SitePoint Mentor bronze trophy
    John_Betong's Avatar
    Join Date
    Aug 2005
    Location
    City of Angels
    Posts
    1,833
    Mentioned
    73 Post(s)
    Tagged
    6 Thread(s)
    Quote Originally Posted by peppy View Post
    Greetings,

    I have a script where I would like to show 5 random items from a particular seller, and 5 random items from other sellers:

    PHP Code:
    $sql "SELECT * FROM items WHERE seller = 'johndoe' LIMIT 5";
    $sql "SELECT * FROM items WHERE seller != 'johndoe' LIMIT 5"
    The 'seller' column is indexed and I have over 300,000 items.
    The first query runs through super quick (0.01 seconds).
    The second query takes 10 seconds.

    I see that the Not Equals or "!=" does not work on an index. Is there a way to speed this up somehow?

    Thanks
    Kind regards
    Try this:

    PHP Code:

    "SELECT * FROM items WHERE seller  > 'johndoe'  LIMIT 5"
    Learn how to be ready for The New Move to Discourse

    How to make Make Money Now with a *NEW* look

    Be sure to congratulate Patche on earning Member of the Month for July 2014

  18. #18
    SitePoint Addict bronze trophy mawburn's Avatar
    Join Date
    Apr 2014
    Posts
    205
    Mentioned
    6 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Mittineague View Post
    I may be wrong, but I have a feeling that the problems you are having may be a result of less than optimal database design.
    Well, remember he's also doing some randomization in his SQL statement and isn't giving us his fully query. I think there is something else going on. Like I said earlier, doing the random selection in SQL is probably the least preferred method.

  19. #19
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by mawburn View Post
    Well, remember he's also doing some randomization in his SQL statement and isn't giving us his fully query. I think there is something else going on. Like I said earlier, doing the random selection in SQL is probably the least preferred method.
    I modified the question a bit to remove any randomization. All I want is to get this simple function to work in a fast manner:

    PHP Code:
    $sql "SELECT * FROM items WHERE seller != 'johndoe' AND category = '21' LIMIT 100"

    Quote Originally Posted by John_Betong View Post
    Try this:

    PHP Code:

    "SELECT * FROM items WHERE seller  > 'johndoe'  LIMIT 5"
    I tried this and unfortunately there is no difference. I also tried this and it doesn't make a differnce either:
    PHP Code:
    $sql "SELECT * FROM items WHERE seller <> 'johndoe' AND category = '21' LIMIT 100"

    Quote Originally Posted by Rubble View Post
    You could run the search code say every night at midnight and generate an array just with the row ID - this would negate any time taken - the array will be written to a file on the server.
    When the user visits the page 5 random values are taken from the two arrays, the data is fetched from the database and displayed.
    This could work, although our site has over 10,000 categories and subcategories with many items within them. I'm guessing this would end up being impractical.

    There must be another way, I know I'm not the only one on the internet who shows a list of "related items from other sellers" on their sites.

  20. #20
    SitePoint Wizard TheRedDevil's Avatar
    Join Date
    Sep 2004
    Location
    Norway
    Posts
    1,196
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    You most probably have a database model problem and possible index problems as well.

    First off, are you using composite keys? If, why is it "seller, category"?

    If you add a composite key that is: "category, seller" and then update your query to equal this:
    PHP Code:
    $sql "SELECT * FROM items WHERE category = '21' AND seller<>'johndoe' LIMIT 100"
    I am certain you will get a massive speed increase, since now we first limit the query to any records in category 21 that is not made by seller johndoe, instead of limiting the query to any records not made by seller johndoe that is also in category 21.

    However, the problem you will see when this works is that you will always pull the same 100 records (unless some is deleted) i.e. ASC order. In addition the RAND() function in SQL should be avoided like the plague since it quite honestly is a "DB killer" on larger tables.

    Instead what you can do is this:

    PHP Code:
    $range_from mt_rand(1,$total)
    $sql "SELECT * FROM items WHERE category = '21' AND seller<>'johndoe' LIMIT $range_from, 100"
    However, please note that LIMIT X,Y is also very slow. The reason for this is due to if you have a table with a million records and write: LIMIT 997000, 100 the SQL engine actually read all 997000 records to get to the ones you want to pull. This means if you want to speed this up more than the suggested query above, you need to alter your database models with this in mind, allowing you to base queries on primary keys, and being certain that you can actually use it internally in the software after.

  21. #21
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by TheRedDevil View Post
    If you add a composite key that is: "category, seller" and then update your query to equal this:
    PHP Code:
    $sql "SELECT * FROM items WHERE category = '21' AND seller<>'johndoe' LIMIT 100"
    THANKS A LOT! I made my indexes composites like you suggested and everything appears to work like a well-oiled machine so far, just phenomenal!

    I will keep an eye on this over the next few days just to confirm everything really is working this great.


    Quote Originally Posted by TheRedDevil View Post
    However, please note that LIMIT X,Y is also very slow. The reason for this is due to if you have a table with a million records and write: LIMIT 997000, 100 the SQL engine actually read all 997000 records to get to the ones you want to pull. This means if you want to speed this up more than the suggested query above, you need to alter your database models with this in mind, allowing you to base queries on primary keys, and being certain that you can actually use it internally in the software after
    Just as a side question, how can this slow behavior be improved? I don't use this on my random items script, but I use this type of LIMIT x,y very often with my Pagination, and the load speed does slow down quite a bit when clicking the last page numbers.

    Thanks
    Kind regards

  22. #22
    SitePoint Wizard TheRedDevil's Avatar
    Join Date
    Sep 2004
    Location
    Norway
    Posts
    1,196
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by peppy View Post
    THANKS A LOT! I made my indexes composites like you suggested and everything appears to work like a well-oiled machine so far, just phenomenal!

    I will keep an eye on this over the next few days just to confirm everything really is working this great.
    No need to watch it, as it should work. You want to come back when the rows in the table has increased drastically though. There is no rules to this, but I keep seeing that reviewing all the queries as the table sizes reach 100k, million, 5 million etc. seems to be a good idea. (and of course if a customer report a slow page)

    You could also base this on the "slow query" report functionality in MySQL, but by manually checking you normally find issues with queries that dont show up on this report as well, i.e. you are able to tune them a little to increase speed.

    When you write queries, try to think on how you can write it so it require less "search", i.e. start with the column that has the least variables/results and go on. This together with good indexes normally will get you good query speed.

    NOTE: When I say "least variables/results" I consider columns that contain a real value. Avoid starting with a value that contain a few options, like "enabled" etc. columns, since while they have few variables, they normally have a lot of records of each type so your not able to limit the search properly.

    Quote Originally Posted by peppy View Post
    Just as a side question, how can this slow behavior be improved? I don't use this on my random items script, but I use this type of LIMIT x,y very often with my Pagination, and the load speed does slow down quite a bit when clicking the last page numbers.
    For pagination, it is actually quite easy to solve but it will require some code changes on your side.

    What you will need is that for each "page" link, you actually need to attach the page id instead of "page reference".

    Then in the queries you do like this:
    PHP Code:
     "SELECT * FROM table WHERE table_id=:table_id" 
    Of course this is simplified, but you get the idea.

    You should also read this article, it is a good explanation on the problem, and also have suggestions on how to fix it.
    http://mysql.rjweb.org/doc.php/pagination


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •