SitePoint Sponsor

User Tag List

Results 1 to 22 of 22

Hybrid View

  1. #1
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Help: "not equals !=" is very slow

    Greetings,

    I have a script where I would like to show 5 random items from a particular seller, and 5 random items from other sellers:

    PHP Code:
    $sql "SELECT * FROM items WHERE seller = 'johndoe' LIMIT 5";
    $sql "SELECT * FROM items WHERE seller != 'johndoe' LIMIT 5"
    The 'seller' column is indexed and I have over 300,000 items.
    The first query runs through super quick (0.01 seconds).
    The second query takes 10 seconds.

    I see that the Not Equals or "!=" does not work on an index. Is there a way to speed this up somehow?

    Thanks
    Kind regards

  2. #2
    SitePoint Mentor silver trophy
    Rubble's Avatar
    Join Date
    Dec 2005
    Location
    Cambridge, England
    Posts
    2,408
    Mentioned
    81 Post(s)
    Tagged
    3 Thread(s)
    I think you have a logic fault as your code is not going to show 5 random items as it will stop after the first 5 items it finds.

    Do you just have the one table with 300,00 items and all the details in the one table?

  3. #3
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The actual queries are much larger and complex, which includes the randomization part. The only difference between the two queries is the "!=" part on the seller column, which is the difference between 0.01 seconds and 10 seconds.

    I just provided the smallest example above for simplicity.

  4. #4
    Just Blow It bronze trophy
    DaveMaxwell's Avatar
    Join Date
    Nov 1999
    Location
    Mechanicsburg, PA
    Posts
    7,265
    Mentioned
    115 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by peppy View Post
    Greetings,

    I have a script where I would like to show 5 random items from a particular seller, and 5 random items from other sellers:

    PHP Code:
    $sql "SELECT * FROM items WHERE seller = 'johndoe' LIMIT 5";
    $sql "SELECT * FROM items WHERE seller != 'johndoe' LIMIT 5"
    The 'seller' column is indexed and I have over 300,000 items.
    The first query runs through super quick (0.01 seconds).
    The second query takes 10 seconds.

    I see that the Not Equals or "!=" does not work on an index. Is there a way to speed this up somehow?

    Thanks
    Kind regards
    The reason for the performance difference is the quantity of result sets in the two queries - if there are 500 johndoes there, the first query will process those 500 records to get the first five, and the second will be working with 299,500 records.

    I don't think your 2nd query will do exactly what you want - without any sort of sort order in it, it should return the same five records each time.
    Dave Maxwell - Manage Your Site Team Leader
    My favorite YouTube Video! | Star Wars, Dr Suess Style
    Learn how to be ready for The Forums' Move to Discourse

  5. #5
    SitePoint Addict bronze trophy mawburn's Avatar
    Join Date
    Apr 2014
    Posts
    213
    Mentioned
    6 Post(s)
    Tagged
    0 Thread(s)
    As Rubble said, this won't find random items. Finding random items in SQL is very hard, because you first have to pull all the rows in, assign each row a unique number, then pick out randomly numbered rows, and check . It's much more than just a simple select statement because this is going way out of bounds of what SQL is really designed for. Even if you have a numeric ID column, it's still not going to be easy to do in SQL.

    You're probably better off returning all results to a php array, then randomly pulling the array elements by number.

    As far as efficiency of the query, you could try NOT EXISTS:
    Code:
    $sql = "SELECT DISTINCT * FROM items WHERE seller = 'johndoe'"; 
    $sql = "SELECT DISTINCT * FROM items WHERE NOT EXISTS(" . $sql . ");";

  6. #6
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Is there a fast way to join the table to itself where the seller is not equal to "johndoe" and provide a new table without "johndoe" within it? or will that be just as slow?

  7. #7
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    What if I use the "NOT EXISTS", how do I add conditions to it? for example:

    PHP Code:
    $sql "SELECT * FROM items WHERE NOT EXISTS (SELECT * FROM items WHERE seller = 'johndoe') AND category = '21' 
    In words, I would like to select items from other sellers, but they must be contained within specific categories.

  8. #8
    SitePoint Addict bronze trophy mawburn's Avatar
    Join Date
    Apr 2014
    Posts
    213
    Mentioned
    6 Post(s)
    Tagged
    0 Thread(s)
    That is syntactically correct.

  9. #9
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Whenever I do the query, I end up with 0 rows returned. There are multiple sellers with products within this category though.

    EDIT:
    I used the "EXPLAIN" on this query and got this:

    id select_type table type possible_keys key key_len ref rows Extra
    1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
    2 SUBQUERY items ref seller seller 92 180206 Using where; Using index

    This says the Primary query has: "Impossible WHERE" . I'm not sure why...

  10. #10
    SitePoint Addict bronze trophy mawburn's Avatar
    Join Date
    Apr 2014
    Posts
    213
    Mentioned
    6 Post(s)
    Tagged
    0 Thread(s)
    Whoops. My fault. That wasn't correct. I'll admit that I have only used NOT EXISTS a few times.

    http://sqlfiddle.com/#!2/6d449/1

    There's 3 options for you above. NOT IN, LEFT JOIN, and NOT EXISTS.

    Code:
    CREATE TABLE fish (
        id INT(11) NOT NULL AUTO_INCREMENT,
        name VARCHAR(20) NOT NULL,
        length INT(11) NOT NULL,
        weight INT(11) NOT NULL,
    
        PRIMARY KEY pk (id)
    ) ENGINE=InnoDb;
    
    INSERT INTO fish VALUES (1, 'catfish', 7, 22);
    INSERT INTO fish VALUES (2, 'pike', 5, 2);
    INSERT INTO fish VALUES (3, 'bass', 8, 22);
    INSERT INTO fish VALUES (4, 'tuna', 24, 200);
    INSERT INTO fish VALUES (5, 'shark', 60, 956);
    INSERT INTO fish VALUES (6, 'catfish', 9, 22);
    Code:
    SELECT f.* FROM fish f
    WHERE id NOT IN (
                      SELECT id FROM fish f2
                      WHERE f2.name='catfish'
                     )
          AND length > 8;
    
    
    	
    SELECT f.* FROM fish f 
    LEFT JOIN fish f2 ON f2.id = f.id AND f2.name = 'catfish'
    WHERE f2.id IS NULL
          AND f.length > 8;
    
    
    
    SELECT f.* FROM fish f
    WHERE NOT EXISTS (
                      SELECT null FROM fish f2 
                      WHERE f2.id = f.id AND f2.name = 'catfish'
                     )
          AND f.length > 8;

  11. #11
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks for the providing the options.

    Unfortunately, I've tried all three methods and it's still taking just as long to perform the query. Even if I strip out any random generators and use LIMIT, it still takes just as long. You would think MySQL would have something for dealing with this.

  12. #12
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Modified Question:

    I modified the slow equation that takes 10 seconds and needs improvement:

    PHP Code:
    $sql "SELECT * FROM items WHERE seller != 'johndoe' AND category = '21' LIMIT 100"
    There has to be a faster query: Select any 100 items listed in category '21' from any seller except "johndoe"... The table contains 300,000 items total.

  13. #13
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,175
    Mentioned
    190 Post(s)
    Tagged
    2 Thread(s)
    I may be wrong, but I have a feeling that the problems you are having may be a result of less than optimal database design.

    I'm far from a "normalization" expert, and because I had little experience crafting complex queries, I set up complex tables.
    They were OK as long as the amount of data was minimal, but they didn't scale well.

    Short of breaking your table into smaller cross-referenced relational tables, you could try explicity specifying the fields you're interested in getting instead of using the * wildcard. That might help some.

  14. #14
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Unfortunately, this doesn't make too much of a difference on my end. I've simply selected the "id" column and this takes 10 seconds or so.

  15. #15
    SitePoint Addict bronze trophy mawburn's Avatar
    Join Date
    Apr 2014
    Posts
    213
    Mentioned
    6 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Mittineague View Post
    I may be wrong, but I have a feeling that the problems you are having may be a result of less than optimal database design.
    Well, remember he's also doing some randomization in his SQL statement and isn't giving us his fully query. I think there is something else going on. Like I said earlier, doing the random selection in SQL is probably the least preferred method.

  16. #16
    SitePoint Mentor silver trophy
    Rubble's Avatar
    Join Date
    Dec 2005
    Location
    Cambridge, England
    Posts
    2,408
    Mentioned
    81 Post(s)
    Tagged
    3 Thread(s)
    You could run the search code say every night at midnight and generate an array just with the row ID - this would negate any time taken - the array will be written to a file on the server.
    When the user visits the page 5 random values are taken from the two arrays, the data is fetched from the database and displayed.

  17. #17
    SitePoint Mentor bronze trophy
    John_Betong's Avatar
    Join Date
    Aug 2005
    Location
    City of Angels
    Posts
    1,836
    Mentioned
    73 Post(s)
    Tagged
    6 Thread(s)
    Quote Originally Posted by peppy View Post
    Greetings,

    I have a script where I would like to show 5 random items from a particular seller, and 5 random items from other sellers:

    PHP Code:
    $sql "SELECT * FROM items WHERE seller = 'johndoe' LIMIT 5";
    $sql "SELECT * FROM items WHERE seller != 'johndoe' LIMIT 5"
    The 'seller' column is indexed and I have over 300,000 items.
    The first query runs through super quick (0.01 seconds).
    The second query takes 10 seconds.

    I see that the Not Equals or "!=" does not work on an index. Is there a way to speed this up somehow?

    Thanks
    Kind regards
    Try this:

    PHP Code:

    "SELECT * FROM items WHERE seller  > 'johndoe'  LIMIT 5"
    Learn how to be ready for The New Move to Discourse

    How to make Make Money Now with a *NEW* look

    Be sure to congratulate Patche on earning Member of the Month for July 2014


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •