SitePoint Sponsor

User Tag List

Results 1 to 16 of 16

Hybrid View

  1. #1
    SitePoint Member
    Join Date
    Jun 2013
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    What is the best performance search query/method for huge number of records

    Hello,

    I need to know what's the best way to search tables with million records. I have a table called <people> and this table has the column: <status>

    Status might be more than one word like:

    "I am happy today and the weather is nice".

    Also, it might be in different languages.

    Records in table are represented by UTF-8.

    I need to search for a sub-word, word, or even some words out of the whole sentence like for example (according the status above):

    Search 1: keyword = "ppy"

    Search 2: keyword = "am weather"

    Search 3: keyword = "nice"

    Search 4: keyword = "day weath"

    I would greatly appreciate if you hint me to the best method to apply a robust search. As far as I know using "LIKE" is not practical for huge records. I heard about the full text index but never used that.

    Thank you very much!

    Cheers,

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,330
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by moderns View Post
    I heard about the full text index but never used that.
    you should try it
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Member
    Join Date
    Jun 2013
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    you should try it
    I will go for the full text index search. I would appreciate if you clear the right equivalent statement in full text search for the below:

    $sql = ' SELECT people.status, people.id FROM people WHERE
    people.status LIKE "%'.$keySearch.'%" AND people.id != "'.$this->myId.'" ORDER BY people.time ASC LIMIT '.$limit;

    Also, what shall I do to the table in mysql? I never worked on this so your detailed answer is much appreciated.

    Thanks a lot.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,330
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    first thing you have to do is create a FULLTEXT index on the column(s) you want to search

    then compose the query using MATCH syntax, which you will find explained in the manual
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Member
    Join Date
    Jun 2013
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    first thing you have to do is create a FULLTEXT index on the column(s) you want to search

    then compose the query using MATCH syntax, which you will find explained in the manual
    Thank you! But shall I use the Boolean search? Could you please hint me about the correct statement?

    One more thing, will it search for partial segments of words?

  6. #6
    SitePoint Member
    Join Date
    Jun 2013
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I have changed the column "status" to full text index and the engine is MyISAM. I have tried the below sql but it is not working I am getting the error:
    "test" AND people.id != "'.$this->myId.'" ORDER BY people.time ASC LIMIT '.$limit;

    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near "test"

    $keySearch="test";
    $limit=10;
    $this->myId=1;

    $sql = ' SELECT people.status, people.id FROM people WHERE MATCH (people.status) AGAINST "'.$keySearch.'" AND people.id != "'.$this->myId.'" ORDER BY people.time ASC LIMIT '.$limit;

    mysql_query($sql);

    echo mysql_error();


    Would you please advise?

    Thanks.

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,330
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    please test it in mysql first, without php
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    SitePoint Member
    Join Date
    Jun 2013
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    please test it in mysql first, without php
    I have something strange!

    In my table people, I have the status for each person as below:

    id status
    -- ------
    1 hello
    2 hello
    3 hello
    4 hello
    5 hello

    When I search using the SQL below:

    SELECT people.status FROM people WHERE

    MATCH (people.status) AGAINST ("hello*" IN BOOLEAN MODE)

    HAVING people.id!= 1

    ORDER BY status_time ASC LIMIT 10


    I get ZERO rows! I should get 4 rows!

    BUT:

    If the table people like below:

    id status
    -- ------
    1 sello
    2 sello
    3 sello
    4 sello
    5 sello

    And if I change the keyword for search to "sello", then I get 4 rows! Why!! Below is the SQL:

    SELECT people.status FROM people WHERE

    MATCH (people.status) AGAINST ("sello*" IN BOOLEAN MODE)

    HAVING people.id!= 1

    ORDER BY status_time ASC LIMIT 10


    Is there any restriction on the "hello" in mysql?

    Thanks.

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,330
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by da manual
    Such a technique works best with large collections (in fact, it was carefully tuned this way). For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results. For example, although the word “MySQL” is present in every row of the articles table shown earlier, a search for the word produces no results:
    Code:
    mysql> SELECT * FROM articles
        -> WHERE MATCH (title,body) AGAINST ('MySQL');
    Empty set (0.00 sec)
    The search result is empty because the word “MySQL” is present in at least 50% of the rows. As such, it is effectively treated as a stopword.

    -- http://dev.mysql.com/doc/refman/5.0/...-language.html
    emphasis added
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    SitePoint Member
    Join Date
    Jun 2013
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    emphasis added
    Thanks but what's the difference between "hello" and "sello"? Why I am getting 4 hits for "sello" while getting ZERO hits for "hello"?

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,330
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by moderns View Post
    Thanks but what's the difference between "hello" and "sello"? Why I am getting 4 hits for "sello" while getting ZERO hits for "hello"?
    because there are so few rows in the table

    bizarre, eh?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    SitePoint Member
    Join Date
    Jun 2013
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    because there are so few rows in the table

    bizarre, eh?
    But "sello" is same case. Please look at the two tables. When searching for "sello", I am getting 4 hits, while when searching for "hello" I am getting 0 hits! WHY!!

  13. #13
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,330
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by moderns View Post
    WHY!!
    please re-read the excerpt from da manual that i posted above
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •