SitePoint Sponsor

User Tag List

Results 1 to 7 of 7
  1. #1
    ********* wombat firepages's Avatar
    Join Date
    Jul 2000
    Location
    Perth Australia
    Posts
    1,717
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi all,

    ok I was wondering if anyone had any good pointers on searching a mysql database, I can of course do simple LIKE '%whatever%' etc, but was wondering if there were any tutorials on this type of thing?

    Basically I want to be able to do a search on 3 specific columns in a DB, 2 * VARCHAR and 1 * TEXT.

    I have looked in the MySQL manual about adding FULLTEXT indexes and then using MATCH to do a FULLTEXT search , but the documentation is a bit sparse, I was wondering if anyone had seen any more info out there, ....or better still has a super-duper script I can look at.

    I want to be able to search by keyword - or a natural text search.

    I have something working but its messy and is basically lots of combinations of LIKE etc.
    ie - if someone enters 3 keywords I am basically running 3 queries and then merging the results afterwards to get % matches etc - this seems silly I am sure there is a better way?

    Any ideas / pointers appreciated.

  2. #2
    Serial Publisher silver trophy aspen's Avatar
    Join Date
    Aug 1999
    Location
    East Lansing, MI USA
    Posts
    12,937
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    not 3 queries, 1.

    where searchfield like '% $term1 %' OR searchfield like '% term2 %' etc etc
    Chris Beasley - I publish content and ecommerce sites.
    Featured Article: Free Comprehensive SEO Guide
    My Guide to Building a Successful Website
    My Blog|My Webmaster Forums

  3. #3
    Your Lord and Master, Foamy gold trophy Hierophant's Avatar
    Join Date
    Aug 1999
    Location
    Lancaster, Ca. USA
    Posts
    12,305
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    The reason there is not much documentation on FullText search is it is a very new and untested feature of MySQL. It is only available in the newest versions (3.23.35 and higher basically, anything before that is worthless.).

    vBulletin has a full text search and while I can't give out the code, I can point you in the right direction...

    When you post a message in a vBulletin forum, the message is exploded into an array, word by word. Each word is then looked for in a word table that holds all unique words. If it is not found it is added, it if it found the ID is stored.

    Now there is a second table which contains word ids and post ids to cross reference them all. As you can guess this becomes a massive table. Here on SitePoint it has over 5 million rows. That is a lot of data to search for, but if you do a search for several words you will find it returns fairly quickly so how do they do it?

    First they take your words and do a search on the word table to see if they are found. If they are they take the word id and put it in an array. That array is then used to make 1 query that can return every post containing that word and a very fast query.

    Using the wordid and the IN clause that MySQL supports they select the post ids from the searchindex table.

    So it would be like:
    SELECT postid FROM searchindex WHERE wordid IN (id1, id2, id3,id4).

    This in return is used to create a new array that contains the post ids. That query looks similar to:
    SELECT * from post WHERE postid IN (post1,post2,post3...)

    This returns all the results.

    If you could see the queries and timed the search engine it would be something like the following if you searched for HTML (2600 hits by the way):
    Page generated in 4.4755820035934 seconds with 13 queries,
    spending 2.8605751991272 doing MySQL queries and 1.6150068044662 doing PHP things.
    Which is a little on the high side for some reason. I will have to look into that.

    Anyway, how can it be so fast? Simple we are making the database do our work for us. By index each and every word in a post we are making an key-value index for every message. This in turn is indexed by MySQL so we are indexing our index. This allows highly focused searches in little or no time.

    Sound complicated? Yeah. It has to be to have any symblance of accuracy. Real Text searching in a database is one of the most difficult things to do.
    Wayne Luke
    ------------


  4. #4
    ********* wombat firepages's Avatar
    Join Date
    Jul 2000
    Location
    Perth Australia
    Posts
    1,717
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    yep 3 queries - assuming you are searching 3 fields - with 3 (or more) seperate search terms
    then each one of those includes the AND/OR's etc

    I know I could concatenate the 3 fields and then do a search on them etc

    But which is more efficient, should I just do a load of simple queries and interpret/extract the data from the returned arrays? or should I do a couple of hum-dinger super-queries??

    I was wondering if there is a 'proper' way to do this? - or at least some documentaion/tutorials?

    Also can anyone confirm my suspicion that I can not create a new FULLTEXT index on an column that already has data in it?
    I can create a new filed with a FULLTEXT index and then populate it successfully, but trying to add one to an existing TEXT field gives me an error about not being able to create the temp file /mysql/data/db_name/&342jhg...(whatever).

  5. #5
    Serial Publisher silver trophy aspen's Avatar
    Join Date
    Aug 1999
    Location
    East Lansing, MI USA
    Posts
    12,937
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    no not 3 queries as long as you are accessing only 1 DB you only need one query (you can even use 1 query to search 2 tables)

    Where field1 like term1 or field1 like term2 or field2 like term1 or field2 like term2

    etc etc

    Sure your where clause will be very long but it'll work.
    Chris Beasley - I publish content and ecommerce sites.
    Featured Article: Free Comprehensive SEO Guide
    My Guide to Building a Successful Website
    My Blog|My Webmaster Forums

  6. #6
    ********* wombat firepages's Avatar
    Join Date
    Jul 2000
    Location
    Perth Australia
    Posts
    1,717
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks Wayne thats excellent, I have seen that VB holds a seperate keywords list but have never looked into it before.

    I think I shall take a walk in that direction...

    One quick question, does VB strip out the dross? ie 'the' , 'to' ,'and' etc ? or not - there was a flash of a thought in my mind as to why I wanted to ask that question... however its gone now - but assuming there was a good reason for asking... does it?


    & Thanks for the info.

  7. #7
    Victory shall be mine tubedogg's Avatar
    Join Date
    Mar 2001
    Location
    Medina, OH
    Posts
    440
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    vB has a whole list of words it excludes, like those you gave, even those such as "around", "aren't", "hopefully", "furthermore", "quite", "rather", "sometime", "well", "welcome", and "zero", just to name a few of the 569 in the current file.
    Kevin


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •