  1. #1
    SitePoint Zealot
    Join Date
    Dec 2005
    Posts
    153

    fulltext search vs custom search

    Hello,
    I want to implement search on my site, but I have heard a lot of horror stories about MySQL full-text search performance. I have a very simple site based on tags. Do you think it's better to use MySQL's full-text search, or to write my own simple search algorithm?

  2. #2
    SitePoint Wizard Cups
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    If your tags are kept in a table and are indexed, search through them first, and only then fall back to a full-text search on the full content.

    You may already be doing this, but you could record what people search for; the terms may vary with the time of day, week or year.

    Cherry pick those search terms and come up with "ready made" results.
    PHP Code:
    $search_term['dog food'] = 'dog_food_home_page.php'; 
    So you'd snag that term and link straight to the dog_food_home_page or, more likely, show a nice div with enticing links to that page.
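
    A minimal sketch of that lookup, assuming the ready-made results live in a plain PHP array. The second entry, the variable name $readyMades and the helper names normalise_term() and ready_made_for() are made up for illustration:

```php
<?php
// Hypothetical map of popular search phrases to hand-picked landing pages.
$readyMades = [
    'dog food' => 'dog_food_home_page.php',
    'cat toys' => 'cat_toys_home_page.php',
];

// Normalise a raw query: trim it, lowercase it, collapse runs of whitespace.
function normalise_term(string $raw): string
{
    return preg_replace('/\s+/', ' ', strtolower(trim($raw)));
}

// Return the ready-made page for a query, or null if there is none.
function ready_made_for(string $raw, array $readyMades): ?string
{
    return $readyMades[normalise_term($raw)] ?? null;
}
```

    Normalising first means "Dog  Food" and "dog food" hit the same entry; a null return is the cue to fall through to a slower search.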

    This depends wholly on the type of site you are running and how predictable the results are. You won't know for sure until you log your searches, though.

    I was really surprised how predictable some searches are at certain times of the year.
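
    To see that kind of seasonal pattern you need a log to look at. A rough sketch, assuming an in-memory tally keyed by month; a real site would more likely write to a table or a file, and log_search() and top_terms() are hypothetical names:

```php
<?php
// Count one search for $term in the month (UTC) that $timestamp falls in.
function log_search(array &$log, string $term, int $timestamp): void
{
    $month = gmdate('Y-m', $timestamp);
    $key = strtolower(trim($term));
    $log[$month][$key] = ($log[$month][$key] ?? 0) + 1;
}

// Most frequent terms for a month, most searched first.
function top_terms(array $log, string $month, int $limit = 10): array
{
    $counts = $log[$month] ?? [];
    arsort($counts);
    return array_slice($counts, 0, $limit, true);
}
```

    The top few terms per month are the natural candidates for "ready made" results.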

    I mocked up a search like that, which in effect filtered queries through these stages:

    1) Single word
    Search through known keywords and present a Wikipedia-style "did you mean?" disambiguation (search Wikipedia for "pdf" and it will ask you to pick which PDF you mean: Panama Defence Forces?).

    Then link to articles featuring the chosen keyword.

    2) Phrases
    Search the often-used "ready mades" in an array, as above.

    3) Phrases
    Search through keywords in an indexed table.

    4) Then we use Google, but you could show results from a full-text search instead,
    or split the phrase into single words, ask whether the user wants to search for the single words "dog" or "food", and run a straight MySQL search for each with "where ... like '% $x %'" etc.
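
    For that last fallback, here is a sketch that splits the phrase and builds a parameterised query instead of interpolating $x into the SQL. The table and column names (articles, id, title, body) and the function name are placeholders. Note that it matches substrings, whereas the '% $x %' pattern above only matches words with a space on both sides:

```php
<?php
// Build a "match any word" query plus its bound parameters for a phrase.
// Returns [sql, params]; sql is null when the phrase contains no words.
function build_like_query(string $phrase): array
{
    $words = preg_split('/\s+/', trim($phrase), -1, PREG_SPLIT_NO_EMPTY);
    $clauses = [];
    $params = [];
    foreach ($words as $word) {
        // Escape the LIKE wildcards % and _ so user input is matched literally.
        $params[] = '%' . addcslashes($word, '%_') . '%';
        $clauses[] = 'body LIKE ?';
    }
    if ($clauses === []) {
        return [null, []];
    }
    $sql = 'SELECT id, title FROM articles WHERE ' . implode(' OR ', $clauses);
    return [$sql, $params];
}
```

    The returned SQL and parameters can be handed to a prepared statement, for example PDO's prepare() followed by execute($params).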

    Which of these filters kick in depends, as I say, on the complexity and breadth of your content, and on how much load you are happy to put on your system to run them.

    You might be content to stop once you hit the array of "ready mades".
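
    The whole cascade can be sketched as a list of tiers tried in order, stopping at the first one that returns anything. The tiers below are stand-ins with made-up data, not the real site's code, and tiered_search() is a hypothetical name:

```php
<?php
// Run each search tier in order and stop at the first one that returns results.
// Every tier is a callable that takes the query and returns an array of results.
function tiered_search(string $query, array $tiers): array
{
    foreach ($tiers as $name => $tier) {
        $results = $tier($query);
        if ($results !== []) {
            return ['tier' => $name, 'results' => $results];
        }
    }
    return ['tier' => null, 'results' => []];
}

// Hypothetical tiers standing in for the ready-made array, the keyword table
// and a full-text (or external) search.
$tiers = [
    'ready_made' => function (string $q): array {
        $map = ['dog food' => ['dog_food_home_page.php']];
        return $map[strtolower(trim($q))] ?? [];
    },
    'keywords' => function (string $q): array {
        $keywords = ['dog' => ['dog_breeds.php'], 'food' => ['recipes.php']];
        $hits = [];
        $words = preg_split('/\s+/', strtolower(trim($q)), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            $hits = array_merge($hits, $keywords[$word] ?? []);
        }
        return $hits;
    },
    'fulltext' => function (string $q): array {
        return []; // placeholder for a MATCH ... AGAINST query or an external engine
    },
];
```

    Ordering the tiers from cheapest to most expensive means the costly searches only run for queries the cheap ones could not answer.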

  3. #3
    SitePoint Wizard kyberfabrikken
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Zend Framework has an implementation of Lucene in pure PHP. You can use it to build an index of your content, and then efficiently search this index.
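
    To illustrate the "build an index, then search it" idea in general terms, here is a toy inverted index. This is not Zend Search Lucene's API, only a sketch of the underlying concept; all function names are made up:

```php
<?php
// Split text into lowercase alphanumeric words.
function tokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Toy inverted index: maps each word to the IDs of the documents containing it.
function build_index(array $documents): array
{
    $index = [];
    foreach ($documents as $id => $text) {
        foreach (array_unique(tokenize($text)) as $word) {
            $index[$word][] = $id;
        }
    }
    return $index;
}

// Return the IDs of documents that contain every word in the query.
function search_index(array $index, string $query): array
{
    $result = null;
    foreach (tokenize($query) as $word) {
        $ids = $index[$word] ?? [];
        $result = $result === null ? $ids : array_values(array_intersect($result, $ids));
    }
    return $result ?? [];
}
```

    The index is built once, when content changes, so each search becomes a few array lookups rather than a scan of every document.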

  4. #4
    SitePoint Addict
    Join Date
    Aug 2005
    Posts
    207
    I would definitely use a stored procedure with your search logic embedded; then any search with strict score-based relevance becomes a simple task. Full-text horror stories have more to do with the data being searched than with the database doing the search.

    As for Zend Search Lucene, it's not bad, but it's still slow. I was told they would be changing some of the core functions used in the result-set handling, array_intersect_key() and array_diff_...(), because those functions are slow. Even without those changes, the indexing is fine for small sites. I tried it a few times and liked all the options, but it was slow. I was working on a really big site index: 8 million plus pages covering all the world's countries, logistical information for a government entity. I gave up after the second try and went back to the database, where I spent a lot of time improving the search algorithm and adding word stemming. The stemming alone saved a lot of database resources and greatly improved the relevance.
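
    For anyone unfamiliar with stemming, here is a deliberately naive suffix-stripping sketch. It is not the algorithm used on the site described above (that is not stated here), and an established stemmer such as Porter's would do far better; naive_stem() is a made-up name:

```php
<?php
// Naive stemmer: strip one common English suffix so that word variants
// collapse to the same search key. For illustration only.
function naive_stem(string $word): string
{
    $word = strtolower($word);
    foreach (['ing', 'ed', 'es', 's'] as $suffix) {
        $stemLength = strlen($word) - strlen($suffix);
        // Only strip when at least three characters of stem remain.
        if ($stemLength >= 3 && substr($word, -strlen($suffix)) === $suffix) {
            return substr($word, 0, $stemLength);
        }
    }
    return $word;
}
```

    If both the indexed words and the query words are stemmed, "searching", "searched" and "searches" all become the single key "search", so the database has fewer distinct terms to store and compare.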

