  1. #1
    Andrew-J2000

    Large Scale Internal Search Engines

    I need to create a large-scale search engine, and any ideas, pointers or tips would be greatly appreciated. If any samples are available, that would be nice :P. It's essentially for an internal application that needs to extract data from a number of tables in a MySQL database. Ideally it would be something like vBulletin's search functionality.

    Thanks.

  2. #2
    someonewhois
    Full-text search is what most forums use, isn't it? I believe vB caches searches too, though I'm not 100% sure. HEAP tables are always fast, and if a phrase has been searched recently, repeating the search will be a lot faster and less server-intensive. For example, have a third table, phrases2posts, with the columns id, phraseid and postid: phraseid refers to the phrases table (a list of searched phrases) and postid refers to the posts table. I'm not sure what cache time frame you'd use, but a cron job would probably be your best bet for truncating that table.

    Post (or blog) your conclusions, eh? It'd be interesting to know how you end up doing it.

  3. #3
    lastcraft
    Hi...

    Quote Originally Posted by Andrew-J2000
    I need to create a large scale search engine, if anyone has any ideas, pointers or tips they would be greatly appreciated?
    With MySQL full-text search you have to match words exactly, so you cannot find misspellings. Take a look at Xapian, which, unlike most full-text engines, supports incremental indexing; that is, you don't have to rebuild the index when you add more data.
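To illustrate the two ideas in that reply, here is a minimal in-memory inverted index in plain Python. It is not Xapian's API and makes no claim about how Xapian works internally; `IncrementalIndex` is a hypothetical class. Adding a document only touches that document's words, so nothing is rebuilt. As an assumed, crude stand-in for misspelling tolerance, unknown query words fall back to similar vocabulary words found with the standard library's `difflib.get_close_matches`.

```python
import re
from collections import defaultdict
from difflib import get_close_matches

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    # Lower-case and split on anything that is not a letter or digit.
    return TOKEN.findall(text.lower())


class IncrementalIndex:
    """Hypothetical in-memory inverted index: word -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Incremental: only the new document's words are updated.
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def remove(self, doc_id, text):
        # Undo add(); drop words that no longer point at any document.
        for word in tokenize(text):
            ids = self.postings.get(word)
            if ids is not None:
                ids.discard(doc_id)
                if not ids:
                    del self.postings[word]

    def _expand(self, word, fuzzy):
        if word in self.postings or not fuzzy:
            return [word]
        # Unknown word: try similar-looking vocabulary words (possible misspelling).
        return get_close_matches(word, list(self.postings), n=3, cutoff=0.75)

    def search(self, query, fuzzy=True):
        """Return sorted ids of documents containing every query word (AND)."""
        result = None
        for word in tokenize(query):
            ids = set()
            for term in self._expand(word, fuzzy):
                ids |= self.postings.get(term, set())
            result = ids if result is None else result & ids
        return sorted(result or [])
```

Keeping postings per word is what makes incremental updates cheap: a new post costs work proportional to its own length, not to the size of the whole collection.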

    yours, Marcus
    Marcus Baker
    Testing: SimpleTest, Cgreen, Fakemail
    Other: Phemto dependency injector
    Books: PHP in Action, 97 things

  4. #4
    Andrew-J2000
    I posted this thread hastily when I first found out about the task. Now that I have more details, I feel I'm going to get myself into a bit of a mess with the way the servers are currently configured. I'm not thrilled at the idea of suggesting Xapian, as setting it up would be out of my hands and left to the hosting administrators.

    We currently have a very unusual server configuration, which means a lot of messing about with cached content, and that makes me a little dubious about how Xapian would fit in. I would prefer something in PHP that does not require contacting the administrators and that I can manage myself.

    I will get back to this thread later this week, when I have to delve a little deeper; the search engine is not a priority right now. If you can suggest anything that could help, that would be great. Thanks for the suggestions so far, Nathan & Marcus.

