I am building a website whose content is mainly articles, comments and forum posts, and I’m expecting new content to be added quite frequently. Now I need a good search engine to search through all the contents of the site. The site is made with PHP and MySQL. I would be able to write my own search engine, but I know that making a good one would take quite some time, and I need to have something soon, so I’m looking for existing solutions - they can be commercial if not too pricey.
I have considered Google Site Search - it’s almost perfect for my needs, except that new content would take quite some time to be indexed. New comments and posts will be added all the time and I don’t want a delay of more than 1 or 2 hours. They provide on-demand indexing, but from their description it’s limited and cannot be used automatically.
Then I found Zoom, which would be perfect except that I would have to index pages through a Windows application.
My needs are:
searching all html content on the site
good algorithms for sorting by relevance and by date
frequent indexing - ideally my php application could notify the engine of each modified page to update the index since some pages will be changing very frequently while others (with old content) will be changing very rarely
ability to run on a typical good shared server with php and mysql
highlighting of search items in the results
ability to integrate the search results into the site layout and style them however I like
You could use something like Solr: send it a cURL request with the query and get the indexed results back. It sorts by relevancy and by other criteria, supports highlighting, and returns your results as JSON or XML. You can basically write a script that indexes your data yourself, so there is no need to wait for a crawler.
The only drawback I see is that I assume you want something quick. Solr does take some time to learn, and you may need to write some things on your own to get what you want.
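To make the Solr suggestion a bit more concrete, here is a minimal sketch of what the PHP side might look like. It is only an illustration under assumptions: the base URL, the core name `site`, and the field names `title`, `body` and `modified` are hypothetical and would have to match your own Solr setup and schema. The query parameters (`q`, `wt`, `hl`, `hl.fl`, `sort`, `rows`) are standard Solr request parameters.

```php
<?php
// Hypothetical base URL and core name; adjust to your own installation.
const SOLR_BASE = 'http://localhost:8983/solr/site';

/** Build a select URL that asks for JSON results with highlighted snippets. */
function solrSelectUrl(string $terms, string $sort = 'score desc', int $rows = 10): string
{
    $params = [
        'q'     => $terms,
        'wt'    => 'json',   // response format
        'hl'    => 'true',   // turn highlighting on
        'hl.fl' => 'body',   // hypothetical field to highlight
        'sort'  => $sort,    // e.g. 'score desc' or 'modified desc'
        'rows'  => $rows,
    ];
    return SOLR_BASE . '/select?' . http_build_query($params);
}

/** Build the JSON body for sending one changed page to the update handler. */
function solrUpdateBody(string $id, string $title, string $body, int $modifiedTs): string
{
    $doc = [
        'id'       => $id,
        'title'    => $title,
        'body'     => $body,
        // Solr date fields expect ISO 8601 in UTC.
        'modified' => gmdate('Y-m-d\TH:i:s\Z', $modifiedTs),
    ];
    return json_encode([$doc]); // the update handler accepts an array of documents
}

/** POST a JSON body with the cURL extension; returns the response, or false on failure. */
function solrPost(string $url, string $json)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $json,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}
```

Whenever a page is saved, the application could then call something like `solrPost(SOLR_BASE . '/update?commit=true', solrUpdateBody($id, $title, $html, time()))`, so only the modified page is re-indexed. Passing `'modified desc'` as the sort gives date ordering instead of relevance.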
Thanks, sounds good. However, from their search example I’m not convinced by the algorithm - my guess is it uses MySQL full-text search, which I’ve had bad experiences with. Also, the search is very slow on that server, often resulting in a 30-second PHP timeout. This could bring my server to a crawl under heavier use.
The problem is that it is written in Java, and I don’t have access to Java on my server.
I don’t need something extremely quick - spending a day or two setting it up and tuning it to my needs is still fast compared with the weeks I would need to write it from scratch. The problem is that I want a pretty good search engine with a decent algorithm and decent speed.
Oh, and I forgot about one more thing I would like to have - the ability to provide a list or dictionary of related words (different forms, synonyms, etc.) so that the search would recognize them. I know Google is pretty good at this.
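For what it’s worth, if the engine you end up with has no synonym support of its own, the related-words idea can be roughly approximated on the PHP side by expanding the query before it is sent. The sketch below is just one possible approach, not how any particular engine does it: the word groups and the function name `expandQuery` are made up for illustration, and it assumes the engine accepts Lucene-style `OR` queries with parentheses.

```php
<?php
// Hypothetical related-word groups; in practice these might come from a file or a table.
$related = [
    ['car', 'cars', 'automobile'],
    ['buy', 'buying', 'purchase'],
];

/** Replace each query word that belongs to a group with an OR of the whole group. */
function expandQuery(string $query, array $groups): string
{
    $words = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);
    $parts = [];
    foreach ($words as $word) {
        $alternatives = [$word];
        foreach ($groups as $group) {
            if (in_array($word, $group, true)) {
                $alternatives = $group; // use every word of the matching group
                break;
            }
        }
        $parts[] = count($alternatives) > 1
            ? '(' . implode(' OR ', $alternatives) . ')'
            : $word;
    }
    return implode(' ', $parts);
}

echo expandQuery('buy car', $related), "\n";
// prints: (buy OR buying OR purchase) (car OR cars OR automobile)
```

Words that are not in any group pass through unchanged, so the dictionary can start small and grow over time.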
Hmm, that’s surprising. I just checked the sites I used them on and they’re still uber fast. The GreyWyvern site seems a bit slow all around, so it may not be the search script itself.
The Zend Lucene search engine is pretty powerful and some CMSs integrate it into their search functionality (concrete5 comes to mind). It’ll be a bit more difficult to set up, but it could be closer to what you’re looking for.