Site Search

So a client wants a site search, but the catch is that he or she does not want a database behind it. Basically, they want a plain HTML content search.

Technologies in use:

[LIST]
[*]PHP 5.2.X
[*]IIS server
[*]No framework, purely based on PHP scripts
[/LIST]

Can anyone supply me with a few ideas? I have googled and googled, but can't find the right information.

How about using Google search :wink: (and I don’t mean google to find a solution to your problem, but using Google to search the site)

I have thought about this, but Google only searches its own database. In the beginning the site will not have any data in that database, so the search will return no results.

Hi…

The client shouldn’t be making technical decisions.

Recommend a search appliance plugged into their rack. Google do them (Google Mini) and so do Thunderstone. Don't forget to add a 20% fee for your time. Once they pick their jaw up off the floor, ask them again about the database.

I suspect that what they actually want is a non-SQL database such as a full text engine.

Lucene (Java), Sphinx and Solr are all examples. You will need a crawler to scan your own domain, collecting URLs as you go.

yours, Marcus
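To make the crawler idea above a bit more concrete, here is a minimal sketch of the "collecting URLs as you go" step. The function name extractSameHostLinks() and the example.com URLs are made up for illustration. It assumes the DOM extension is available, and it does not normalise "../" segments or respect robots.txt. A real crawler would fetch each returned URL in turn and keep a list of pages already visited.

```php
<?php
// Hypothetical helper: return the unique links on one page that stay on the
// same host as $baseUrl. This is only the link-collecting part of a crawler.
function extractSameHostLinks($html, $baseUrl)
{
    if (trim($html) === '') {
        return array();
    }
    $base     = parse_url($baseUrl);
    $host     = strtolower($base['host']);
    $scheme   = isset($base['scheme']) ? $base['scheme'] : 'http';
    $basePath = isset($base['path']) ? $base['path'] : '/';

    // Real-world markup rarely validates, so collect parser errors quietly.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // empty link or in-page anchor
        }
        $parts = parse_url($href);
        if ($parts === false) {
            continue; // malformed URL
        }
        if (isset($parts['scheme'])
            && !in_array(strtolower($parts['scheme']), array('http', 'https'))) {
            continue; // mailto:, javascript:, ftp: and so on
        }
        if (isset($parts['host']) && strtolower($parts['host']) !== $host) {
            continue; // link leaves the site
        }
        // Resolve the path: absolute, document-relative, or missing.
        if (!isset($parts['path']) || $parts['path'] === '') {
            $path = $basePath;
        } elseif ($parts['path'][0] === '/') {
            $path = $parts['path'];
        } else {
            $dir  = substr($basePath, 0, strrpos($basePath, '/') + 1);
            $path = $dir . $parts['path'];
        }
        $url = $scheme . '://' . $host . $path;
        if (isset($parts['query'])) {
            $url .= '?' . $parts['query'];
        }
        $links[$url] = true; // array keys de-duplicate for us
    }
    return array_keys($links);
}
```

You would call it as extractSameHostLinks($pageHtml, 'http://www.example.com/docs/index.html') and push any URLs you have not seen before onto a queue.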

Then, using PHP, I guess you'll have to use the file system functions to open each HTML page and look for the search term.
I mean, since you don't use a DB, it's a static HTML site, right? As far as the content goes, at least?
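A rough sketch of that file system approach, assuming the content really is static .html/.htm files under one directory. The names htmlToText() and searchHtmlFiles() are invented here, and the snippet length is arbitrary. It reads every file on every query, so it only makes sense for a small site.

```php
<?php
// Hypothetical helper: reduce an HTML page to searchable plain text.
function htmlToText($html)
{
    // Drop script and style blocks first; strip_tags() would keep their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Put a space before every tag so adjacent elements do not run together.
    $html = str_replace('<', ' <', $html);
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Hypothetical helper: case-insensitive search of every .html/.htm file
// below $dir. Returns an array of file path => short snippet around the hit.
function searchHtmlFiles($dir, $term)
{
    $results = array();
    if ($term === '') {
        return $results;
    }
    $files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
    foreach ($files as $file) {
        if (!$file->isFile() || !preg_match('/\.html?$/i', $file->getFilename())) {
            continue;
        }
        $text = htmlToText(file_get_contents($file->getPathname()));
        $pos  = stripos($text, $term);
        if ($pos === false) {
            continue;
        }
        // Keep up to 40 bytes of context on either side of the match.
        // substr() is byte-based, so multibyte text may be cut mid-character.
        $start = max(0, $pos - 40);
        $results[$file->getPathname()] = substr($text, $start, strlen($term) + 80);
    }
    return $results;
}
```

Something like searchHtmlFiles('/path/to/site', $_GET['q']) would then give you the matching pages. Remember to escape the snippets with htmlspecialchars() before printing them.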

Definitely an option.

Does anyone know of an MSSQL site search plugin for PHP? The client is willing to pay for it, but not willing to wait.

Google Site Search is an inexpensive, hosted version of Google's search engine that can be used on a single site. It does not use Google's main index.

Does anyone know how to remove all the Google advertisements from a Google CSE?

As a paid user, am I allowed to remove the branding placed there by Google?

So I have decided to go the custom route. I am making use of Google CSE, through their XML API.

The one problem I have is that the reported total number of results is not accurate, which causes trouble with my pagination.

Does anybody know a way around this?
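One possible workaround, sketched under the assumption that the reported total is only an estimate: don't build numbered page links from the total at all. Instead, decide whether to show a "Next" link from how many results actually came back on the current page. The function buildPager() and its parameters are invented for illustration and are not part of any Google API.

```php
<?php
// Hypothetical helper: Previous/Next pagination that ignores the reported
// total and relies only on what the current page actually returned.
function buildPager($page, $pageSize, $itemsOnPage)
{
    $page = max(1, (int) $page);
    return array(
        'page'    => $page,
        'hasPrev' => $page > 1,
        // A short or empty page means the real end of the results was reached.
        'hasNext' => $itemsOnPage >= $pageSize,
        // 1-based positions of the first and last result shown (0 if none).
        'first'   => $itemsOnPage > 0 ? ($page - 1) * $pageSize + 1 : 0,
        'last'    => $itemsOnPage > 0 ? ($page - 1) * $pageSize + $itemsOnPage : 0,
    );
}
```

The trade-off is that when the real total is an exact multiple of the page size, the user gets one "Next" click that lands on an empty page. You can catch that case and show a "no more results" message. The estimated total can still be displayed as "about N results", as long as it is not used to compute the page count.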

Take a look at http://www.mnogosearch.org/
It can search your database as well as store its search index in a database, and they have PHP libraries and a PHP extension.

Google CSE is up and running fine, but it doesn't seem to index the site correctly. It still returns results from the previous site on that domain.

Google CSE has an option where you can have the site re-indexed on demand. The problem I am experiencing is that it won't allow me to submit my sitemap for re-indexing.

It keeps giving me the following error: “You must submit a valid verified Sitemap.”

I am attempting to submit the same sitemap that I submitted to Google Webmaster Tools.

I am also presented with “oops! We were not able to find a verified Sitemap associated with your current Google account. Submit a Sitemap using Webmaster Tools.” when I go to the Indexing tab in my CSE account. We have already submitted a sitemap in Webmaster Tools.

Any idea what kind of sitemap CSE is expecting?