Old Aug 7, 2007, 08:59   #1
kgun
SitePoint Addict
 
Join Date: Nov 2005
Location: Moss, Norway.
Posts: 274
Google site operator and forum site search

I am posting in this forum because I may implement the site search function described in the SitePoint (SP) book:

Thomas Myer: No Nonsense XML Web Development With PHP


On every forum I participate in, without exception, I have noticed the following when looking for a post:
  • Often I do not find what I am looking for when I use the forum site search.
  • I often find it instantly when using the Google site operator.

Here is my keyword (KW) search, first with the SP forum site search and then with Google:

SP forum site search:

kgun yank

Google:
kgun yank site:www.sitepoint.com

Here is the thread I am looking for, the second hit on Google on August 7, 2007:
? about p74 of Sitepoint db book

Here is the only thread I find with the SP forum site search:

No Nonsense - XML Development with PHP

Any comments?
Old Aug 8, 2007, 14:15   #2
guinanie
Non-Member
 
Join Date: Jul 2007
Posts: 81
I think you should enter your complete keywords when using the forum search...

And also the thread where you posted, for faster searching... Hope it helps...
Old Aug 9, 2007, 05:59   #3
kgun
SitePoint Addict
 
Join Date: Nov 2005
Location: Moss, Norway.
Posts: 274
  1. I found it easily using the Google site operator.
  2. If that is a requirement, why implement site search at all when the Google site operator is better?
  3. In the end, the quality of site search depends on how you organize your XML files or database records, and then on how the search function is implemented.
  4. Has anybody seen a site search function that gives better results than the Google site operator?
  5. My private conclusion: on well-known sites, it is difficult to beat Google site search.
Old Aug 23, 2007, 01:12   #4
Akash Mehta
SitePoint Zealot
 
Join Date: Jan 2007
Location: Australia
Posts: 137
Searching large sites is often highly resource intensive and best left to the professionals. And when it comes to search, you can't deny that Google's pretty damn good. However, the moment you use Google's site search you can lose revenue, lose traffic and risk losing visitors. A functioning site search that just works is often better than sending visitors to a customised Google search box.
Old Aug 23, 2007, 04:34   #5
kgun
SitePoint Addict
 
Join Date: Nov 2005
Location: Moss, Norway.
Posts: 274
I agree with that.

Point 5 above was poorly worded. It should be:

5. My private conclusion: on well-known sites, it is difficult to beat the Google site operator.

That is, generally, it is easier to find what you are looking for using that operator than using a site search function. See my first post.

Some related links:

Swish-e: Simple Web Indexing System for Humans - Enhanced
"Swish-e is ideally suited for collections of a million documents or smaller. Using the GNOME™ libxml2 parser and a collection of filters, Swish-e can index plain text, e-mail, PDF, HTML, XML, Microsoft® Word/PowerPoint/Excel and just about any file that can be converted to XML or HTML text".

Key features
  • Quickly index a large number of documents in different formats including text, HTML, and XML.
  • Phrase searching and wildcard searching
  • Use powerful Regular Expressions to select documents for indexing or exclusion.
  • Limit searches to parts of documents such as certain HTML tags (META, TITLE, comments, etc.) or to XML elements.
  • Can report structural errors in your XML and HTML documents.
  • Etc. etc.

Fluid Dynamics Search Engine

Features

Search Tools for Web Sites and Intranets

For large companies:
Oracle Secure Enterprise Search

Private conclusion:
Maybe the most efficient site search engine is built by:
  1. Structuring your documents very well. Be careful when you define XML tags and attributes.
  2. Using a stream-based XML parser, like SAX or XMLReader, since it minimizes memory usage.

Last edited by kgun; Aug 23, 2007 at 11:01.
Old Aug 27, 2007, 20:24   #6
wwb_99
Community Advisor
SitePoint Award Recipient
 
Join Date: May 2003
Location: Washington, DC
Posts: 9,134
If you want Google-style search for your site, the answer is really easy: just buy a Google Mini. I picked one up the other month, and it is very, very slick. Having implemented, or had implemented, a number of search features, I can say it also lines up very, very well once you start considering the whole ROI thing.
Old Sep 2, 2007, 10:25   #7
kgun
SitePoint Addict
 
Join Date: Nov 2005
Location: Moss, Norway.
Posts: 274
Two quick questions:
  1. Is it driven by XML like the one described in my first post?
  2. Does it have AJAX functionality?
Related WPW thread:
free internal site search code ?

Note the last posts in that thread.
Old Sep 2, 2007, 11:01   #8
wwb_99
Community Advisor
SitePoint Award Recipient
 
Join Date: May 2003
Location: Washington, DC
Posts: 9,134
The Google Mini returns results in [obtuse] XML. The AJAX question is kind of immaterial: it provides a search service, and you can do whatever you can code up with the results. Which means you could do AJAX or PDF via XSL-FO or just about anything else you can do with data + modern web technologies.

I can't really make heads or tails of that forum thread. Then again, it is also immaterial. The key question here is "how can I get a search engine or appliance that can read my data and provide useful results?" The only important technical issues are:

a) Can this thing read my data?
b) Can I deal with the results successfully?

XML vs PHP vs Whatever is not really a consideration.
Old Sep 2, 2007, 15:37   #9
kgun
SitePoint Addict
 
Join Date: Nov 2005
Location: Moss, Norway.
Posts: 274
Quote:
Originally Posted by wwb_99
The only important technical issues are:

a) Can this thing read my data?
b) Can I deal with the results successfully?

XML vs PHP vs Whatever is not really a consideration.
I agree with that. PHP adds functionality.

Some quick thoughts (which I hope are correct):

Example:
Load the XML document into element objects using the PHP simplexml_load_file function and search the text of all elements in the document:

xpointer(string-range(//*,'search for any element in the document')).

More precisely, SimpleXML has no xpointer method, but its xpath method with contains() gives a comparable match:

$definitions = simplexml_load_file('definitions.xml');

$matches = $definitions->xpath("//*[text()[contains(., 'search for any element in the document')]]");

should be possible, even though I have not tried it.

If you need to cast a matched element to a string, it is done in the usual way:
(string) $matches[0];

This XPointer expression can then be used in different ways, for example to create a link from every occurrence of the text to its definition in the file containing the list of definitions. Another application is indicated in Myer (April 2006 book), page 128, where the heart of the site search function is explained.

For more details see
Improving Web Linking using XLink page 12.

Looping through all the documents in the site then generalizes this into a site search.
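
A rough sketch of that loop, untested against a real site. The function name searchSite and the flat directory of .xml files are assumptions made for illustration, and the text comparison is done in PHP rather than in XPath so that the match is case-insensitive and the search term needs no quoting:

```php
<?php
// Hypothetical helper, not from the book: scans every .xml file in one
// directory and returns [file path => [names of matching elements]].
function searchSite(string $dir, string $term): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    $hits = [];
    foreach (glob($dir . '/*.xml') ?: [] as $file) {
        $xml = simplexml_load_file($file);
        if ($xml === false) {
            continue; // skip files that are not well-formed XML
        }
        // Select every element, then test each element's own text in PHP.
        foreach ($xml->xpath('//*') as $element) {
            if (stripos((string) $element, $term) !== false) {
                $hits[$file][] = $element->getName();
            }
        }
    }
    return $hits;
}
```

Calling searchSite('/path/to/xml', 'xpointer') would return the files that contain the term together with the element names where it was found. Since every file is loaded in full, this only suits sites with small documents.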

The great advantage of an XML-driven site is that it is simple to make content compatible with older browsers, since the files can be transformed to HTML using XSLT. One source, many presentations, in formats that you specify yourself using XSL(-FO).
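
As a minimal sketch of the XML to HTML step, assuming PHP's xsl extension is installed: the element names (definitions, definition, term, meaning) and the function name renderDefinitions are invented for this example and are not taken from the book.

```php
<?php
// Sketch only: transforms a hypothetical definitions document into an
// HTML definition list using XSLTProcessor (needs the xsl extension).
function renderDefinitions(string $xmlSource): string
{
    $stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/definitions">
    <dl>
      <xsl:for-each select="definition">
        <dt><xsl:value-of select="term"/></dt>
        <dd><xsl:value-of select="meaning"/></dd>
      </xsl:for-each>
    </dl>
  </xsl:template>
</xsl:stylesheet>
XSL;

    $xml = new DOMDocument();
    $xml->loadXML($xmlSource);

    $xsl = new DOMDocument();
    $xsl->loadXML($stylesheet);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);

    // transformToXml returns the result as a string (false on failure).
    return (string) $processor->transformToXml($xml);
}
```

A second stylesheet applied to the same source file would give a second presentation without touching the content.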

P.S. The search may be faster and use less memory with a stream-based parser, such as SAX (a push parser) or XMLReader (a pull parser). It should also be possible to search for terms / words in CDATA blocks.
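
A sketch of the streaming variant with XMLReader, again untested on a real site. The function name streamSearch is an assumption; the search covers both ordinary text nodes and CDATA blocks, and only one node is held in memory at a time:

```php
<?php
// Hypothetical helper: streams through one XML file and returns the
// names of the elements whose text or CDATA content contains $term.
function streamSearch(string $file, string $term): array
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        return [];
    }
    $hits = [];
    $stack = []; // names of the currently open elements
    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                if (!$reader->isEmptyElement) {
                    $stack[] = $reader->name;
                }
                break;
            case XMLReader::END_ELEMENT:
                array_pop($stack);
                break;
            case XMLReader::TEXT:
            case XMLReader::CDATA:
                if (stripos($reader->value, $term) !== false) {
                    $hits[] = end($stack); // innermost open element
                }
                break;
        }
    }
    $reader->close();
    return $hits;
}
```

Whether this is actually faster than SimpleXML depends on the document sizes; the main gain is that large files are never loaded whole.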

Last edited by kgun; Sep 2, 2007 at 16:38.