  1. #1
    SitePoint Addict
    Join Date
    Nov 2005
    Location
    Moss, Norway.
    Posts
    283

    Google site operator and forum site search

    I am posting in this forum because I may implement the site search function described in the SitePoint book:

    Thomas Myer: No Nonsense XML Web Development With PHP


    On every forum where I participate, without exception, I have seen the following when looking for a post:
    • I often do not find what I am looking for with the forum's own site search.
    • I often find it instantly with the Google site: operator.


    Here is my keyword (KW) search, first with the SitePoint (SP) forum search and then with Google:

    SP forum site search:

    kgun yank

    Google:
    kgun yank site:www.sitepoint.com
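
    For reference, the Google query above can also be built in code. This is only a minimal sketch in PHP, assuming the usual https://www.google.com/search?q= URL form; the function name googleSiteSearchUrl is made up for illustration.

```php
<?php
// Build a Google search URL restricted to one host via the site: operator.
// googleSiteSearchUrl is a hypothetical helper, not part of any library.
function googleSiteSearchUrl(string $keywords, string $host): string
{
    $query = $keywords . ' site:' . $host;
    // urlencode() turns spaces into "+" and ":" into "%3A".
    return 'https://www.google.com/search?q=' . urlencode($query);
}

echo googleSiteSearchUrl('kgun yank', 'www.sitepoint.com'), "\n";
```

    A site could link to such a URL as a fallback when its own search returns nothing.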

    Here is the thread I was looking for, the second hit on Google on August 7, 2007:
    ? about p74 of Sitepoint db book

    Here is the only thread the SP site search returns:

    No Nonsense - XML Development with PHP

    Any comments?

  2. #2
    Non-Member
    Join Date
    Jul 2007
    Posts
    77
    I think you should enter your complete keywords when you use the forum search, and also name the thread where you posted, so that the search is faster. Hope it helps.

  3. #3
    SitePoint Addict
    Join Date
    Nov 2005
    Location
    Moss, Norway.
    Posts
    283
    1. I found it easily using the Google site: operator.
    2. If that is a requirement, why implement a site search at all when the Google site: operator is better?
    3. In the end, the quality of a site search depends on how you organize your XML files or database records, and then on how the search function is implemented.
    4. Has anybody seen a site search function that gives better results than the Google site: operator?
    5. My private conclusion: on well-known sites, it is difficult to beat Google site search.

  4. #4
    SitePoint Zealot
    Join Date
    Jan 2007
    Location
    Australia
    Posts
    137
    Searching large sites is often highly resource intensive and best left to the professionals. And when it comes to search, you can't argue that Google's pretty damn good. However, the moment you use Google's site search you can lose revenue, lose traffic and risk losing visitors. A functioning site search that just works is often better than sending visitors to a customised Google search box.

  5. #5
    SitePoint Addict
    Join Date
    Nov 2005
    Location
    Moss, Norway.
    Posts
    283
    Agreed.

    Point 5 above was poorly worded. It should read:

    5. My private conclusion: on well-known sites, it is difficult to beat the Google site: operator.

    That is, generally, it is easier to find what you are looking for using that operator than using a site search function. See my first post.

    Some related links:

    Swish-e: Simple Web Indexing System for Humans - Enhanced
    "Swish-e is ideally suited for collections of a million documents or smaller. Using the GNOME™ libxml2 parser and a collection of filters, Swish-e can index plain text, e-mail, PDF, HTML, XML, Microsoft® Word/PowerPoint/Excel and just about any file that can be converted to XML or HTML text".

    Key features

    • Quickly index a large number of documents in different formats including text, HTML, and XML.
    • Phrase searching and wildcard searching
    • Use powerful Regular Expressions to select documents for indexing or exclusion.
    • Limit searches to parts of documents such as certain HTML tags (META, TITLE, comments, etc.) or to XML elements.
    • Can report structural errors in your XML and HTML documents.
    • Etc. etc.


    Fluid Dynamics Search Engine

    Features

    Search Tools for Web Sites and Intranets

    For large companies:
    Oracle Secure Enterprise Search

    Private conclusion:
    Maybe the most efficient site search engine is made by
    1. Structuring your documents very well. Be careful when you define XML tags and attributes.
    2. Using a stream-based XML parser, like SAX or XMLReader, since it minimizes memory usage.
    Last edited by kgun; Aug 23, 2007 at 09:01.

  6. #6
    SitePoint Author
    wwb_99
    Join Date
    May 2003
    Location
    Washington, DC
    Posts
    10,653
    If you want Google-style search for your site, the answer is really easy: just buy a Google Mini. I picked one up the other month, and it is very, very slick. Having implemented, or had implemented, a number of search features, I find it also lines up very well once you start considering the whole ROI thing.

  7. #7
    SitePoint Addict
    Join Date
    Nov 2005
    Location
    Moss, Norway.
    Posts
    283
    Two quick questions:
    1. Is it driven by XML like the one described in my first post?
    2. Does it have AJAX functionality?

    Related WPW thread:
    free internal site search code ?

    Note the last posts in that thread.

  8. #8
    SitePoint Author
    wwb_99
    Join Date
    May 2003
    Location
    Washington, DC
    Posts
    10,653
    The Google Mini returns results in [obtuse] XML. The AJAX question is kind of immaterial: it provides a search service, and you can do whatever you can code up with the results. That means you could do AJAX, or PDF via XSL-FO, or just about anything else you can do with data plus modern web technologies.

    I can't really make heads or tails of that forum thread. Then again, it is also immaterial. The key question here is "how can I get a search engine or appliance that can read my data and provide useful results?" The only important technical issues are:

    a) Can this thing read my data?
    b) Can I deal with the results successfully?

    XML vs PHP vs Whatever is not really a consideration.

  9. #9
    SitePoint Addict
    Join Date
    Nov 2005
    Location
    Moss, Norway.
    Posts
    283
    Quote Originally Posted by wwb_99 View Post
    The only important technical issue is:

    a) Can this thing read my data?
    b) Can I deal with the results successfully?

    XML vs PHP vs Whatever is not really a consideration.
    Agreed. PHP adds functionality.

    Some quick thoughts (that I hope are correct):

    Example:
    Load the XML file into element objects with the PHP simplexml_load_file function and search every element of the document. As an XPointer expression that would be:

    xpointer(string-range(//*,'text to search for'))

    SimpleXML has no xpointer method, though. What it does offer is xpath(), which takes an XPath 1.0 expression, so more precisely:

    $definitions = simplexml_load_file('definitions.xml');

    $matches = $definitions->xpath("//*[contains(text(), 'text to search for')]");

    should be possible, even if I have not tried it. The call returns an array of matching SimpleXMLElement objects.

    If you need to cast one of them to a string, it is done in the usual way:
    (string)$matches[0];
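
    A self-contained sketch of the idea above, for anyone who wants to try it. It assumes SimpleXML's xpath() method with the XPath 1.0 contains() function, swapped in for string-range because SimpleXML has no XPointer support. The sample XML and the helper name findElementsContaining are made up for illustration.

```php
<?php
// Hypothetical sample data standing in for definitions.xml.
$xml = '<definitions>'
     . '<definition><term>XPath</term><text>A language for selecting nodes.</text></definition>'
     . '<definition><term>XSLT</term><text>A language for transforming XML.</text></definition>'
     . '</definitions>';

/**
 * Return every element that has a text node containing $needle.
 * The match is case-sensitive, as contains() is in XPath 1.0.
 */
function findElementsContaining(SimpleXMLElement $root, string $needle): array
{
    // XPath 1.0 string literals cannot escape quotes, so pick the
    // quote character that the needle does not contain.
    if (strpos($needle, "'") === false) {
        $literal = "'" . $needle . "'";
    } elseif (strpos($needle, '"') === false) {
        $literal = '"' . $needle . '"';
    } else {
        throw new InvalidArgumentException('Needle mixes both quote types.');
    }
    $result = $root->xpath('//*[text()[contains(., ' . $literal . ')]]');
    return $result ?: []; // xpath() may return false or null on failure
}

$definitions = simplexml_load_string($xml);
$matches = findElementsContaining($definitions, 'transforming');

foreach ($matches as $element) {
    // Casting a SimpleXMLElement to string yields its own text content.
    echo $element->getName(), ': ', (string) $element, "\n";
}
```

    With a file on disk, simplexml_load_file('definitions.xml') would replace the simplexml_load_string call.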

    This XPointer expression can then be used in different ways, for example to create a link from every occurrence of the text to its definition in the file containing the list of definitions. Another application is indicated in Myer (April 2006 book), page 128, where the heart of the site search function is explained.

    For more details see
    Improving Web Linking using XLink page 12.

    Looping through all the documents on the site then generalizes this into a site search.
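
    The loop over documents might look roughly like this. It is a sketch only: the function name searchSite is hypothetical, the matching is a plain case-insensitive substring test done in PHP, and two temporary files stand in for a site's XML content.

```php
<?php
/**
 * Search a list of XML files and return [file => [matching text, ...]].
 * searchSite is a hypothetical helper name used for illustration.
 */
function searchSite(array $files, string $term): array
{
    $hits = [];
    foreach ($files as $file) {
        $doc = @simplexml_load_file($file);
        if ($doc === false) {
            continue; // skip unreadable or malformed files
        }
        // Visit every element; the string cast gives its own text only.
        foreach ($doc->xpath('//*') ?: [] as $element) {
            $text = trim((string) $element);
            if ($text !== '' && stripos($text, $term) !== false) {
                $hits[$file][] = $text;
            }
        }
    }
    return $hits;
}

// Demo: two throwaway files standing in for the site's documents.
$a = tempnam(sys_get_temp_dir(), 'doc');
$b = tempnam(sys_get_temp_dir(), 'doc');
file_put_contents($a, '<article><title>XML search</title><body>Using XPath to find text.</body></article>');
file_put_contents($b, '<article><title>Styling</title><body>Nothing relevant here.</body></article>');

$results = searchSite([$a, $b], 'xpath');
foreach ($results as $file => $texts) {
    echo basename($file), ': ', implode(' | ', $texts), "\n";
}

unlink($a);
unlink($b);
```

    On a real site the file list could come from something like glob() over the content directory, and the results would normally be ranked rather than returned in file order.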

    The great advantage of an XML-driven site is that it is simple to make content compatible with older browsers, since the files can be transformed to HTML using XSLT. One source, many presentations, in formats that you specify yourself using XSL(-FO).

    P.S. The search may be faster with a stream-based parser, such as SAX (push, event-driven) or XMLReader (pull). It should also be possible to search for terms / words in CDATA blocks.
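
    A rough sketch of the streaming variant using PHP's XMLReader, which reports CDATA sections as their own node type, so they can be searched as well. The function name streamSearch and the sample XML are assumptions for illustration.

```php
<?php
/**
 * Stream through an XML string and return the names of elements whose
 * text or CDATA content contains $term (case-insensitive).
 * Only the current node is held in memory at any time.
 */
function streamSearch(string $xml, string $term): array
{
    $reader = new XMLReader();
    $reader->XML($xml); // for a file on disk, use $reader->open($path)

    $found = [];
    $stack = []; // names of the currently open elements
    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                // Empty elements like <empty/> produce no END_ELEMENT.
                if (!$reader->isEmptyElement) {
                    $stack[] = $reader->name;
                }
                break;
            case XMLReader::END_ELEMENT:
                array_pop($stack);
                break;
            case XMLReader::TEXT:
            case XMLReader::CDATA:
                if (stripos($reader->value, $term) !== false) {
                    $found[] = end($stack);
                }
                break;
        }
    }
    $reader->close();
    return $found;
}

$xml = '<page><title>Site search</title>'
     . '<note><![CDATA[Search inside <b>CDATA</b> too]]></note><empty/></page>';

print_r(streamSearch($xml, 'cdata'));
```

    Whether this is actually faster than SimpleXML depends on document size; the clearer gain is memory, since the whole tree is never built.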
    Last edited by kgun; Sep 2, 2007 at 14:38.

