Shaking up Search

Indexing looks better with humans

If you do any reading about search engines the word “algorithm” will jump out at you pretty quick, perhaps with a pinch of MathML, just to make sure the lay-people are truly dazzled. All kinds of lore and mystery surround Google’s search algorithm. Meanwhile IBM have this UIMA-thing for truly smart searching.

But compare these search results for “PHP SOAP” – http://del.icio.us/search/?all=php+soap and http://www.google.com/search?q=php+soap. Right now Google returns the first result as the long-dead PHP SOAP Toolkit. Meanwhile delicious gives me this tutorial first: PHP Web Services Without SOAP – how does that result compute in terms of relevance?

Put another way, how smart does your search algorithm need to be for it to be able to return a result like “FatBoy Slim?!? You’re kidding, right? Break beats are dead baby!”

As I (cynically) pointed out here, humans are great for building search engine indexes. They’re self-maintaining, abundant, smart and distributed. It doesn’t matter how clever your algorithm is – even if you can match a human’s ability to categorize, the economics of doing so will kill you.

Meanwhile, an interesting point on Reasons Unbeknownst here:

bloggers are starting to get more traffic from Del.icio.us anyway

That’s not to say delicious is about to replace Google – as I see it, delicious is a tool by Nerds for Nerds – I can’t see the concept, in its current form, reaching out to non-Nerds until it’s possible for “indexing” to be effortless for someone who’s (actively) spending only an hour a week online. But that’s not the point – to me what’s interesting here is that an index built by humans is proving to be at least as good as one built by machine – delicious made the breakthrough here in showing it can be done.
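To make the "index built by humans" idea concrete, here is a minimal sketch of a tag-based index. It is not how delicious actually works (that isn't described here); the bookmark data, the URLs and the `build_index` and `search` functions are all made up for illustration. The only "algorithm" is counting how many people filed a URL under each tag.

```python
from collections import Counter, defaultdict

# Hypothetical bookmarks: (user, url, tags). All of this data is invented.
bookmarks = [
    ("alice", "http://example.com/php-web-services-without-soap", {"php", "soap", "webservices"}),
    ("bob", "http://example.com/php-web-services-without-soap", {"php", "soap"}),
    ("carol", "http://example.com/old-soap-toolkit", {"php", "soap"}),
    ("dave", "http://example.com/rest-intro", {"rest", "php"}),
]

def build_index(bookmarks):
    """Map each tag to a Counter of URLs: how many people used that tag for that URL."""
    index = defaultdict(Counter)
    for _user, url, tags in bookmarks:
        for tag in tags:
            index[tag][url] += 1
    return index

def search(index, query):
    """Return URLs carrying every query tag, most-tagged first."""
    terms = query.lower().split()
    if not terms:
        return []
    # Keep only URLs that people have filed under all of the query tags.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    # Score = total number of human taggings across the query tags.
    scores = {url: sum(index[t][url] for t in terms) for url in candidates}
    return sorted(scores, key=lambda u: (-scores[u], u))

index = build_index(bookmarks)
results = search(index, "php soap")
```

In this toy data the "without SOAP" tutorial ranks first for "php soap" simply because more people chose to file it under those tags, which is the kind of relevance judgement a text-matching algorithm would struggle to make.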

Distributing Search

On a parallel tack, if you go hunting for “distributed search”, you see a similar phenomenon to search – nerds in pursuit of algorithms to allow giant search indexes to be distributed. They’re stuck on the problem of how to cope with the lack of resources on the peer.

But I think a similar “human hack” applies here – picking a number, I’d guess that 90%+ of what any given person searches for is on the same basic topic – the things they’re interested in. Most of the time an individual only needs an index of things they’re interested in. That, in itself, doesn’t magically solve all problems but throw in “self organising communities” of the P2P kind and it doesn’t look so remote.
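As a rough sketch of that idea, assume each peer keeps only a small index of the topics its owner cares about, and a query is sent to the peers whose topics overlap with it rather than to one giant index. The peer names, URLs and the `route` and `distributed_search` functions below are hypothetical, and real P2P concerns (discovery, churn, trust) are ignored entirely.

```python
# Hypothetical peers, each holding a small topic -> URLs index of its owner's interests.
peers = {
    "harry": {
        "php": ["http://example.com/php-soap"],
        "soap": ["http://example.com/php-soap"],
    },
    "dj": {
        "breakbeat": ["http://example.com/breaks"],
        "vinyl": ["http://example.com/vinyl"],
    },
}

def route(peers, query):
    """Return the peers whose index covers at least one query term, best match first."""
    terms = set(query.lower().split())
    overlap = {name: len(terms & set(idx)) for name, idx in peers.items()}
    matching = [name for name, count in overlap.items() if count > 0]
    return sorted(matching, key=lambda name: (-overlap[name], name))

def distributed_search(peers, query):
    """Ask only the relevant peers and merge their results, dropping duplicates."""
    terms = query.lower().split()
    results = []
    for name in route(peers, query):
        for term in terms:
            for url in peers[name].get(term, []):
                if url not in results:
                    results.append(url)
    return results

php_results = distributed_search(peers, "php soap")
```

No single peer needs the resources to hold everything; each one only has to be good at its owner's own topics, and the community of peers with shared interests does the rest.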

Anyway. Just retelling what Web 2.0 is already saying.

  • http://www.dotcomwebdev.com chris ward

    I’ve always said that this is why Yahoo bought up Flickr.

    Flickr uses its folksonomy model to provide results relevant to tags (keywords).

    The search-engine to win the race, of applying this model to the rest of the web, will be king!

  • http://www.lopsica.com BerislavLopac

    Well, Google still has the PigeonRank.

  • http://www.sentrylogin.com/ ericshawnSentry

    Right, and PigeonRank is based on the work of reviled and vituperative behaviorist B.F. Skinner. Ayn Rand is spinning in her grave.

  • http://www.deanclatworthy.com Dean C

    Great points Harry. But AI is a constantly developing field; it’ll only be so long before the search engine algorithms get better and more “human-like” :)

  • http://www.bn23.com b0rdslide

    Google is already heading in the direction of having a list of topics that are relevant to individual users. If you have a google account then all of your searches are added to your “search history” and used in your “personalized search” service (unless you turn it off) and over time your searches become more relevant to the types of information that you are looking for.

    I think that they are still building it into their systems at the moment but you can bet that it will be given a lot of marketing once it’s fully integrated and given its full influence on the search results.

  • http://www.phppatterns.com HarryF

    Google is already heading in the direction of having a list of topics that are relevant to individual users. If you have a google account then all of your searches are added to your “search history” and used in your “personalized search” service

    And with Google Desktop they’ve got a foot in the door of distributed search.