Shaking up Search

    Harry Fuecks

    Indexing looks better with humans

    If you do any reading about search engines, the word “algorithm” will jump out at you pretty quickly, perhaps with a pinch of MathML thrown in just to make sure the laypeople are truly dazzled. All kinds of lore and mystery surround Google’s search algorithm. Meanwhile IBM have this UIMA thing for truly smart searching.

    But compare these search results for “PHP SOAP” – http://del.icio.us/search/?all=php+soap and http://www.google.com/search?q=php+soap. Right now Google returns the long-dead PHP SOAP Toolkit as its first result. Meanwhile delicious gives me this tutorial first: PHP Web Services Without SOAP – how does that result compute in terms of relevance?

    Put another way, how smart does your search algorithm need to be to return a result like “FatBoy Slim?!? You’re kidding, right? Break beats are dead, baby!”

    As I (cynically) pointed out here, humans are great for building search engine indexes. They’re self-maintaining, abundant, smart and distributed. It doesn’t matter how clever your algorithm is – even if you can match a human’s ability to categorize, the economics of doing so will kill you.

    Meanwhile, an interesting point on Reasons Unbeknownst here:

    bloggers are starting to get more traffic from Del.icio.us anyway

    That’s not to say delicious is about to replace Google. As I see it, delicious is a tool by Nerds for Nerds – I can’t see the concept, in its current form, reaching out to non-Nerds until “indexing” becomes effortless for someone who’s (actively) spending only an hour a week online. But that’s not the point – what’s interesting to me is that an index built by humans is proving to be at least as good as one built by machine, and delicious made the breakthrough here in showing it can be done.
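
    To make that concrete, a human-built index is structurally trivial – it’s just an aggregation of (tag, URL) pairs, with relevance falling out of how many people independently attached the same tag to the same URL. Here’s a minimal sketch in Python (the bookmark data, URLs and function names are all hypothetical illustrations, not del.icio.us’s actual schema or API):

```python
from collections import Counter, defaultdict

# Hypothetical bookmarks: each person tags URLs however they see fit.
bookmarks = [
    ("alice", "http://example.com/php-soap-tutorial", ["php", "soap", "webservices"]),
    ("bob",   "http://example.com/php-soap-tutorial", ["php", "soap"]),
    ("carol", "http://example.com/soap-toolkit",      ["php", "soap"]),
]

# The "index" is nothing clever: tag -> URL -> how many people used that tag.
index = defaultdict(Counter)
for user, url, tags in bookmarks:
    for tag in tags:
        index[tag][url] += 1

def search(*tags):
    """Rank URLs by how many humans tagged them with all the query tags."""
    urls = set.intersection(*(set(index[t]) for t in tags))
    scores = Counter({url: sum(index[t][url] for t in tags) for url in urls})
    return scores.most_common()

print(search("php", "soap"))
# [('http://example.com/php-soap-tutorial', 4), ('http://example.com/soap-toolkit', 2)]
```

    No clever algorithm anywhere in there – all the “intelligence” lives in the humans who chose the tags; the code just counts votes.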

    Distributing Search

    On a parallel tack, if you go hunting for “distributed search”, you see a similar phenomenon – nerds in pursuit of algorithms that allow giant search indexes to be distributed. They’re stuck on the problem of how to cope with the lack of resources on the peer.

    But I think a similar “human hack” applies here – picking a number, I’d guess that 90%+ of what any given person searches for is on the same basic topics – the things they’re interested in. Most of the time an individual only needs an index of the things they’re interested in. That, in itself, doesn’t magically solve all the problems, but throw in “self organising communities” of the P2P kind and it doesn’t look so remote.
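
    As a rough sketch of what that might look like – everything here (the Peer class, the tokenising, the data) is a hypothetical illustration, not any existing P2P protocol – each peer only indexes its own bookmarks, answers what it can locally, and forwards the misses to peers in the same interest community:

```python
# Hypothetical sketch: a peer indexes only its own interests and
# falls back on its self-organised community for everything else.
class Peer:
    def __init__(self, name):
        self.name = name
        self.index = {}      # word -> set of URLs this peer has bookmarked
        self.community = []  # peers with overlapping interests

    def bookmark(self, url, text):
        # Indexing only what you care about keeps the index tiny.
        for word in text.lower().split():
            self.index.setdefault(word, set()).add(url)

    def search(self, query, hops=2):
        hits = self.index.get(query.lower(), set())
        if hits or hops == 0:
            return hits
        # Not in my index: ask the community, a limited number of hops
        # at a time (the hop count also stops cyclic communities looping).
        for peer in self.community:
            hits |= peer.search(query, hops - 1)
        return hits

me = Peer("me")
friend = Peer("friend")
me.community.append(friend)
friend.bookmark("http://example.com/soap", "php web services without soap")
print(me.search("soap"))  # found via the community, not my own index
```

    The point of the sketch is the resource maths: no peer ever holds a giant index, because the humans have already partitioned the web by interest.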

    Anyway. Just retelling what Web 2.0 is already saying.