So after my initial concern over the impact of this, I figured it out at last: what Google is trying to tell us is that we've got a huge cluster right there at our disposal!
So I spent the night hacking together PHP MapReduce. The master node, which you run on your server, uses this search to locate victims… errr… workers to participate in the cluster. You then write some code like:
<?php
require_once 'mapreduce.php';

$veryLargeFile = '/tmp/bigfile';
$map = 'http://masternode.com/mapreduce/wordcounter.phps';
$reduce = 'http://masternode.com/mapreduce/adder.phps';

# Massively distributed computing, here we come...
$result = MapReduce($veryLargeFile, $map, $reduce);
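For the curious, the two URLs above would point at the map and reduce functions the workers run. The post never shows them, so here's a minimal sketch of what a word-counting map and an adding reduce might look like; the function names and signatures (`wordcounter`, `adder`) are my guess from the URLs, not anything from an actual library.

```php
<?php
# Hypothetical worker functions -- names and signatures assumed
# from the wordcounter.phps / adder.phps URLs above.

# Map: take a chunk of the input file, emit (word, 1) pairs.
function wordcounter($chunk) {
    $pairs = array();
    foreach (preg_split('/\s+/', trim($chunk)) as $word) {
        if ($word !== '') {
            $pairs[] = array($word, 1);
        }
    }
    return $pairs;
}

# Reduce: sum the counts collected for a single word.
function adder($word, $counts) {
    return array_sum($counts);
}
```

The framework would ship these to each worker, feed the map output through a shuffle grouped by word, and call the reduce once per word.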
At the moment it's limited to PHP-only execution on the workers, so that's a fairly limited-size cluster. But I'm working on extending it so that your map and reduce functions are automatically translated into MySQL stored procedures, allowing this search to significantly expand the cluster (thanks Ilia). And with help from adodb I think it should be possible to make this DB-independent.
But where this gets really interesting is considering this search. Now this is a lot harder to implement, but it should be possible to invite browsers to join the cluster as well, dramatically increasing your processing power. The workflow would be something like: master => worker server => worker browser => (via AJAX back to) worker server => master.
We’ve entered the real age of distributed computing folks. Think of the wonderful things we could do with this, such as the biggest blog spam filter ever!
This is a JOKE btw!
…and probably a bad one. It’s not April but anyway. And I’m not working on this. And I never will be.
I think it might be a good idea for Google to allow people to restrict the search to a single domain, so people can at least see what's exposed on their own site and clean up as needed.