I'm trying to grab the Yahoo homepage and store each displayed word (i.e. only the words the browser actually outputs) as an element in an array.

Grabbing and parsing the page is no problem using:

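$fp = fsockopen("yahoo.com", 80, $errno, $errstr, 5);  // host name only (no "/"); no call-time & on arguments
fputs($fp, "GET $whatever HTTP/1.0\r\nHost: yahoo.com\r\n\r\n");
$response = stream_get_contents($fp);  // raw response: status line, headers, then HTML
fclose($fp);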
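Part of the unwanted data is the HTTP status line and headers, which arrive in front of the HTML on a raw socket. A minimal sketch of separating them, assuming the whole response is already in a string (the function name `split_http_response` is hypothetical, and a hard-coded response stands in for the socket read):

```php
<?php
// Hypothetical helper: split a raw HTTP response into headers and body.
// Assumption: the headers end at the first blank line ("\r\n\r\n"), as HTTP specifies.
function split_http_response(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    $headers = $parts[0];
    $body = $parts[1] ?? '';   // empty body if the separator is missing
    return [$headers, $body];
}

// Usage with a hard-coded response instead of a live socket read.
$raw = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html><body>Hello world</body></html>";
[$headers, $body] = split_http_response($raw);
echo $body, "\n";
```

Sending an HTTP/1.0 request keeps this simple, because the server will not reply with chunked transfer encoding. Alternatively, `file_get_contents()` on an `http://` URL (with `allow_url_fopen` enabled) returns only the body.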


This returns a whole host of words, which can then be split into an array (using " " as the delimiter).
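Splitting on a single " " leaves empty elements wherever spaces repeat, and keeps tabs and newlines inside elements. A small sketch that splits on any run of whitespace instead (`split_words` is a hypothetical name):

```php
<?php
// Split text into words on any run of whitespace (spaces, tabs, newlines)
// and drop the empty strings that leading/trailing whitespace would produce.
function split_words(string $text): array
{
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$text = "  Hello\tworld\n\nfrom   PHP ";
print_r(explode(" ", $text));   // contains empty strings and "Hello\tworld\n\nfrom"
print_r(split_words($text));    // ["Hello", "world", "from", "PHP"]
```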

The problem is that many elements of the array still contain unwanted data (I only want the text the browser actually displays).

So far I've had to use strip_tags() and str_replace() a hell of a lot, and I'm still not getting a clean result (i.e. unwanted data remains).
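One reason strip_tags() alone falls short is that it removes the tags but keeps the contents of `<script>` and `<style>` elements, which the browser never shows. A sketch of a different technique, parsing the HTML with PHP's DOM extension instead of string replacement. It assumes ext-dom is installed (it is in most builds, but some distributions package it separately), and `visible_words` is a hypothetical name. Dropping `<head>` (and so `<title>`) is a choice made here, not a rule:

```php
<?php
// Assumes the DOM extension (ext-dom) is available.
function visible_words(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml's warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Gather elements whose text is not displayed. Collect first, then delete,
    // because removing nodes while iterating a live DOMNodeList skips entries.
    $remove = [];
    foreach (['script', 'style', 'noscript', 'head'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $remove[] = $node;
        }
    }
    foreach ($remove as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // textContent concatenates the remaining text nodes with entities decoded.
    $text = $doc->documentElement ? $doc->documentElement->textContent : '';
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$html = '<html><head><title>T</title><style>p{color:red}</style></head>'
      . '<body><p>Hello <b>world</b></p><script>var x = 1;</script></body></html>';
print_r(visible_words($html));   // ["Hello", "world"]
```

Caveats: adjacent block elements with no whitespace between them (e.g. `<p>foo</p><p>bar</p>`) come out glued together as one word, and loadHTML() assumes ISO-8859-1 unless the page declares its charset.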

So is there an easy and effective way to get only the words the browser outputs?