  1. #1
    SitePoint Zealot
    Join Date
    Jun 2000
    Posts
    165
    Hello,

    I'm trying to grab the Yahoo homepage and store each printed word (i.e. only the words the browser displays) as an element in an array.

    Grabbing and parsing the page is no problem using:

    $fp = fsockopen("yahoo.com", 80, $errnr, $errstr, 5);
    fputs($fp, "GET $whatever HTTP/1.0\r\n\r\n");

    etc.

    This produces a whole host of words that can then be split into an array (using " " as a delimiter).

    The problem is that the elements in the array still contain lots of unwanted data (I only want the browser output).

    So far I'm having to use strip_tags and str_replace a hell of a lot, and I'm still not achieving perfection (i.e. unwanted data remains).

    Do you know of an easy and effective way to get only the words the browser outputs?

    Thanks,

    Jason
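
One source of unwanted data with the raw-socket approach is that everything read from the socket starts with the HTTP status line and headers, not the page itself. A minimal sketch of dropping them, assuming the whole response has already been read into a string; `split_http_response` and the sample response are hypothetical, made up for illustration:

```php
<?php
// Hypothetical helper: split a raw HTTP response into headers and body.
function split_http_response(string $raw): array
{
    // The headers end at the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $headers = $parts[0];
    $body = $parts[1] ?? '';
    return [$headers, $body];
}

// Stand-in for the data read from the socket (not a real Yahoo response).
$raw = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html><body>Hello world</body></html>";
list($headers, $body) = split_http_response($raw);
echo $body, "\n";
```

Only `$body` would then need tag stripping; the headers never reach the word list.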

  2. #2
    Dumb PHP codin' cat
    Join Date
    Aug 2000
    Location
    San Diego, CA
    Posts
    5,460
    What exactly do you want stripped out? Do you want punctuation, or just words?
    Please don't PM me with questions.
    Use the forums, that is what they are here for.

  3. #3
    SitePoint Member
    Join Date
    Aug 2000
    Posts
    21
    One of the admins here helped me with something like that.

    What about the file() function?

    $line = file("http://www.yahoo.com");
    echo $line[5];


    And if you want the whole page, just use a while loop going from 0 to count($line) - 1.

    Just a thought; I'm in class and don't have access to a web server where I can test it. Sorry.
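
Since file() returns the page as an array of lines, a foreach loop walks the whole thing without needing to know the line count. A rough sketch, assuming `allow_url_fopen` is enabled when a URL is passed; `read_lines` is a hypothetical helper name, and the demo reads a local temporary file so it runs without network access:

```php
<?php
// Hypothetical helper: return the lines of a file or URL with line endings removed.
// Reading a URL this way assumes allow_url_fopen is enabled in php.ini.
function read_lines(string $source): array
{
    $lines = file($source, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        return [];   // the source could not be read
    }
    return $lines;
}

// Demonstrated on a local temporary file; passing "http://www.yahoo.com"
// instead would fetch the page.
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, "<html>\n<body>Hello</body>\n</html>\n");
foreach (read_lines($tmp) as $number => $line) {
    echo $number, ': ', $line, "\n";
}
unlink($tmp);
```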
    I remember falling...
    i remember marching..
    like a one man army..


  4. #4
    AdSpeed.com Son Nguyen
    Join Date
    Aug 2000
    Location
    Silicon Valley
    Posts
    2,241
    Look for threads on Web Host Talk about free PHP hosts, and sign up for one just to test your scripts!
    - Son Nguyen
    AdSpeed.com - Ad Serving and Ad Management Made Easy

  5. #5
    Dumb PHP codin' cat
    Join Date
    Aug 2000
    Location
    San Diego, CA
    Posts
    5,460
    Here is a function you could call any time you want just the browser output, with no tags, punctuation, or special characters (i.e. &nbsp; or &#183;):
    function cleanit($url) {
        // Add or remove any punctuation marks you want taken out; str_replace() treats them literally, so nothing needs escaping
        $baddies = array(".", ",", ":", "/", "-", "!", "&");
        // file() returns the page as an array of lines
        $lines = file($url);
        $newstring = "";
        foreach ($lines as $val) {
            $tmpstr = strip_tags($val);
            $tmpstr = trim($tmpstr);
            $tmpstr = stripslashes($tmpstr);
            // Replace HTML entities such as &nbsp; or &#183; with a space
            $tmpstr = preg_replace('/&#?[a-zA-Z0-9]+;/', ' ', $tmpstr);
            $tmpstr = str_replace($baddies, " ", $tmpstr);
            $newstring .= $tmpstr." ";
        }
        return $newstring;
    }

    Sample usage:
    // Call the function and pass a URL
    $data = cleanit("http://www.yahoo.com");
    print $data;
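
To get from the cleaned string to the array of words the original question asked for, one option is to split on whitespace. The sketch below is an alternative take, not the function above: `page_words` is a hypothetical name, it assumes the HTML is already in a UTF-8 string, and it also removes `<script>` and `<style>` blocks first, because strip_tags() removes only the tags and leaves the text between them in place.

```php
<?php
// Hypothetical helper: turn an HTML string into an array of visible words.
// Assumes $html is valid UTF-8.
function page_words(string $html): array
{
    // strip_tags() keeps the text inside <script> and <style>, so drop those blocks first.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Put a space before every tag so words in adjacent elements do not run together.
    $html = str_replace('<', ' <', $html);
    $text = strip_tags($html);
    // Turn entities such as &nbsp; or &#183; into real characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Replace everything that is not a letter, digit or apostrophe with a space.
    $text = preg_replace("/[^\\p{L}\\p{N}']+/u", ' ', $text);
    // Split on whitespace and discard empty pieces.
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$html = '<html><head><style>p { color: red; }</style></head>'
      . '<body><p>Hello,&nbsp;world!</p><script>var x = 1;</script></body></html>';
print_r(page_words($html));
```

For the sample string this gives a two-element array, "Hello" and "world". Regular expressions are only a rough way to handle HTML, so unusual markup may still leak through.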

  6. #6
    SitePoint Zealot
    Join Date
    Jun 2000
    Posts
    165
    Freddie,

    That's fantastic, thanks mate.

    Or thank your cat if he coded the function!

    Cheers,

    Jase

  7. #7
    Dumb PHP codin' cat
    Join Date
    Aug 2000
    Location
    San Diego, CA
    Posts
    5,460
    No, he just helped me troubleshoot it a bit.

