  1. #1
    Winged Spider

    Want to parse an HTML page in PHP.

    I'd like to parse the contents of this page:

    http://bungie.net/Stats/PlayerStats.aspx?player=gnawph

    with PHP.

    Any tips, tricks, or suggestions?

    I'm googling right now but can't find anything useful.

    Thanks!


  2. #2
    vinyl-junkie
    Here is something I found a while back that might get you going in the right direction. It's Perl rather than PHP, but I think parts of it show how you could adapt the approach to a PHP script that parses that page.

    Hope this helps.

  3. #3
    lastcraft
    Hi...

    Quote Originally Posted by Winged Spider
    I'd like to parse the contents of this page:

    http://bungie.net/Stats/PlayerStats.aspx?player=gnawph

    with PHP.
    There are several libraries and tools out there. Harry Fuecks' XML_HTMLSax library (available in PEAR) is an event-based parser that generates XML-style SAX events.
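    To show what "event-based" means here, the sketch below uses Python's standard-library html.parser rather than XML_HTMLSax (it is not XML_HTMLSax's PHP API). Like a SAX parser, it fires a callback for each start tag, end tag and run of text instead of building a tree, so your handler keeps just enough state to pick out what it wants. The class name CellCollector and the sample markup are hypothetical; the real structure of the bungie.net page is not assumed.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Hypothetical event handler: collects the text of every <td>, row by row.

    Nested tables are not handled; this is a sketch of the event model only.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text events arrive between tag events; keep only text inside a cell.
        if self._cell is not None:
            self._cell.append(data)


# Placeholder markup standing in for a fetched page.
sample = """
<table>
  <tr><td>Stat A</td><td>10</td></tr>
  <tr><td>Stat B</td><td>20</td></tr>
</table>
"""

parser = CellCollector()
parser.feed(sample)
parser.close()
print(parser.rows)  # [['Stat A', '10'], ['Stat B', '20']]
```

    The handler never sees the whole document at once, which keeps memory use low but means any context (am I inside a row? a cell?) has to be tracked by hand.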

    Along the same lines, you can use the HTML Tidy tool (from the W3C: http://www.w3.org/People/Raggett/tidy/). With a small configuration tweak you can get it to turn HTML into XML, which you could then read with PHP's XML parser. There is even a PHP extension (in PECL) for using the tool from within PHP itself.
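    The Tidy-then-XML-parser route is sketched below in Python with xml.etree.ElementTree, standing in for PHP's XML parser. The string tidied is a hypothetical example of Tidy's XHTML output, and the sketch assumes that output declares the XHTML namespace; table_rows is an illustrative helper, not part of any library. The last lines show why the Tidy step matters: a strict XML parser rejects ordinary HTML such as an unclosed <br>.

```python
import xml.etree.ElementTree as ET

# Assumption: Tidy's XHTML output puts every element in this namespace.
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical output of HTML Tidy with XHTML/XML output enabled:
# every tag is closed and properly nested, so a strict XML parser accepts it.
tidied = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table id="stats">
      <tr><td>Stat A</td><td>10</td></tr>
      <tr><td>Stat B</td><td>20</td></tr>
    </table>
  </body>
</html>"""


def table_rows(xml_text):
    """Hypothetical helper: return the text of each <td>, row by row."""
    root = ET.fromstring(xml_text)
    rows = []
    for tr in root.iterfind(".//h:tr", XHTML_NS):
        # itertext() also picks up text inside child elements of the cell.
        rows.append(["".join(td.itertext()).strip()
                     for td in tr.findall("h:td", XHTML_NS)])
    return rows


print(table_rows(tidied))  # [['Stat A', '10'], ['Stat B', '20']]

# Untidied HTML: <br> is never closed, so the XML parser raises an error.
try:
    ET.fromstring("<p>line one<br>line two</p>")
except ET.ParseError as err:
    print("raw HTML rejected:", err)
```

    Once the markup is well-formed XML you get a full tree and path-style queries, at the cost of an extra cleanup pass over the page.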

    You could also take the code from inside SimpleTest. There is a small web browser embedded in the package, although that is more useful for navigating web pages than extracting content. The code might give you some ideas though.

    yours, Marcus

  4. #4
    coo_t2
    Googled and found this.

    If you're OK with Perl, there's an HTML parser in CPAN.

    You'd think there'd be one in PEAR.

    --ed

  5. #5
    coo_t2
    Quote Originally Posted by coo_t2

    You'd think there'd be one in PEAR.
    Quote Originally Posted by lastcraft

    There are several libraries and tools out there. Harry Fuecks' XML_HTMLSax library (available in PEAR) is an event-based parser that generates XML-style SAX events.
    Ah, I had always looked under the HTML category in PEAR for a parser.

    --ed

