  1. #1
    SitePoint Evangelist
    Join Date
    Jan 2005
    Location
    UK
    Posts
    539
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    SimpleXML - some rss feeds use <entry> some use <item>

    Currently I grab feeds like this:

    PHP Code:
    $x = simplexml_load_string($news_xml);
    foreach ($x->channel->item as $item) {
        $item_date = (string) $item->pubDate;
        $item_desc = (string) $item->description;
        ...

    This worked fine until I hit a FeedBurner feed that is structured differently. It uses <entry> instead of <item>, and <content type="html" xml:lang="en-US" xml:base="http://rssdomain.com"> instead of <description>. It also uses <feedburner:origLink> for the proper link.

    How can I make my PHP accept this format as well as the <item> format that most RSS feeds seem to use?
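
    For reference, one way this could be handled without a library is to branch on which element the feed actually contains. The following is only a minimal sketch: the function names parse_feed_items() and feed_item_link() are made up for illustration, and it assumes the feed declares the conventional "feedburner" namespace prefix and that Atom <content> is type="html" or plain text (type="xhtml" content would need different handling).

```php
<?php
// Hypothetical helper: prefer <feedburner:origLink> when present,
// otherwise fall back to the link passed in by the caller.
// Assumes the feed uses the conventional "feedburner" prefix.
function feed_item_link(SimpleXMLElement $node, string $fallback): string
{
    $fb = $node->children('feedburner', true); // true = match by prefix
    return isset($fb->origLink) ? (string) $fb->origLink : $fallback;
}

// Hypothetical helper: normalise RSS 2.0 <item> and Atom <entry>
// elements into plain arrays with 'date', 'desc' and 'link' keys.
function parse_feed_items(string $news_xml): array
{
    $x = simplexml_load_string($news_xml);
    if ($x === false) {
        return []; // not well-formed XML
    }

    $items = [];

    if (isset($x->channel->item)) {
        // RSS 2.0: <rss><channel><item>...
        foreach ($x->channel->item as $item) {
            $items[] = [
                'date' => (string) $item->pubDate,
                'desc' => (string) $item->description,
                'link' => feed_item_link($item, (string) $item->link),
            ];
        }
    } elseif (isset($x->entry)) {
        // Atom: <feed><entry>...
        foreach ($x->entry as $entry) {
            // Atom links live in the href attribute of <link>.
            $href = isset($entry->link['href']) ? (string) $entry->link['href'] : '';
            $items[] = [
                'date' => isset($entry->published)
                    ? (string) $entry->published
                    : (string) $entry->updated,
                'desc' => isset($entry->content)
                    ? (string) $entry->content
                    : (string) $entry->summary,
                'link' => feed_item_link($entry, $href),
            ];
        }
    }

    return $items;
}
```

    Calling parse_feed_items($news_xml) then returns the same array shape for either kind of feed, so the rest of the code does not need to know which format it was. Real-world feeds vary a lot more than this covers, which is the argument for a dedicated feed library.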

  2. #2
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,032
    Mentioned
    152 Post(s)
    Tagged
    2 Thread(s)
    I don't know about the question at hand, but I've written an RSS scraper in the past and it's a royal PITA, because all feeds are different and all of them lie about all sorts of things, such as character encoding.
    For a newer project I've used Zend_Feed_Reader from the Zend Framework (which can be used as a standalone component), and in my experience it works like a charm. You might look into using that and save yourself a lot of headaches down the road.
    Rémon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  3. #3
    SitePoint Evangelist
    Join Date
    Jan 2005
    Location
    UK
    Posts
    539
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks, gone with simplepie in the end!

  4. #4
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,032
    Mentioned
    152 Post(s)
    Tagged
    2 Thread(s)
    Originally posted by skyline:
    "Thanks, gone with simplepie in the end!"

    Yes, simplepie is pretty nice as well.

    Even now, when I think about downloading a feed with an HTTP header claiming it's UTF-8 and an XML header claiming it's ISO-8859-1, which then turns out to be CP-1252, I still get the shivers a bit.

    At one point I'd even written a function that kept utf8_decode'ing until it introduced extra question marks, at which point it took the text from the previous step as the final text. Brrr.
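
    A less hair-raising variant of that idea is to ignore the declared charsets and test the bytes themselves. This is only a rough sketch, not the function described above: to_utf8() is a made-up name, it needs the mbstring extension, and it simply assumes that anything which is not valid UTF-8 is Windows-1252 (CP-1252), which will be wrong for feeds in other legacy encodings.

```php
<?php
// Hypothetical helper: return $raw as UTF-8 regardless of what the
// HTTP or XML headers claimed. Requires the mbstring extension.
function to_utf8(string $raw): string
{
    // Bytes that already form valid UTF-8 are passed through untouched.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }

    // Assumption: everything else is treated as Windows-1252, a superset
    // of the printable ISO-8859-1 range that is often mislabelled as it.
    return mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
}
```

    Note that if the XML declaration still says encoding="ISO-8859-1" after the conversion, the parser may decode the text a second time, so the declaration would probably need adjusting as well before the string is handed to simplexml_load_string().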

    Parsing RSS sucks.

