SimpleXML: some RSS feeds use <entry>, some use <item>

skyline · September 29, 2012, 2:18pm

Currently I grab feeds like this:

```php
$x = simplexml_load_string($news_xml);
foreach ($x->channel->item as $item) {
    $item_date = (string) $item->pubDate;
    $item_desc = (string) $item->description;
    // ...
}
```

This has worked fine until I hit a FeedBurner feed that is structured differently. It uses <entry> instead of <item>, plus <content type="html" xml:lang="en-US" xml:base="http://rssdomain.com">. It also uses <feedburner:origLink> for the proper link.

How can I make my PHP accept this new format as well as the <item> format most RSS feeds seem to use?
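For reference, a minimal SimpleXML sketch that accepts both layouts. It assumes the <entry>/<content type="html"> feed is Atom (root element `feed`, elements in the default namespace) and that the FeedBurner prefix is literally `feedburner`, as in the markup quoted above. The function name `parse_feed_items` and the array keys it returns are hypothetical.

```php
<?php
// Sketch: read entries from either RSS 2.0 (<rss><channel><item>) or
// Atom (<feed><entry>) with SimpleXML.
// parse_feed_items() and its 'title'/'link'/'date'/'desc' keys are hypothetical.
function parse_feed_items(string $xml): array
{
    $x = simplexml_load_string($xml);
    if ($x === false) {
        return []; // Not well-formed XML.
    }

    $items = [];
    if ($x->getName() === 'feed') {
        // Atom: elements sit in the default Atom namespace, which SimpleXML
        // matches when no namespace filter is given.
        foreach ($x->entry as $entry) {
            // Prefer <feedburner:origLink>, else the first <link href="...">.
            $link = (string) $entry->children('feedburner', true)->origLink;
            if ($link === '') {
                $link = (string) $entry->link['href'];
            }
            // Atom requires <updated>; <published> is optional.
            $date = (string) $entry->published;
            if ($date === '') {
                $date = (string) $entry->updated;
            }
            // type="html" content comes back as the unescaped HTML string.
            $desc = (string) $entry->content;
            if ($desc === '') {
                $desc = (string) $entry->summary;
            }
            $items[] = [
                'title' => (string) $entry->title,
                'link'  => $link,
                'date'  => $date,
                'desc'  => $desc,
            ];
        }
    } else {
        // RSS 2.0: the same loop as before.
        foreach ($x->channel->item as $item) {
            $items[] = [
                'title' => (string) $item->title,
                'link'  => (string) $item->link,
                'date'  => (string) $item->pubDate,
                'desc'  => (string) $item->description,
            ];
        }
    }
    return $items;
}
```

Branching on the root element name leaves the existing RSS loop untouched. RSS 1.0 (RDF) feeds, where items sit outside <channel>, are not covered by this sketch.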

rpkamp · September 29, 2012, 3:47pm

I don't know about the question at hand, but I've written an RSS scraper in the past and it's a royal PITA: every feed is different, and all of them lie about all sorts of stuff, such as their character encoding.
For a newer project I used Zend_Feed_Reader from the Zend Framework (it can be used as a standalone component), and in my experience it works like a charm. You might look into it and save yourself a lot of headaches down the road.

skyline · September 29, 2012, 8:01pm

Thanks, I went with SimplePie in the end!

rpkamp · September 29, 2012, 8:26pm

Yes, SimplePie is pretty nice as well.

Even now, when I think about downloading a feed whose HTTP header claims it's UTF-8, whose XML declaration claims it's ISO-8859-1, and which then turns out to be CP-1252, I still get the shivers a bit.
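One way to cope with lying charset labels, sketched under the assumption that the payload is either valid UTF-8 or Windows-1252: ignore both declarations, check the bytes with `mb_check_encoding`, convert from Windows-1252 if they are not valid UTF-8, and rewrite the XML declaration so libxml does not decode the now-UTF-8 bytes a second time with the wrong charset. It needs the mbstring extension; `normalize_feed_encoding` is a hypothetical name.

```php
<?php
// Sketch: trust the bytes rather than the HTTP or XML charset labels.
// Assumes the payload is either valid UTF-8 or Windows-1252 (CP-1252).
// normalize_feed_encoding() is a hypothetical name; needs ext-mbstring.
function normalize_feed_encoding(string $raw): string
{
    if (!mb_check_encoding($raw, 'UTF-8')) {
        // Windows-1252 covers ISO-8859-1's printable characters and adds
        // curly quotes, dashes, etc. in the 0x80-0x9F range.
        $raw = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
    }
    // The bytes are UTF-8 now, so make the XML declaration say so; otherwise
    // libxml would decode them again using the declared (wrong) charset.
    $fixed = preg_replace(
        '/^(\s*<\?xml[^>]*?\bencoding\s*=\s*)(["\'])[^"\']*\2/i',
        '${1}"UTF-8"',
        $raw,
        1
    );
    return $fixed ?? $raw;
}
```

The result can then go straight into `simplexml_load_string()`. Bytes that are valid UTF-8 are never reinterpreted, which covers the "header says ISO-8859-1, bytes are UTF-8" case as well.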

At one point I'd even written a function that kept calling utf8_decode until the result introduced extra question marks, at which point it took the text from the previous step as the final text. Brrr.
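A sketch of that repeated-decoding idea, for text that was UTF-8-encoded more than once (e.g. "Ã©" where "é" was meant). It uses `mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8')` as the decoding step because `utf8_decode` is deprecated as of PHP 8.2, and it adds two stop conditions the description above does not mention: stop when nothing changes, and stop when the result is no longer valid UTF-8 (otherwise a correctly decoded "é" would be decoded one step too far). The function name and the step limit are hypothetical.

```php
<?php
// Sketch: undo UTF-8 that was encoded more than once (e.g. "Ã©" instead of "é").
// Each step treats the UTF-8 text as ISO-8859-1 bytes, which is what
// utf8_decode() did; mb_convert_encoding() is used because utf8_decode() is
// deprecated as of PHP 8.2. undo_double_encoding() is a hypothetical name.
function undo_double_encoding(string $text, int $maxSteps = 5): string
{
    for ($i = 0; $i < $maxSteps; $i++) {
        // Characters outside ISO-8859-1 come back as '?' (mbstring's default
        // substitute character), which is the "extra question marks" signal.
        $candidate = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

        $unchanged   = ($candidate === $text);
        $addedMarks  = substr_count($candidate, '?') > substr_count($text, '?');
        $invalidUtf8 = !mb_check_encoding($candidate, 'UTF-8');

        if ($unchanged || $addedMarks || $invalidUtf8) {
            break; // Keep the text from the last good step.
        }
        $text = $candidate;
    }
    return $text;
}
```

Like the original trick, this is a heuristic: a string that legitimately contains sequences such as "Ã©" would also be "repaired".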

Parsing RSS sucks.