  1. #1
    SitePoint Evangelist webchalkboard's Avatar
    Join Date
    Jan 2005
    Location
    Bristol, UK
    Posts
    494

    Developing a blog feed aggregator

    Hi,

    I have the task of developing a blog feed aggregator. We have a website where the members have their own personal blogs. I need to design a script that fetches their blog entries and puts them in our database every couple of hours.
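    For the "put them in our database every couple of hours" part, the main thing to watch is that each run will see mostly the same entries again. A minimal sketch, assuming SQLite and a hypothetical `entries` table keyed on feed URL plus entry GUID (the table layout and function names are illustrative, not taken from the thread):

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the entry store. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entries (
               feed_url TEXT NOT NULL,
               guid     TEXT NOT NULL,
               title    TEXT,
               link     TEXT,
               UNIQUE (feed_url, guid)
           )"""
    )
    return conn


def save_entries(conn, feed_url, entries):
    """Insert entries, silently skipping ones already stored.

    Returns the number of rows that were actually new.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO entries (feed_url, guid, title, link) "
        "VALUES (?, ?, ?, ?)",
        [(feed_url, e["guid"], e.get("title"), e.get("link")) for e in entries],
    )
    conn.commit()
    return conn.total_changes - before
```

    With the uniqueness constraint in place, the scheduled job can re-import a whole feed on every run without creating duplicates.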

    I've pretty much managed to do this and it works for basic WordPress-style RSS feeds. However, what happens if the feed is in a different format? I had assumed that RSS was RSS and one parser would suit all, but after a little testing this is not turning out to be the case.

    Can anyone tell me how many different types of RSS/XML feed I need to account and code for?

    Is this the best way, in your opinion, to create a feed aggregator site?

    Any thoughts or opinions appreciated. I'm not sure if I'm barking up the wrong tree completely with this one.

    Thanks,
    Tom
    Websites for Sale - Sell websites in a purpose built marketplace
    Then do some Shopping

  2. #2
    SitePoint Zealot
    Join Date
    Jan 2007
    Posts
    191
    Maybe instead of developing your own RSS parser you can use a class that is already out there. It might deal with this problem:

    Try this one: http://pear.php.net/package/XML_Feed_Parser
    or
    http://pear.php.net/package/XML_RSS

  3. #3
    SitePoint Evangelist webchalkboard's Avatar
    Brilliant, well done. I thought there might be something handy like that out there already.

    Cheers,
    Tom

  4. #4
    SitePoint Evangelist webchalkboard's Avatar
    Hmm, well, it doesn't seem to work! I've tried XML_Feed_Parser on a couple of feeds and it just complains, saying:

    PHP Warning: DOMDocument::loadXML(): Start tag expected, '<' not found in Entity, line: 1 in /usr/share/pear/XML/Feed/Parser.php on line 90
    PHP Fatal error: Uncaught XML_Feed_Parser_Exception: Invalid input: this is not valid XML in /var/www/cron/feedmonster2.php on line 18
    #0 /var/www/cron/feedmonster2.php(18): XML_Feed_Parser->__construct('rss.xml', true, false, true)
    #1 {main}
    thrown in /usr/share/pear/XML/Feed/Parser.php on line 101
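    One possible reading of this trace: frame #0 shows the literal string 'rss.xml' going into the constructor, and "Start tag expected, '<' not found" is the kind of error an XML parser gives when it is handed text that is not markup, such as a file name. Whether XML_Feed_Parser wants the feed contents rather than a path is an assumption here and worth checking against the PEAR documentation. The distinction itself can be sketched in Python with the standard library (the helper names and sample string are hypothetical):

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for the contents of a feed file.
SAMPLE = "<rss version='2.0'><channel><title>Demo</title></channel></rss>"


def parse_feed_text(text):
    """Parse a string that is expected to contain XML markup."""
    return ET.fromstring(text)


def looks_like_xml(text):
    """Return True if the string parses as XML, False otherwise."""
    try:
        parse_feed_text(text)
        return True
    except ET.ParseError:
        return False
```

    Here `looks_like_xml("rss.xml")` is false because a file name is not a document, while `looks_like_xml(SAMPLE)` is true. If that is what is happening, reading the file (or fetching the URL) first and passing the contents along would be the thing to try.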

    I've had a play around with http://pear.php.net/manual/en/package.xml.xml-rss.php

    And it looked very promising, but then didn't like this feed:

    www. affiliateprogramadvice. com/blog/rss.xml
    (I've messed up the URL to stop it being a link)

    I'm guessing it's because it's not in the normal format for RSS... But it's very annoying. Can anyone suggest a parser that will work with ALL blog RSS formats?

    I'm trying to create a blog feed aggregator.

    Thanks,
    Tom

  5. #5
    SitePoint Zealot
    Sorry, I thought those would have (or should have) worked. I know there are issues with certain RSS feeds that are non-standard or homemade.

    It looks like the link www. affiliateprogramadvice. com/blog/rss.xml you gave is an Atom feed and not RSS. Do you need to explicitly tell your parser to parse Atom vs. RSS?

  6. #6
    SitePoint Zealot

  7. #7

  8. #8
    SitePoint Evangelist webchalkboard's Avatar
    Cool, thanks, will try out those leads now. At the moment I'm using some kind of custom-built thing I got off a friend, but whatever approach I take, it doesn't seem to be able to deal with the Atom feed. I would have thought it ought to auto-detect it; after all, I won't necessarily know what kind of feed it is before I parse it. I thought the parser would be able to recognise the file format and parse accordingly.

    I'll try those other suggestions.

    Thanks,
    Tom

  9. #9
    SitePoint Zealot
    In the header of the feed it will tell you what kind it is, and you just have to parse that out to detect if it is RSS or Atom, then send it to the right parser.
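    A minimal sketch of that detect-then-dispatch idea, in Python with only the standard library. It assumes the "header" in question is the root element: `<rss>` for RSS 0.9x/2.0 and `<feed>` in the Atom namespace for Atom 1.0. The function names and the title/link output shape are hypothetical, and RSS 1.0 (RDF) is recognised but not parsed here:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_feed_type(root):
    """Classify a feed by its root element."""
    if root.tag == "rss":
        return "rss"
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "atom"
    if root.tag == f"{{{RDF_NS}}}RDF":
        return "rss1"
    return "unknown"


def parse_rss(root):
    """Collect title and link from each <item> under <channel>."""
    return [
        {
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
        }
        for item in root.findall("channel/item")
    ]


def _atom_link(entry):
    """Prefer the rel="alternate" link (the default rel), else the first link."""
    links = entry.findall(f"{{{ATOM_NS}}}link")
    for link in links:
        if link.get("rel", "alternate") == "alternate":
            return link.get("href", "")
    return links[0].get("href", "") if links else ""


def parse_atom(root):
    """Collect title and link from each Atom <entry>."""
    return [
        {
            "title": entry.findtext(f"{{{ATOM_NS}}}title", default="").strip(),
            "link": _atom_link(entry),
        }
        for entry in root.findall(f"{{{ATOM_NS}}}entry")
    ]


PARSERS = {"rss": parse_rss, "atom": parse_atom}


def parse_feed(xml_text):
    """Detect the feed type, then hand the tree to the matching parser."""
    root = ET.fromstring(xml_text)
    kind = detect_feed_type(root)
    parser = PARSERS.get(kind)
    if parser is None:
        raise ValueError(f"unsupported feed type: {kind}")
    return kind, parser(root)
```

    Both parsers return the same list-of-dicts shape, so the rest of the aggregator does not need to know which format a given blog uses.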

  10. #10
    SitePoint Guru
    Join Date
    Sep 2000
    Location
    USA
    Posts
    923
    What did you end up going with?
    TheWeighWeWere.com - Weight Loss Success Stories from A to Z!

    oops, I did it again...re-relaunced July '07

  11. #11
    SitePoint Evangelist webchalkboard's Avatar
    I used something called SimplePie (http://simplepie.org/). Their website seems to be down at the moment, but I expect it will come back online at some point.

    Very easy to use: just include the class, create an object, then call the various functions. It also puts the parsed data in a much simpler format than the previous methods I've used.

    Tom

