  1. #1
    SitePoint Zealot
    Join Date
    Oct 2000
    Posts
    156
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Need something to pull RSS feeds and find articles

    My apologies if this is not in the right place but I'm new to this and not sure where it may fit in.

    I am working with an alliance of 32 sites that want their articles pulled to a central site we share. I've found ways to do this for the sites that have an RSS feed, but I was wondering if there is something that would also go out to the sites that don't have a feed.

    Does such a program exist?

    Thanks for any leads in advance.

  2. #2
    stymiee (He's No Good To Me Dead)
    Join Date
    Feb 2003
    Location
    Slave I
    Posts
    23,426
    Mentioned
    2 Post(s)
    Tagged
    1 Thread(s)
    You just need an RSS parser. There are tons of them, like Magpie in PHP.
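
To illustrate what an RSS parser does, here is a minimal sketch in Python using only the standard library (not Magpie itself). The sample feed, the example.com URLs, and the parse_rss name are made up for illustration; a real aggregator would fetch each member site's feed over HTTP and would need to handle malformed feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be fetched from a member site.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Member Site</title>
    <link>http://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>First article</title>
      <link>http://example.com/first</link>
      <description>Summary of the first article.</description>
    </item>
    <item>
      <title>Second article</title>
      <link>http://example.com/second</link>
      <description>Summary of the second article.</description>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return a list of dicts (title, link, summary), one per <item>."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
        })
    return articles


articles = parse_rss(SAMPLE_FEED)
for article in articles:
    print(article["title"], "->", article["link"])
```

Running the same function over each member feed and merging the resulting lists would give the central site its combined article list.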

  3. #3
    SitePoint Zealot
    Join Date
    Oct 2000
    Posts
    156
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thank you.

    But will this also grab articles from sites that don't have an RSS feed?

  4. #4
    stymiee (He's No Good To Me Dead)
    Join Date
    Feb 2003
    Location
    Slave I
    Posts
    23,426
    Mentioned
    2 Post(s)
    Tagged
    1 Thread(s)
    No. You need a site scraper. They're usually custom-written because each site is built differently. If the sites that don't have an RSS feed want to be in the alliance, then they really should create one.

    BTW, this won't help you guys very much as the search engines will filter out the content of the new sites as duplicate content. You pretty much won't get any search engine traffic.
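
To show why a scraper ends up being custom per site, here is a minimal sketch in Python using the standard library's html.parser. The sample markup and the HeadlineScraper class are hypothetical: the rule "headlines are links inside an h2" only fits this made-up page, and another site would need different rules.

```python
from html.parser import HTMLParser

# Hypothetical markup for ONE member site; every site needs its own rules.
SAMPLE_PAGE = """
<html><body>
  <div class="article">
    <h2><a href="/news/one">Article one</a></h2>
  </div>
  <div class="article">
    <h2><a href="/news/two">Article two</a></h2>
  </div>
</body></html>
"""


class HeadlineScraper(HTMLParser):
    """Collect (title, href) pairs for links found inside <h2> tags."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.current_href = None
        self.current_text = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            # Start collecting the link text for this headline.
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            title = "".join(self.current_text).strip()
            self.headlines.append((title, self.current_href))
            self.current_href = None
        elif tag == "h2":
            self.in_h2 = False


scraper = HeadlineScraper()
scraper.feed(SAMPLE_PAGE)
print(scraper.headlines)
```

If the member site changes its templates, a scraper like this silently stops finding headlines, which is one more reason a feed is the more robust option.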

  5. #5
    SitePoint Zealot
    Join Date
    Oct 2000
    Posts
    156
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The more I find out, the more I tend to agree that the RSS feed needs to be a requirement.

    Please clarify for me your comment on the duplicate content. Are you saying that if we pull a headline and article summary provided by an RSS feed to a central site that it will hurt the originating site as far as search engine traffic?

    Thanks again.

  6. #6
    stymiee (He's No Good To Me Dead)
    Join Date
    Feb 2003
    Location
    Slave I
    Posts
    23,426
    Mentioned
    2 Post(s)
    Tagged
    1 Thread(s)
    If it is just a headline and summary you probably should be fine. But if you are reproducing the entire articles then you will run into duplicate content issues and those pages will simply be ignored by Google.
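
One way to keep the central site to a headline and summary is to trim each feed description before storing it. A rough sketch in Python, assuming a 200-character limit and a hypothetical make_summary helper; the regex tag stripping is crude and only meant for illustration.

```python
import re
import textwrap


def make_summary(html_text, max_chars=200):
    """Strip tags and shorten the text, cutting on a word boundary."""
    # Crude tag removal; a real HTML parser is safer for messy markup.
    plain = re.sub(r"<[^>]+>", " ", html_text)
    # shorten() also collapses runs of whitespace into single spaces.
    return textwrap.shorten(plain, width=max_chars, placeholder="...")


print(make_summary("<p>Hello <b>world</b></p>", 50))
print(make_summary("word " * 100, 40))
```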

  7. #7
    SitePoint Zealot
    Join Date
    Oct 2000
    Posts
    156
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    You've been extremely helpful.

    Any direction for someone looking for a program which does both?

    Thanks.

  8. #8
    King of Paralysis by Analysis
    Join Date
    Jul 2004
    Location
    Ottawa, Canada
    Posts
    5,840
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by 12thManJeff View Post
    You've been extremely helpful.

    Any direction for someone looking for a program which does both?

    Thanks.
    If you're looking to scrape the non-RSS ones, you'll need a custom solution because every site is configured differently, as mentioned previously.

    As far as the RSS ones go, there are lots of decent parsers out there; a Google search will bring up a bunch.

  9. #9
    SitePoint Zealot
    Join Date
    Oct 2000
    Posts
    156
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I didn't understand whether the scrape had to be custom because of the combined site they would be placed on or custom because each site it would scrape is different.

    What I may need is an RSS parser that ALSO allows for manual article placement.

  10. #10
    King of Paralysis by Analysis
    Join Date
    Jul 2004
    Location
    Ottawa, Canada
    Posts
    5,840
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by 12thManJeff View Post
    I didn't understand whether the scrape had to be custom because of the combined site they would be placed on or custom because each site it would scrape is different.
    It would be because each site they scrape is different.

    What I may need is an RSS parser that ALSO allows for manual article placement.
    Or tell all the sites in your network to create an RSS feed. If they're using a CMS, it should be easy enough. If they're not using a CMS and they don't post a ton of content, they could just create an RSS feed by hand (it's not hard to put together manually, just a hassle if you update a lot).
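
For a site without a CMS, the feed described above is just a small XML file. Here is a minimal sketch in Python that builds one with the standard library; the build_rss function, the site details, and the article data are placeholders, and only the basic RSS 2.0 elements (title, link, description) are included.

```python
import xml.etree.ElementTree as ET


def build_rss(site_title, site_link, site_description, articles):
    """Build a bare-bones RSS 2.0 document and return it as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = site_description
    for article in articles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = article["title"]
        ET.SubElement(item, "link").text = article["link"]
        ET.SubElement(item, "description").text = article["summary"]
    return ET.tostring(rss, encoding="unicode")


# Placeholder data for illustration only.
feed_xml = build_rss(
    "Example Member Site",
    "http://example.com/",
    "Articles from an example member site",
    [{
        "title": "Hello",
        "link": "http://example.com/hello",
        "summary": "A short summary of the article.",
    }],
)
print(feed_xml)
```

Saving the output to a file on the member site and adding one new item per article is all the manual upkeep involves.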

  11. #11
    SitePoint Enthusiast
    Join Date
    Dec 2007
    Posts
    76
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    And if you scrape sites and the site you're scraping finds out (and they will), you could get sued. Don't scrape full articles. It's theft, pure and simple.

  12. #12
    stymiee (He's No Good To Me Dead)
    Join Date
    Feb 2003
    Location
    Slave I
    Posts
    23,426
    Mentioned
    2 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by Cynthiab View Post
    And if you scrape sites and the site you're scraping finds out (and they will) you could get sued. Don't scrape full articles - it's theft pure and simple.
    All of the people they want to scrape are in their network and have already agreed to this.

