Writing a Feed Aggregator with Sinatra

Who doesn’t like sunsets? Of all the sunset sites you’ve seen, you feel that not one does it right. You decide to make the best damn sunset site on the web. There are a bunch of sunset photos out there. Could we use public images to populate this awesome new site? We’ll build the site while the lawyers figure out that last question.

Are the Kids Still Talking About APIs?

One might think, “I bet Flickr has some great sunset photos.” Indeed they do. We’ll pick on this group. At the bottom of the page you’ll notice an RSS feed. Could we read that feed, display the images, and give credit to the proper people?

Yes, yes we can.

Let’s use our friends Sinatra for our site and rack/test for testing.

Install Sinatra and rack/test

gem install sinatra
gem install rack-test

We will create a feed_aggregator folder and a test folder inside of it. While we are at it, create a blank test file too.

$ mkdir feed_aggregator 
$ mkdir feed_aggregator/test 
$ touch feed_aggregator/test/feed_aggregator_test.rb 

Our first test can simply make sure the app starts up. In the test file let’s add

require '../main'  
require 'test/unit'  
require 'rack/test'  
 
ENV['RACK_ENV'] = 'test'
 
class FeedAggregatorTest < Test::Unit::TestCase
  include Rack::Test::Methods
 
  def app
    Sinatra::Application
  end
 
  def test_it_says_feed_aggregator
    get '/'
    assert last_response.ok?
    assert_equal 'Feed Aggregator', last_response.body
  end
 
end

Run the test.

test$ ruby feed_aggregator_test.rb 
<internal:lib/rubygems/custom_require>:29:in `require': no such file to load -- ../main (LoadError)
	from <internal:lib/rubygems/custom_require>:29:in `require'
	from feed_aggregator_test.rb:1:in `<main>'

Yeah! We failed the test. It says there is no main file. Go ahead and make that file.

feed_aggregator $ touch main.rb
test$ ruby feed_aggregator_test.rb 
Loaded suite feed_aggregator_test
Started
E
Finished in 0.001054 seconds.
 
  1) Error:
test_it_says_feed_aggregator(FeedAggregatorTest):
NameError: uninitialized constant FeedAggregatorTest::Sinatra
    feed_aggregator_test.rb:11:in `app'
    /Users/johnivanoff/.rvm/gems/ruby-1.9.2-p180@feed_aggregator/gems/rack-test-0.6.2/lib/rack/test/methods.rb:31:in `build_rack_mock_session'
    /Users/johnivanoff/.rvm/gems/ruby-1.9.2-p180@feed_aggregator/gems/rack-test-0.6.2/lib/rack/test/methods.rb:27:in `rack_mock_session'
    /Users/johnivanoff/.rvm/gems/ruby-1.9.2-p180@feed_aggregator/gems/rack-test-0.6.2/lib/rack/test/methods.rb:42:in `build_rack_test_session'
    /Users/johnivanoff/.rvm/gems/ruby-1.9.2-p180@feed_aggregator/gems/rack-test-0.6.2/lib/rack/test/methods.rb:38:in `rack_test_session'
    /Users/johnivanoff/.rvm/gems/ruby-1.9.2-p180@feed_aggregator/gems/rack-test-0.6.2/lib/rack/test/methods.rb:46:in `current_session'
    feed_aggregator_test.rb:15:in `test_it_says_feed_aggregator'
 
1 tests, 0 assertions, 0 failures, 1 errors, 0 skips

Let’s add code for an app that just says “Feed Aggregator.”

require 'sinatra'
 
get '/' do
  "Feed Aggregator"
end

Do you think the test will pass now? Let’s find out.

test$ ruby feed_aggregator_test.rb 
Loaded suite feed_aggregator_test
Started
.
Finished in 0.063198 seconds.
 
1 tests, 2 assertions, 0 failures, 0 errors, 0 skips

The test does pass.

Now let’s go get that feed. Since we are developing by tests, it would be a great idea to have a fixture of the RSS feed. Go ahead and make a fixtures directory and a blank XML file for it.

test$ mkdir fixtures 
test$ touch fixtures/feed.xml

Excellent. Let’s copy the source of the feed into the feed.xml file. If we test against the live feed, entries could change and make it very frustrating while testing. Who needs that?

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	    xmlns:media="http://search.yahoo.com/mrss/"
	    xmlns:dc="http://purl.org/dc/elements/1.1/"
	    xmlns:creativeCommons="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html"
	    	    xmlns:flickr="urn:flickr:user" >
	<channel>
 
 
		<title>Flickr's Best Sunsets Pool</title>
		<link>http://www.flickr.com/groups/flickrsbestsunsets/pool/</link>
... Content elided ...

(Note: The entire XML file can be found in this gist.)

WOW! Where do we start? Why don’t we look for links, so that a person can click through to see the original photo on Flickr?

If you look at the XML you see that each picture is in an item section. The link tag will take us to the photo’s page. Based on that, we need to look into the feed, find the items, and find the link in that item.

That sounds like an awesome test. Write it up.

def test_find_the_link
  feed = File.read('fixtures/feed.xml')
  items = parse feed
  item = items.first
  link = 'http://www.flickr.com/photos/mattcaustin/8205498382/in/pool-1373979@N22'
  assert_equal item[:link], link
end

Where did I get the text for the link variable? I copied the link from the first item in the fixture. You will soon see how it is used.

The test loads and parses the fixture, looks in the first item for the link node, takes its text, and compares it to our link variable. They should match.

Let’s run the test. I hope it fails.

test$ ruby feed_aggregator_test.rb 
Run options: 

# Running tests:

E.

Finished tests in 0.025705s, 77.8059 tests/s, 77.8059 assertions/s.

  1) Error:
test_find_the_link(FeedAggregatorTest):
NoMethodError: undefined method `parse' for #<FeedAggregatorTest:0x007ff7412ac390>
    feed_aggregator_test.rb:24:in `test_find_the_link'

2 tests, 2 assertions, 0 failures, 1 errors, 0 skips

No method for ‘parse’

Go ahead and create that method in the main.rb file.

require 'sinatra'

def parse  
end

get '/' do
  "Feed Aggregator"
end

Rerun test.

test$ ruby feed_aggregator_test.rb 
Run options: 

# Running tests:

E.

Finished tests in 0.251661s, 7.9472 tests/s, 7.9472 assertions/s.

  1) Error:
test_find_the_link(FeedAggregatorTest):
ArgumentError: wrong number of arguments (1 for 0)
    /Users/john/Dropbox/feed_aggregator/main.rb:4:in `parse'
    feed_aggregator_test.rb:23:in `test_find_the_link'

2 tests, 2 assertions, 0 failures, 1 errors, 0 skips

Our parse method doesn’t accept any arguments yet, but the test passes it one. That’s fine since we only wanted to solve the last error. What do you do to make this pass? Back in the main.rb file

require 'sinatra'

def parse feed
end

get '/' do
  "Feed Aggregator"
end

Do you think that will get rid of the arguments error? Rerun the test and see.

test$ ruby feed_aggregator_test.rb 
Run options: 

# Running tests:

E.

Finished tests in 0.027235s, 73.4349 tests/s, 73.4349 assertions/s.

  1) Error:
test_find_the_link(FeedAggregatorTest):
NoMethodError: undefined method `first' for nil:NilClass
    feed_aggregator_test.rb:24:in `test_find_the_link'

2 tests, 2 assertions, 0 failures, 1 errors, 0 skips

Indeed it did. Now the complaint is about ‘first’: our empty parse method returns nil, and nil has no ‘first’ method.
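
If that error message is puzzling, a quick sketch outside the app shows where it comes from. A Ruby method with an empty body returns nil, and nil does not respond to first, while an array does. The parse_stub name is made up for this illustration; it is not part of the app.

```ruby
# Hypothetical stand-in for the current, empty parse method.
def parse_stub(feed)
end

result = parse_stub('<rss></rss>')

puts result.inspect                            # nil
puts result.respond_to?(:first)                # false, hence the NoMethodError
puts [{ :link => 'x' }].respond_to?(:first)    # true, so parse should return an array
```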

To hand back a first item, parse will need to search through the XML, collect the items, and return them.

Now what’s a way to parse the XML to find that link? I like Nokogiri (http://nokogiri.org). Do you have that gem installed? Let’s check. Your output might look different from mine.

test$  gem list --local -d noko

*** LOCAL GEMS ***

Apparently I don’t. If you don’t, go ahead and install it.

test$ gem install nokogiri

Other than seeing in the terminal that it installed successfully, how could you check to see if it’s installed?

test$  gem list --local -d noko

*** LOCAL GEMS ***

nokogiri (1.5.5)
    Authors: Aaron Patterson, Mike Dalessio, Yoko Harada, Tim Elliott
    Rubyforge: http://rubyforge.org/projects/nokogiri
    Homepage: http://nokogiri.org
    Installed at: /Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator

    Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser

Where were we? Parsing the XML document. You need to load the feed into Nokogiri. Then you can go through each item, get its link, and store it in a hash. Don’t forget to return the items.

require 'sinatra'
require 'nokogiri'

def parse feed
  doc = Nokogiri::XML feed
  doc.search('item').map do |doc_item|
    item = {}
    item[:link] = doc_item.at('link').text
    item
  end
end

get '/' do
  "Feed Aggregator"
end

Do you have a good feeling about this? Go ahead and run the test. Did you remember to include nokogiri in main.rb?

test$ ruby feed_aggregator_test.rb
Loaded suite feed_aggregator_test
Started
..
Finished in 0.112374 seconds.

2 tests, 3 assertions, 0 failures, 0 errors, 0 skips

Snoopy dance! Can we say that?

We need to get the thumbnail next. Where is that in the fixture?
Did you find it? It’s the url attribute of the <media:thumbnail> tag. Go ahead and write a test for that. It’s pretty close to the first one.

def test_find_the_thumbnail_image
  feed = File.read('fixtures/feed.xml')
  items = parse feed
  item = items.first
  thumbnail = 'http://farm9.staticflickr.com/8488/8205498382_4e5ed09a62_s.jpg'
  assert_equal item[:thumbnail], thumbnail
end

Run the test.

test$ ruby feed_aggregator_test.rb 
Run options: 

# Running tests:

.F..

Finished tests in 0.053503s, 74.7622 tests/s, 93.4527 assertions/s.

  1) Failure:
test_find_the_thumbnail_image(FeedAggregatorTest) [feed_aggregator_test.rb:42]:
<nil> expected but was
<"http://farm9.staticflickr.com/8488/8205498382_4e5ed09a62_s.jpg">.

4 tests, 5 assertions, 1 failures, 0 errors, 0 skips

That was expected. Let’s think this through. We need the value of an attribute of the media:thumbnail node. How about this?

require 'sinatra'
require 'nokogiri'

def parse feed
  doc = Nokogiri::XML feed
  doc.search('item').map do |doc_item|
    item = {}
    item[:link] = doc_item.at('link').text
    item[:thumbnail] = doc_item.at('media:thumbnail').attr('url')
    item
  end
end

get '/' do
  "Feed Aggregator"
end

That makes sense since the url we need is an attribute of that node. Go ahead and try it.

test john$ ruby feed_aggregator_test.rb 
Run options: 

# Running tests:

EE.

Finished tests in 0.031776s, 94.4109 tests/s, 62.9406 assertions/s.

  1) Error:
test_find_the_link(FeedAggregatorTest):
NoMethodError: undefined method `attr' for nil:NilClass
    /Users/john/Dropbox/feed_aggregator/main.rb:10:in `block in parse'
    /Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:239:in `block in each'
    /Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `upto'
    /Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `each'
    /Users/john/Dropbox/feed_aggregator/main.rb:7:in `map'
    /Users/john/Dropbox/feed_aggregator/main.rb:7:in `parse'
    feed_aggregator_test.rb:23:in `test_find_the_link'

  2) Error:
test_find_the_thumbnail_image(FeedAggregatorTest):
NoMethodError: undefined method `attr' for nil:NilClass
    /Users/john/Dropbox/feed_aggregator/main.rb:10:in `block in parse'
    /Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:239:in `block in each'
    /Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `upto'
    /Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `each'
    /Users/john/Dropbox/feed_aggregator/main.rb:7:in `map'
    /Users/john/Dropbox/feed_aggregator/main.rb:7:in `parse'
    feed_aggregator_test.rb:31:in `test_find_the_thumbnail_image'

3 tests, 2 assertions, 0 failures, 2 errors, 0 skips

Well, that failed: undefined method ‘attr’ for nil:NilClass. It’s not finding the <media:thumbnail> node. If you look at the top of the RSS feed you can see that media is declared as an XML namespace, and our search has to account for that.

Turns out, with Nokogiri, you can just use the pipe symbol to indicate a namespace search. Go ahead and replace the colon with a pipe in the thumbnail line.

require 'sinatra'
require 'nokogiri'

def parse feed
  doc = Nokogiri::XML feed
  doc.search('item').map do |doc_item|
    item = {}
    item[:link] = doc_item.at('link').text
    item[:thumbnail] = doc_item.at('media|thumbnail').attr('url')
    item
  end
end

get '/' do
  "Feed Aggregator"
end
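
Why does the prefix matter? The media prefix is only shorthand for the namespace URI declared on the <rss> element, and a namespace-aware search has to know which namespace it is looking in. Here is a rough sketch of the same lookup using REXML, the XML library that ships with Ruby, rather than Nokogiri. The tiny feed, the example.com URL, and the thumbnail_url helper are made up for illustration; only the namespace URI comes from our fixture.

```ruby
require 'rexml/document'

# A cut-down, made-up feed with a single namespaced element.
SAMPLE_FEED = <<-XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Example sunset</title>
      <media:thumbnail url="http://example.com/thumb_s.jpg" height="75" width="75" />
    </item>
  </channel>
</rss>
XML

# Map the prefix used in the XPath to the namespace URI it stands for.
MEDIA_NAMESPACE = { 'media' => 'http://search.yahoo.com/mrss/' }

# Hypothetical helper: returns the url attribute of the first thumbnail, or nil.
def thumbnail_url(xml)
  doc = REXML::Document.new(xml)
  node = REXML::XPath.first(doc, '//item/media:thumbnail', MEDIA_NAMESPACE)
  node && node.attributes['url']
end

puts thumbnail_url(SAMPLE_FEED)
```

The pipe in Nokogiri’s ‘media|thumbnail’ plays the same role as the explicit prefix mapping here: it tells the search that media is a namespace prefix and not part of the tag name.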

Give it a try. Rerun the tests.

test$ ruby feed_aggregator_test.rb 
Run options: 

# Running tests:

...

Finished tests in 0.045266s, 66.2749 tests/s, 88.3665 assertions/s.

3 tests, 4 assertions, 0 failures, 0 errors, 0 skips

Awesome. We should probably use the title of the picture too. Go ahead and write the test for that. I’ll wait.

Finished? Here’s what I did.

def test_find_the_title
  feed = File.read('fixtures/feed.xml')
  items = parse feed
  item = items.first
  title = 'An Evening at Shell Beach'
  assert_equal item[:title], title
end

Again, we are using the title from the first item of our fixture. Run test.

test$ ruby feed_aggregator_test.rb 
Run options: 

# Running tests:

.F.

Finished tests in 0.078211s, 38.3578 tests/s, 51.1437 assertions/s.

  1) Failure:
test_find_the_title(FeedAggregatorTest) [feed_aggregator_test.rb:34]:
<nil> expected but was
<"An Evening at Shell Beach">.

3 tests, 4 assertions, 1 failures, 0 errors, 0 skips

We need to look for the title. How would you add this to the parse method?

require 'sinatra'
require 'nokogiri'

def parse feed
  doc = Nokogiri::XML feed
  doc.search('item').map do |doc_item|
    item = {}
    item[:link] = doc_item.at('link').text
    item[:thumbnail] = doc_item.at('media|thumbnail').attr('url')
    item[:title] = doc_item.at('title').text
    item
  end
end

get '/' do
  "Feed Aggregator"
end

Run the test.

test$ ruby feed_aggregator_test.rb 
Run options: 

# Running tests:

....

Finished tests in 0.062725s, 63.7704 tests/s, 79.7130 assertions/s.

4 tests, 5 assertions, 0 failures, 0 errors, 0 skips

Awesome, but let’s see something on a web page. We want results in the browser.

To keep things simple I’ll use erb for making the web page.

require 'sinatra'
require 'nokogiri'


def parse feed
  doc = Nokogiri::XML feed
  doc.search('item').map do |doc_item|
    item = {}
    item[:link] = doc_item.at('link').text
    item[:thumbnail] = doc_item.at('media|thumbnail').attr('url')
    item[:title] = doc_item.at('title').text
    item
  end
end

get '/' do
  erb :index
end

__END__

@@index
<!DOCTYPE html>
<html>
  <head>
  <meta charset="UTF-8">
  <meta name="viewport" content="user-scalable=yes, width=device-width" />
<title>Lovely Sunsets</title> 
</head>
<body>
  <h1>Feed Aggregator</h1>
</body>
</html>

Since we made some changes we should rerun the tests.

test$ ruby feed_aggregator_test.rb 
Run options: 

# Running tests:

...F

Finished tests in 0.251257s, 15.9200 tests/s, 19.8999 assertions/s.

  1) Failure:
test_it_says_feed_aggregator(FeedAggregatorTest) [feed_aggregator_test.rb:18]:
<"Feed Aggregator"> expected but was
<"<!DOCTYPE html>\n<html>\n  <head>\n  <meta charset=\"UTF-8\">\n  <meta name=\"viewport\" content=\"user-scalable=yes, width=device-width\" />\n<title>Lovely Sunsets</title> \n</head>\n<body>\n  <h1>Feed Aggregator</h1>\n</body>\n</html>\n">.

4 tests, 5 assertions, 1 failures, 0 errors, 0 skips

Oops. Go ahead and fix that.

def test_it_says_feed_aggregator
  get '/'
  assert last_response.ok?
  assert_match 'Feed Aggregator', last_response.body
end

We are making sure that ‘Feed Aggregator’ is within the page.
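
If the difference between the two assertions is not obvious, the sketch below mimics it in plain Ruby, outside of any test framework. PAGE_BODY is a shortened, made-up stand-in for the rendered page, and the two helper names are invented for this example.

```ruby
# Shortened, made-up stand-in for the HTML the app now returns.
PAGE_BODY = "<!DOCTYPE html>\n<html>\n<body>\n  <h1>Feed Aggregator</h1>\n</body>\n</html>\n"

# What assert_equal checks: the whole body must be exactly this string.
def exact?(expected, body)
  expected == body
end

# Roughly what assert_match checks: the text appears somewhere in the body.
def mentioned?(text, body)
  !(Regexp.new(Regexp.escape(text)) =~ body).nil?
end

puts exact?('Feed Aggregator', PAGE_BODY)       # false
puts mentioned?('Feed Aggregator', PAGE_BODY)   # true
```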

Now that the tests are passing, let’s move on. You re-ran the tests, right? We’ll load the feed and then parse out our info.

require 'sinatra'
require 'nokogiri'

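# Resolve the fixture relative to this file so it loads from any working directory
feed = File.read(File.expand_path('../test/fixtures/feed.xml', __FILE__))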

def parse feed
  doc = Nokogiri::XML feed
  doc.search('item').map do |doc_item|
    item = {}
    item[:link] = doc_item.at('link').text
    item[:thumbnail] = doc_item.at('media|thumbnail').attr('url')
    item[:title] = doc_item.at('title').text
    item
  end
end

get '/' do
  @pictures = parse feed
  erb :index
end

__END__

@@index
<!DOCTYPE html>
<html>
  <head>
  <meta charset="UTF-8">
  <meta name="viewport" content="user-scalable=yes, width=device-width" />
<title>Lovely Sunsets</title> 
</head>
<body>
  <h1>Feed Aggregator</h1>
  <dl>
    <% @pictures.each do |picture| %>
      <dt><a href="<%= picture[:link] %>"><%= picture[:title] %></a></dt>
      <dd><img src="<%= picture[:thumbnail] %>" /></dd>
    <% end %>
  </dl>
</body>
</html>

You might notice I am referring to the test fixture instead of the live RSS feed. Again, I don’t want to constantly request from their server while developing.

Go ahead and start the server and check out the lovely work in your browser.
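
One thing the template does not do is escape the values it prints. Titles and URLs come from someone else’s feed, so it may be worth escaping them before they reach the page. Below is a standalone sketch using Ruby’s erb library directly, with no Sinatra involved. The render_pictures helper and the sample picture are made up for illustration.

```ruby
require 'erb'

# Same shape as our template, with each value passed through html_escape.
PICTURE_LIST = <<-HTML
<dl>
<% pictures.each do |picture| %>
  <dt><a href="<%= ERB::Util.html_escape(picture[:link]) %>"><%= ERB::Util.html_escape(picture[:title]) %></a></dt>
  <dd><img src="<%= ERB::Util.html_escape(picture[:thumbnail]) %>" /></dd>
<% end %>
</dl>
HTML

# Hypothetical helper: renders the list for an array of picture hashes.
def render_pictures(pictures)
  ERB.new(PICTURE_LIST).result(binding)
end

sample = [{ :link => 'http://example.com/photos/1',
            :thumbnail => 'http://example.com/1_s.jpg',
            :title => 'Dusk & <b>Dawn</b>' }]

puts render_pictures(sample)
```

The title comes out as Dusk &amp;amp; &amp;lt;b&amp;gt;Dawn&amp;lt;/b&amp;gt; in the markup, so the browser shows the text instead of treating it as HTML. Inside a Sinatra template, Rack::Utils.escape_html does a similar job.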

Everything is looking good. Let’s use real data. How would you wire this up to pull the feed from Flickr? Yep, let’s add open-uri to the main.rb file and then have Nokogiri open the URL.

require 'sinatra'
require 'nokogiri'
require 'open-uri'

feed = 'http://api.flickr.com/services/feeds/groups_pool.gne?id=1373979@N22&lang=en-us&format=rss_200'

def parse feed
  # open-uri lets open fetch a URL; on Ruby 3.0 and newer, call URI.open(feed) instead
  doc = Nokogiri::XML(open(feed))
  doc.search('item').map do |doc_item|
    item = {}
    item[:link] = doc_item.at('link').text
    item[:thumbnail] = doc_item.at('media|thumbnail').attr('url')
    item[:title] = doc_item.at('title').text
    item
  end
end

get '/' do
  @pictures = parse feed
  erb :index
end

__END__

@@index
<!DOCTYPE html>
<html>
  <head>
  <meta charset="UTF-8">
  <meta name="viewport" content="user-scalable=yes, width=device-width" />
<title>Lovely Sunsets</title> 
</head>
<body>
  <h1>Lovely Sunsets</h1>
  <dl>
    <% @pictures.each do |picture| %>
      <dt><a href="<%= picture[:link] %>"><%= picture[:title] %></a></dt>
      <dd><img src="<%= picture[:thumbnail] %>" /></dd>
    <% end %>
  </dl>
</body>
</html>

Fire it up and view it in the browser at http://127.0.0.1:4567/
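
Note that parse now expects a URL rather than a string of XML, so the earlier tests that hand it the fixture contents will no longer work as written. The heading changed too, so test_it_says_feed_aggregator needs updating as well. One possible way around the first problem is to keep parse working on XML text and move the download into its own small method. The sketch below assumes Ruby 2.5 or newer (for URI.open); fetch_feed and its opener parameter are names made up for this example.

```ruby
require 'open-uri'
require 'stringio'

# Hypothetical helper: downloads a feed and returns its XML as a string.
# The opener is injectable so tests can avoid the network.
def fetch_feed(url, opener = URI.method(:open))
  opener.call(url).read
end

# In a test, a fake opener can hand back canned XML instead of hitting Flickr.
fake_opener = lambda { |_url| StringIO.new('<rss><channel></channel></rss>') }
puts fetch_feed('http://example.com/feed.rss', fake_opener)
```

In the app, the route could then call parse(fetch_feed(feed)) with the string-based parse from earlier, and the fixture tests would keep passing.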

Sweet. Now you can add error handling, maybe some caching, or multiple feeds.
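
As a taste of the caching and error handling ideas, here is a small sketch of an in-memory cache that re-fetches after a time limit and falls back to the last good copy when a fetch fails. FeedCache, its ttl argument, and the block-based fetcher are all invented for this example, and the clock is injectable only so the behavior can be checked without waiting.

```ruby
# Hypothetical in-memory cache; not part of Sinatra or Nokogiri.
class FeedCache
  def initialize(ttl_seconds, clock = Time.method(:now), &fetcher)
    @ttl = ttl_seconds
    @clock = clock
    @fetcher = fetcher
    @value = nil
    @fetched_at = nil
  end

  # Returns the cached value, refreshing it when it is missing or stale.
  def get
    refresh if stale?
    @value
  end

  private

  def stale?
    @fetched_at.nil? || (@clock.call - @fetched_at) >= @ttl
  end

  def refresh
    @value = @fetcher.call
    @fetched_at = @clock.call
  rescue StandardError => e
    # Keep serving the previous copy (possibly nil) if the fetch fails.
    warn "feed fetch failed: #{e.message}"
  end
end

calls = 0
cache = FeedCache.new(300) { calls += 1; "feed body ##{calls}" }
puts cache.get   # feed body #1
puts cache.get   # feed body #1 again, served from the cache
```

In the app, the block would do the real work, for example FeedCache.new(300) { parse feed }, and the route would call get on the cache instead of parsing on every request.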

If you would like to see an article on one of these, let us know.

Cheers.

  • hron84

    Guys, can anyone do anything with these codes? The whole article looks terrible on my machine… unreadable for me.

  • http://luizsanches.wordpress.com Luiz Sanches

    me too

    • http://www.ruprict.net/ Glenn Goodrich

      We’re having a shocker with one of our WordPress plugins. I have fixed this article now. Sorry about that and thanks for bringing it to our attention.

      • Iceman

        The code formatting is still broken in a couple of places.
        Interesting article, I’m loving Sinatra for simple web apps.

  • http://ugur.ozyilmazel.com Uğur Özyılmazel

    Thank you. Huge Sinatra fan here!

  • Steve Robillard

    I would love to see the follow up articles you mentioned, especially the one on error handling. I also really appreciated the use of TDD in this article and think all articles should include testing – especially in light of the article on reading source code.