Writing a Feed Aggregator with Sinatra
Who doesn’t like sunsets? Of all the sunset sites you’ve seen, you feel that none of them gets it right. You decide to make the best damn sunset site on the web. There are plenty of sunset photos out there. Could we use public images to populate this awesome new site? We’ll build the site while the lawyers figure out that last question.
Are the Kids Still Talking About APIs?
One might think “I bet Flickr has some great sunset photos.” Indeed they do. We’ll pick on this group. At the bottom of the page you’ll notice an RSS feed. Could we read that feed and display the images and give credit to the proper people?
Yes, yes we can.
Let’s use our friends Sinatra for our site and rack/test for testing.
Install Sinatra and rack/test
gem install sinatra
gem install rack-test
We will create a feed_aggregator folder with a test folder inside of it. While we are at it, create a blank test file too.
$ mkdir feed_aggregator
$ mkdir feed_aggregator/test
$ touch feed_aggregator/test/feed_aggregator_test.rb
Our first test can simply make sure the app starts up. In the test file, let’s add:
require '../main'
require 'test/unit'
require 'rack/test'
ENV['RACK_ENV'] = 'test'
class FeedAggregatorTest < Test::Unit::TestCase
include Rack::Test::Methods
def app
Sinatra::Application
end
def test_it_says_feed_aggregator
get '/'
assert last_response.ok?
assert_equal 'Feed Aggregator', last_response.body
end
end
Run the test.
test$ ruby feed_aggregator_test.rb
<internal:lib/rubygems/custom_require>:29:in `require': no such file to load -- ../main (LoadError)
from <internal:lib/rubygems/custom_require>:29:in `require'
from feed_aggregator_test.rb:1:in `<main>'
Yeah! We failed the test. It says there is no main file. Go ahead and make that file.
feed_aggregator $ touch main.rb
test$ ruby feed_aggregator_test.rb
Loaded suite feed_aggregator_test
Started
E
Finished in 0.001054 seconds.
1) Error:
test_it_says_feed_aggregator(FeedAggregatorTest):
NameError: uninitialized constant FeedAggregatorTest::Sinatra
feed_aggregator_test.rb:11:in `app'
/Users/johnivanoff/.rvm/gems/ruby-1.9.2-p180@feed_aggregator/gems/rack-test-0.6.2/lib/rack/test/methods.rb:31:in `build_rack_mock_session'
/Users/johnivanoff/.rvm/gems/ruby-1.9.2-p180@feed_aggregator/gems/rack-test-0.6.2/lib/rack/test/methods.rb:27:in `rack_mock_session'
/Users/johnivanoff/.rvm/gems/ruby-1.9.2-p180@feed_aggregator/gems/rack-test-0.6.2/lib/rack/test/methods.rb:42:in `build_rack_test_session'
/Users/johnivanoff/.rvm/gems/ruby-1.9.2-p180@feed_aggregator/gems/rack-test-0.6.2/lib/rack/test/methods.rb:38:in `rack_test_session'
/Users/johnivanoff/.rvm/gems/ruby-1.9.2-p180@feed_aggregator/gems/rack-test-0.6.2/lib/rack/test/methods.rb:46:in `current_session'
feed_aggregator_test.rb:15:in `test_it_says_feed_aggregator'
1 tests, 0 assertions, 0 failures, 1 errors, 0 skips
Let’s add code for an app that just says “Feed Aggregator.”
require 'sinatra'
get '/' do
"Feed Aggregator"
end
Do you think the test will pass now? Let’s find out.
test$ ruby feed_aggregator_test.rb
Loaded suite feed_aggregator_test
Started
.
Finished in 0.063198 seconds.
1 tests, 2 assertions, 0 failures, 0 errors, 0 skips
The test does pass.
Now let’s go get that feed. Since we are developing by tests, it would be a great idea to have a fixture of the RSS feed. Go ahead and make a directory and a blank XML file for it.
test$ mkdir fixtures
test$ touch fixtures/feed.xml
Excellent. Let’s copy the source of the feed into the feed.xml file. If we test against the live feed, entries could change and make it very frustrating while testing. Who needs that?
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
xmlns:media="http://search.yahoo.com/mrss/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:creativeCommons="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html"
xmlns:flickr="urn:flickr:user" >
<channel>
<title>Flickr's Best Sunsets Pool</title>
<link>http://www.flickr.com/groups/flickrsbestsunsets/pool/</link>
... Content elided ...
(Note: The entire XML file can be found in this gist.)
WOW! Where do we start? Why don’t we look for links, so that a person can click through to see the original photo on Flickr.
If you look at the XML you see that each picture is in an item section. The link tag will take us to the photo’s page. Based on that, we need to look into the feed, find the items, and find the link in that item.
That sounds like an awesome test. Write it up.
def test_find_the_link
feed = File.read('fixtures/feed.xml')
items = parse feed
item = items.first
link = 'http://www.flickr.com/photos/mattcaustin/8205498382/in/pool-1373979@N22'
assert_equal item[:link], link
end
Where did I get the text for the link variable? I copied the link from the first item in the fixture. You will soon see how it is used.
As stated before, we load and parse the fixture. The code looks in the first item, finds the link node, takes its text, and compares it to our link variable. They should match.
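One thing to watch: File.read('fixtures/feed.xml') is resolved against the directory you run the test from, so the test only finds the fixture when it is launched from inside test/. A minimal sketch of one way around that, assuming a hypothetical helper named fixture_path (not part of the original test file), builds the path from the test file's own location:

```ruby
# Hypothetical helper, not part of the original test file: builds an absolute
# path to a fixture based on where this file lives, not the current directory.
def fixture_path(name)
  File.expand_path(File.join('fixtures', name), File.dirname(__FILE__))
end

path = fixture_path('feed.xml')
puts path
```

In the test, File.read(fixture_path('feed.xml')) would then work no matter which directory the test is started from.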
Let’s run the test. I hope it fails.
test$ ruby feed_aggregator_test.rb
Run options:
# Running tests:
E.
Finished tests in 0.025705s, 77.8059 tests/s, 77.8059 assertions/s.
1) Error:
test_find_the_link(FeedAggregatorTest):
NoMethodError: undefined method `parse' for #<FeedAggregatorTest:0x007ff7412ac390>
feed_aggregator_test.rb:24:in `test_find_the_link'
2 tests, 2 assertions, 0 failures, 1 errors, 0 skips
There is no ‘parse’ method yet.
Go ahead and create that method in the main.rb file.
require 'sinatra'
def parse
end
get '/' do
"Feed Aggregator"
end
Rerun test.
test$ ruby feed_aggregator_test.rb
Run options:
# Running tests:
E.
Finished tests in 0.251661s, 7.9472 tests/s, 7.9472 assertions/s.
1) Error:
test_find_the_link(FeedAggregatorTest):
ArgumentError: wrong number of arguments (1 for 0)
/Users/john/Dropbox/feed_aggregator/main.rb:4:in `parse'
feed_aggregator_test.rb:23:in `test_find_the_link'
2 tests, 2 assertions, 0 failures, 1 errors, 0 skips
The test passes one argument, but our parse method doesn’t accept any. That’s fine, since we only wanted to solve the last error. What do you do to make this pass? Back in the main.rb file:
require 'sinatra'
def parse feed
end
get '/' do
"Feed Aggregator"
end
Do you think that will get rid of the arguments error? Rerun the test and see.
test$ ruby feed_aggregator_test.rb
Run options:
# Running tests:
E.
Finished tests in 0.027235s, 73.4349 tests/s, 73.4349 assertions/s.
1) Error:
test_find_the_link(FeedAggregatorTest):
NoMethodError: undefined method `first' for nil:NilClass
feed_aggregator_test.rb:24:in `test_find_the_link'
2 tests, 2 assertions, 0 failures, 1 errors, 0 skips
Indeed it did. Now the complaint is about ‘first’: our empty parse method returns nil, and nil has no ‘first’ method.
To fix that, parse needs to return a list of items, which means searching through the XML for each item and its link.
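To see why the error mentions nil: a Ruby method with an empty body returns nil, and nil has no first method. A small sketch of that, using a stand-in method named empty_parse (a hypothetical name, chosen so it does not clash with the real parse):

```ruby
# Stand-in for the empty parse method from main.rb (hypothetical name).
def empty_parse(feed)
end

result = empty_parse('<rss></rss>')
puts result.inspect # the method body is empty, so the return value is nil

# Calling first on nil raises NoMethodError, the error the test reported.
begin
  result.first
rescue NoMethodError => e
  puts e.class
end

# Anything that returns a list, even an empty one, responds to first.
puts [].first.inspect
```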
Now what’s a way to parse the XML to find that link? I like Nokogiri (http://nokogiri.org). Do you have that gem installed? Let’s check. Your output might look different than mine.
test$ gem list --local -d noko
*** LOCAL GEMS ***
Apparently I don’t. If you don’t, go ahead and install it.
test$ sudo gem install nokogiri
Other than seeing in the terminal that it installed successfully, how could you check that it’s installed?
test$ gem list --local -d noko
*** LOCAL GEMS ***
nokogiri (1.5.5)
Authors: Aaron Patterson, Mike Dalessio, Yoko Harada, Tim Elliott
Rubyforge: http://rubyforge.org/projects/nokogiri
Homepage: http://nokogiri.org
Installed at: /Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator
Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser
Where were we? Parsing the XML document. You need to load the feed into Nokogiri. Then you can go through each item, get its link, and store it in a hash. Don’t forget to return the items.
require 'sinatra'
require 'nokogiri'
def parse feed
doc = Nokogiri::XML feed
doc.search('item').map do |doc_item|
item = {}
item[:link] = doc_item.at('link').text
item
end
end
get '/' do
"Feed Aggregator"
end
Do you have a good feeling about this? Go ahead and run the test. Did you remember to include nokogiri in main.rb?
test$ ruby feed_aggregator_test.rb
Loaded suite feed_aggregator_test
Started
..
Finished in 0.112374 seconds.
2 tests, 3 assertions, 0 failures, 0 errors, 0 skips
Snoopy dance! Can we say that?
We need to get the thumbnail next. Where is that in the fixture?
Did you find it? It’s the url attribute of the <media:thumbnail> tag. Go ahead and write a test for that. It’s pretty close to the first one.
def test_find_the_thumbnail_image
feed = File.read('fixtures/feed.xml')
items = parse feed
item = items.first
thumbnail = 'http://farm9.staticflickr.com/8488/8205498382_4e5ed09a62_s.jpg'
assert_equal item[:thumbnail], thumbnail
end
Run the test.
test$ ruby feed_aggregator_test.rb
Run options:
# Running tests:
.F..
Finished tests in 0.053503s, 74.7622 tests/s, 93.4527 assertions/s.
1) Failure:
test_find_the_thumbnail_image(FeedAggregatorTest) [feed_aggregator_test.rb:42]:
<nil> expected but was
<"http://farm9.staticflickr.com/8488/8205498382_4e5ed09a62_s.jpg">.
4 tests, 5 assertions, 1 failures, 0 errors, 0 skips
That was expected. Let’s think this through. We need the value of an attribute of the media:thumbnail node. How about this?
require 'sinatra'
require 'nokogiri'
def parse feed
doc = Nokogiri::XML feed
doc.search('item').map do |doc_item|
item = {}
item[:link] = doc_item.at('link').text
item[:thumbnail] = doc_item.at('media:thumbnail').attr('url').value
item
end
end
get '/' do
"Feed Aggregator"
end
That makes sense since the url we need is an attribute of that node. Go ahead and try it.
test$ ruby feed_aggregator_test.rb
Run options:
# Running tests:
EE.
Finished tests in 0.031776s, 94.4109 tests/s, 62.9406 assertions/s.
1) Error:
test_find_the_link(FeedAggregatorTest):
NoMethodError: undefined method `attr' for nil:NilClass
/Users/john/Dropbox/feed_aggregator/main.rb:10:in `block in parse'
/Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:239:in `block in each'
/Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `upto'
/Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `each'
/Users/john/Dropbox/feed_aggregator/main.rb:7:in `map'
/Users/john/Dropbox/feed_aggregator/main.rb:7:in `parse'
feed_aggregator_test.rb:23:in `test_find_the_link'
2) Error:
test_find_the_thumbnail_image(FeedAggregatorTest):
NoMethodError: undefined method `attr' for nil:NilClass
/Users/john/Dropbox/feed_aggregator/main.rb:10:in `block in parse'
/Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:239:in `block in each'
/Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `upto'
/Users/john/.rvm/gems/ruby-1.9.3-p194@feed_aggregator/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `each'
/Users/john/Dropbox/feed_aggregator/main.rb:7:in `map'
/Users/john/Dropbox/feed_aggregator/main.rb:7:in `parse'
feed_aggregator_test.rb:31:in `test_find_the_thumbnail_image'
3 tests, 2 assertions, 0 failures, 2 errors, 0 skips
Well, that failed: undefined method ‘attr’ for nil:NilClass. It’s not finding the <media:thumbnail> node. If you look at the top of the RSS feed you can see that the media namespace is used. More about namespaces.
Turns out, with Nokogiri, you can just use the pipe symbol to indicate a namespace search. Go ahead and swap the colon for a pipe in the thumbnail line. You can also drop the trailing .value, since attr already returns the attribute’s value as a string.
require 'sinatra'
require 'nokogiri'
def parse feed
doc = Nokogiri::XML feed
doc.search('item').map do |doc_item|
item = {}
item[:link] = doc_item.at('link').text
item[:thumbnail] = doc_item.at('media|thumbnail').attr('url')
item
end
end
get '/' do
"Feed Aggregator"
end
Give it a try. Rerun the tests.
test$ ruby feed_aggregator_test.rb
Run options:
# Running tests:
...
Finished tests in 0.045266s, 66.2749 tests/s, 88.3665 assertions/s.
3 tests, 4 assertions, 0 failures, 0 errors, 0 skips
Awesome. We should probably use the title of the picture too. Go ahead and write the test for that. I’ll wait.
Finished? Here’s what I did.
def test_find_the_title
feed = File.read('fixtures/feed.xml')
items = parse feed
item = items.first
title = 'An Evening at Shell Beach'
assert_equal item[:title], title
end
Again, we are using the title from the first item of our fixture. Run test.
test$ ruby feed_aggregator_test.rb
Run options:
# Running tests:
.F.
Finished tests in 0.078211s, 38.3578 tests/s, 51.1437 assertions/s.
1) Failure:
test_find_the_title(FeedAggregatorTest) [feed_aggregator_test.rb:34]:
<nil> expected but was
<"An Evening at Shell Beach">.
3 tests, 4 assertions, 1 failures, 0 errors, 0 skips
We need to look for the title. How would you add this to the parse method?
require 'sinatra'
require 'nokogiri'
def parse feed
doc = Nokogiri::XML feed
doc.search('item').map do |doc_item|
item = {}
item[:link] = doc_item.at('link').text
item[:thumbnail] = doc_item.at('media|thumbnail').attr('url')
item[:title] = doc_item.at('title').text
item
end
end
get '/' do
"Feed Aggregator"
end
Run the test.
test$ ruby feed_aggregator_test.rb
Run options:
# Running tests:
....
Finished tests in 0.062725s, 63.7704 tests/s, 79.7130 assertions/s.
4 tests, 5 assertions, 0 failures, 0 errors, 0 skips
Awesome, but let’s see something on a web page. We want results in the browser.
To keep things simple I’ll use erb for making the web page.
require 'sinatra'
require 'nokogiri'
def parse feed
doc = Nokogiri::XML feed
doc.search('item').map do |doc_item|
item = {}
item[:link] = doc_item.at('link').text
item[:thumbnail] = doc_item.at('media|thumbnail').attr('url')
item[:title] = doc_item.at('title').text
item
end
end
get '/' do
erb :index
end
__END__
@@index
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="user-scalable=yes, width=device-width" />
<title>Lovely Sunsets</title>
</head>
<body>
<h1>Feed Aggregator</h1>
</body>
</html>
Since we made some changes we should rerun the tests.
test$ ruby feed_aggregator_test.rb
Run options:
# Running tests:
...F
Finished tests in 0.251257s, 15.9200 tests/s, 19.8999 assertions/s.
1) Failure:
test_it_says_feed_aggregator(FeedAggregatorTest) [feed_aggregator_test.rb:18]:
<"Feed Aggregator"> expected but was
<"<!DOCTYPE html>\n<html>\n <head>\n <meta charset=\"UTF-8\">\n <meta name=\"viewport\" content=\"user-scalable=yes, width=device-width\" />\n<title>Lovely Sunsets</title> \n</head>\n<body>\n <h1>Feed Aggregator</h1>\n</body>\n</html>\n">.
4 tests, 5 assertions, 1 failures, 0 errors, 0 skips
Oops. Go ahead and fix that.
def test_it_says_feed_aggregator
get '/'
assert last_response.ok?
assert_match 'Feed Aggregator', last_response.body
end
We are making sure that ‘Feed Aggregator’ is within the page.
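assert_match accepts either a Regexp or a String. As far as I know, both minitest and test-unit turn a String pattern into an escaped regular expression before matching, so the check boils down to something like this sketch (the body string is a made-up stand-in for last_response.body):

```ruby
# Made-up stand-in for last_response.body.
body = "<!DOCTYPE html>\n<html>\n<body>\n  <h1>Feed Aggregator</h1>\n</body>\n</html>\n"

# Escaping keeps characters such as "." or "?" in the expected text literal.
pattern = Regexp.new(Regexp.escape('Feed Aggregator'))

# =~ returns the match position, or nil when the text is not in the body.
found = !(pattern =~ body).nil?
puts found

# An exact comparison, like the old assert_equal, no longer holds.
puts body == 'Feed Aggregator'
```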
Now that the tests are passing, let’s move on. You re-ran the tests, right? We’ll load the feed (from the fixture for now) and then parse out our info.
require 'sinatra'
require 'nokogiri'
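# Resolve the fixture relative to this file so the app loads from any directory.
feed = File.read(File.expand_path('test/fixtures/feed.xml', File.dirname(__FILE__)))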
def parse feed
doc = Nokogiri::XML feed
doc.search('item').map do |doc_item|
item = {}
item[:link] = doc_item.at('link').text
item[:thumbnail] = doc_item.at('media|thumbnail').attr('url')
item[:title] = doc_item.at('title').text
item
end
end
get '/' do
@pictures = parse feed
erb :index
end
__END__
@@index
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="user-scalable=yes, width=device-width" />
<title>Lovely Sunsets</title>
</head>
<body>
<h1>Feed Aggregator</h1>
<dl>
<% @pictures.each do |picture| %>
<dt><a href="<%= picture[:link] %>"><%= picture[:title] %></a></dt>
<dd><img src="<%= picture[:thumbnail] %>" /></dd>
<% end %>
</dl>
</body>
</html>
You might notice I am referring to the test fixture instead of the live RSS feed. Again, I don’t want to constantly hit their server while developing.
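The loop in the template can be tried outside Sinatra with Ruby's standard ERB class. The sketch below uses made-up picture data and also runs each value through ERB::Util.h. That escaping is my addition, not something the listing above does, but feed titles come from strangers and may contain characters like < or &.

```ruby
require 'erb'

# Made-up data in the same shape that parse returns.
pictures = [
  { link: 'http://example.com/photos/1',
    thumbnail: 'http://example.com/photos/1_s.jpg',
    title: 'Dusk & <Dawn>' }
]

# <% %> runs Ruby, <%= %> inserts the result into the page.
template = ERB.new(<<~HTML)
  <dl>
  <% pictures.each do |picture| %>
    <dt><a href="<%= ERB::Util.h(picture[:link]) %>"><%= ERB::Util.h(picture[:title]) %></a></dt>
    <dd><img src="<%= ERB::Util.h(picture[:thumbnail]) %>" /></dd>
  <% end %>
  </dl>
HTML

# binding gives the template access to the local pictures variable.
html = template.result(binding)
puts html
```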
Go ahead and start the server (ruby main.rb from the feed_aggregator folder) and check out the lovely work in your browser.
Everything is looking good. Let’s use real data. How would you wire this up to pull the feed from Flickr? Yep, let’s add open-uri to the main.rb file and then have Nokogiri parse what it opens.
require 'sinatra'
require 'nokogiri'
require 'open-uri'
feed = 'http://api.flickr.com/services/feeds/groups_pool.gne?id=1373979@N22&lang=en-us&format=rss_200'
def parse feed
# URI.open needs Ruby 2.5+; on older Rubies use plain open(feed) instead.
doc = Nokogiri::XML(URI.open(feed))
doc.search('item').map do |doc_item|
item = {}
item[:link] = doc_item.at('link').text
item[:thumbnail] = doc_item.at('media|thumbnail').attr('url')
item[:title] = doc_item.at('title').text
item
end
end
get '/' do
@pictures = parse feed
erb :index
end
__END__
@@index
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="user-scalable=yes, width=device-width" />
<title>Lovely Sunsets</title>
</head>
<body>
<h1>Lovely Sunsets</h1>
<dl>
<% @pictures.each do |picture| %>
<dt><a href="<%= picture[:link] %>"><%= picture[:title] %></a></dt>
<dd><img src="<%= picture[:thumbnail] %>" /></dd>
<% end %>
</dl>
</body>
</html>
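One wrinkle with the listing above: parse now expects a URL, while the earlier tests hand it the fixture's XML text, so they would stop passing. A minimal sketch of one way to keep both working, assuming a hypothetical helper named read_feed (my name, not from the article) that returns the XML text from either a URL or a local file, so parse can go back to calling Nokogiri::XML on the text:

```ruby
require 'open-uri'
require 'tempfile'

# Hypothetical helper: returns the feed's XML as a String.
# URLs are fetched with open-uri (URI.open needs Ruby 2.5+);
# anything else is treated as a local file path.
def read_feed(source)
  if source =~ %r{\Ahttps?://}
    URI.open(source) { |io| io.read }
  else
    File.read(source)
  end
end

# Demo with a throwaway local file, so no network access is needed.
Tempfile.create(['feed', '.xml']) do |file|
  file.write('<rss version="2.0"></rss>')
  file.flush
  puts read_feed(file.path)
end
```

In main.rb the route could then call parse(read_feed(feed)), and the tests keep passing the fixture text straight to parse.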
Fire it up and view it in the browser at http://127.0.0.1:4567/
Sweet. Now you can add error handling, maybe some caching, or multiple feeds.
If you would like to see an article on one of these, let us know.
Cheers.