Start Your SEO Right with Sitemaps on Rails
After crafting your website, the next step usually involves taking care of search engine optimization (SEO). Creating a sitemap is one of the tasks you will need to handle. According to the Sitemaps protocol, sitemaps are UTF-8 encoded XML files that describe the structure of your site. They are quite simple, but for large sites creating them by hand is not an option, so it’s a smart move to automate their generation.
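For reference, the sketch below produces a minimal sitemap that follows the protocol: a `<urlset>` element containing one `<url>` per page, with a required `<loc>` and optional `<lastmod>` and `<changefreq>`. It is plain Ruby with no gems. The `build_sitemap` helper and the example URLs are hypothetical and only illustrate the structure a generator produces for you.

```ruby
require "time"

# Characters that must be escaped inside XML text nodes.
XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&apos;"
}.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: builds a <urlset> document from hashes with
# :loc (required), :lastmod (a Time) and :changefreq keys.
def build_sitemap(entries)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  entries.each do |entry|
    lines << "  <url>"
    lines << "    <loc>#{xml_escape(entry.fetch(:loc))}</loc>"
    # getutc returns a copy, so the caller's Time object is not modified.
    lines << "    <lastmod>#{entry[:lastmod].getutc.iso8601}</lastmod>" if entry[:lastmod]
    lines << "    <changefreq>#{entry[:changefreq]}</changefreq>" if entry[:changefreq]
    lines << "  </url>"
  end
  lines << "</urlset>"
  lines.join("\n").encode(Encoding::UTF_8)
end

puts build_sitemap([
  { loc: "http://www.example.com/", changefreq: "daily" },
  { loc: "http://www.example.com/categories/1/posts",
    lastmod: Time.utc(2017, 1, 15), changefreq: "weekly" }
])
```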
There are a number of sitemap-generation solutions available for Rails, but I prefer a gem called sitemap_generator. It is actively maintained and has a number of cool features:
- It is framework-agnostic, so you may use it without Rails
- It is very flexible
- It has its own configuration file and is not strictly bound to your app’s routes
- It allows you to automatically upload sitemaps to third-party storage
- It automatically pings search engines when a new sitemap is generated
- It supports multiple sitemap files and various types of sitemaps (video, news, images, etc.)
In this article we will see SitemapGenerator in action by integrating it into a sample Rails app and discussing its main features. I will also explain how to export sitemaps to cloud storage so that everything works properly on platforms like Heroku.
The source code for this article is available on GitHub.
Creating a Sample Site
As usual, start off by creating a new Rails application:
$ rails new Sitemapper -T
I will be using Rails 5.0.1, but SitemapGenerator works with virtually any version.
We will need some sample models, routes, and controllers. Views can be omitted for this demo – it does not really matter what content the site actually has.
Suppose we are creating a blog that has posts and categories; one category can have many posts. Run the following commands to generate models and migrations:
$ rails g model Category title:string
$ rails g model Post category:belongs_to title:string body:text
$ rails db:migrate
Make sure that models have the proper associations set up:
app/models/category.rb
class Category < ApplicationRecord
  has_many :posts, dependent: :destroy
end
app/models/post.rb
class Post < ApplicationRecord
  belongs_to :category
end
Now let’s set up some routes. To make things a bit more interesting, I will make them nested:
config/routes.rb
[...]
resources :categories do
  resources :posts
end
[...]
Also, while we are here, add the root route:
config/routes.rb
[...]
root to: 'pages#index'
[...]
Now create the controllers. We don’t really need any actions inside, so they will be very simple:
app/controllers/categories_controller.rb
class CategoriesController < ApplicationController
end
app/controllers/posts_controller.rb
class PostsController < ApplicationController
end
app/controllers/pages_controller.rb
class PagesController < ApplicationController
end
Great! Before proceeding, however, let’s also take care of sample data inside our application.
Loading Sample Data
To see SitemapGenerator in action we will also require some sample data. I am going to use the Faker gem for this task:
Gemfile
[...]
group :development do
gem 'faker'
end
[...]
Install it:
$ bundle install
Now modify the db/seeds.rb file:
db/seeds.rb
5.times do
  category = Category.create(title: Faker::Book.title)
  5.times do
    category.posts.create(
      title: Faker::Book.title,
      body: Faker::Lorem.sentence
    )
  end
end
We are creating five categories, each with five posts containing some random content. To run this script, use the following command:
$ rails db:seed
Nice! Preparations are done, so let’s proceed to the main part.
Integrating Sitemap Generator
Add the gem to the Gemfile:
Gemfile
[...]
gem 'sitemap_generator'
[...]
Install it:
$ bundle install
To create a sample config file with useful comments, employ this command:
$ rake sitemap:install
Inside the config directory you will find a sitemap.rb file. The first thing to do here is specify the hostname of your site:
config/sitemap.rb
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
Note that this gem also supports multiple host names.
The main instructions for SitemapGenerator should be placed inside the block passed to the SitemapGenerator::Sitemap.create method. For example, let’s add a link to our root path:
config/sitemap.rb
SitemapGenerator::Sitemap.create do
  add root_path
end
The add method accepts a number of options. For example, specify that the root page is updated daily:
config/sitemap.rb
add root_path, :changefreq => 'daily'
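The protocol restricts what these values may be: changefreq must be one of always, hourly, daily, weekly, monthly, yearly or never, and priority must lie between 0.0 and 1.0. The sketch below checks an options hash against those two rules before it would be passed to add. The `validate_url_options` helper is hypothetical and not part of sitemap_generator.

```ruby
# Values allowed by the sitemaps.org protocol for <changefreq>.
CHANGEFREQS = %w[always hourly daily weekly monthly yearly never].freeze

# Hypothetical helper: returns a list of problems (empty when valid).
def validate_url_options(options)
  errors = []
  freq = options[:changefreq]
  if freq && !CHANGEFREQS.include?(freq.to_s)
    errors << "changefreq #{freq.inspect} is not one of #{CHANGEFREQS.join(', ')}"
  end
  priority = options[:priority]
  if priority && !(0.0..1.0).cover?(priority.to_f)
    errors << "priority #{priority} must be between 0.0 and 1.0"
  end
  errors
end

p validate_url_options(changefreq: "daily") # → []
p validate_url_options(changefreq: "fortnightly", priority: 2)
```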
What about the posts and categories? They are added dynamically by users, so we must query the database and generate links on the fly:
config/sitemap.rb
[...]
Category.find_each do |category|
  add category_posts_path(category), :changefreq => 'weekly', :lastmod => category.updated_at
  category.posts.each do |post|
    # The nested post route needs both the category and the post
    add category_post_path(category, post), :changefreq => 'yearly', :lastmod => post.updated_at
  end
end
[...]
Note that here I’ve also provided the :lastmod option to specify when the page was last updated (the default value is Time.now).
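To see which URLs the loop above produces, here is a plain-Ruby approximation. It uses Struct stand-ins instead of ActiveRecord models and hand-written helpers instead of the Rails-generated route helpers; the helpers mirror the nested routes from earlier (/categories/:category_id/posts and /categories/:category_id/posts/:id), which is why the post URL needs both the category and the post. Everything here is hypothetical scaffolding, and it does not account for any links the gem may add on its own.

```ruby
Category = Struct.new(:id, :updated_at, :posts)
Post = Struct.new(:id, :updated_at)

# Stand-ins for the route helpers generated by the nested resources.
def category_posts_path(category)
  "/categories/#{category.id}/posts"
end

def category_post_path(category, post)
  "/categories/#{category.id}/posts/#{post.id}"
end

# Collects [path, options] pairs in the same order as config/sitemap.rb.
def sitemap_links(categories)
  links = [["/", { changefreq: "daily" }]]
  categories.each do |category|
    links << [category_posts_path(category),
              { changefreq: "weekly", lastmod: category.updated_at }]
    category.posts.each do |post|
      links << [category_post_path(category, post),
                { changefreq: "yearly", lastmod: post.updated_at }]
    end
  end
  links
end

# Five categories with five posts each, like the seed data.
now = Time.now
categories = (1..5).map do |c|
  posts = (1..5).map { |n| Post.new((c - 1) * 5 + n, now) }
  Category.new(c, now, posts)
end

links = sitemap_links(categories)
puts links.size # → 31 (1 root + 5 category pages + 25 post pages)
```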
Running the Generator and Inspecting Sitemap Files
To generate a new sitemap (or update an existing one) run the following command:
$ rails sitemap:refresh
Note that if a sitemap fails to be generated for some reason, the old version won’t be removed. Another important thing to remember is that the script automatically pings the Google and Bing search engines to notify them that a new version of the sitemap is available. Here is sample output from the command above:
+ sitemap.xml.gz 1 sitemap / 251 Bytes
Sitemap stats: 62 links / 1 sitemap / 0m01s
Pinging with URL 'http://www.example.com/sitemap.xml.gz':
Successful ping of Google
Successful ping of Bing
If you need to ping additional engines, modify the SitemapGenerator::Sitemap.search_engines hash. You can also skip pinging search engines by running:
$ rails sitemap:refresh:no_ping
Generated sitemaps are placed in the public directory with the .xml.gz extension. You can extract this file and inspect it with any text editor. If you don’t want the files to be compressed with gzip, set the SitemapGenerator::Sitemap.compress option to false.
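If you would rather inspect the file from Ruby than extract it by hand, the standard library’s Zlib can read it directly. To keep the sketch self-contained, it first writes a tiny gzipped sample to a temporary directory and reads it back; for a real run you would point `read_gzipped_sitemap` (a hypothetical helper) at public/sitemap.xml.gz instead.

```ruby
require "zlib"
require "tmpdir"

# Reads a gzipped sitemap and returns its XML as a UTF-8 string.
def read_gzipped_sitemap(path)
  Zlib.gunzip(File.binread(path)).force_encoding(Encoding::UTF_8)
end

# Counts <loc> elements, i.e. the number of links in a <urlset>.
def count_locs(xml)
  xml.scan("<loc>").size
end

# Self-contained demo: write a small sample, then read it back.
Dir.mktmpdir do |dir|
  path = File.join(dir, "sitemap.xml.gz")
  sample = <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://www.example.com/</loc></url>
      <url><loc>http://www.example.com/categories/1/posts</loc></url>
    </urlset>
  XML
  File.binwrite(path, Zlib.gzip(sample))

  xml = read_gzipped_sitemap(path)
  puts xml
  puts "#{count_locs(xml)} links"
end
```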
Now that you have a sitemap in place, the public/robots.txt file should be modified to provide a link to it:
public/robots.txt
Sitemap: http://www.example.com/sitemap.xml.gz
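Crawlers find the sitemap through this Sitemap: line, which is independent of any User-agent group and may appear more than once. The sketch below shows roughly how such lines can be picked out of a robots.txt file; the `sitemap_urls` helper is hypothetical, and matching the field name case-insensitively is a lenient choice made here, not a claim about any particular crawler.

```ruby
# Hypothetical helper: returns every URL listed on a Sitemap: line.
def sitemap_urls(robots_txt)
  robots_txt.each_line.filter_map do |line|
    field, value = line.split(":", 2)
    next unless value && field.strip.casecmp?("sitemap")
    url = value.split("#", 2).first.strip # drop trailing comments
    url unless url.empty?
  end
end

robots = <<~TXT
  User-agent: *
  Disallow:
  Sitemap: http://www.example.com/sitemap.xml.gz
TXT

p sitemap_urls(robots) # → ["http://www.example.com/sitemap.xml.gz"]
```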
SitemapGenerator may create an index file depending on how many links your sitemap has. By default (the :auto option), if there are more than 50,000 links, they are split into multiple files and links to those files are added to the index. You can control this behavior with the SitemapGenerator::Sitemap.create_index option. The other available values are true (always generate an index) and false (never generate an index).
If you wish to add a link directly to the index file, use the add_to_index method, which is very similar to the add method.
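The splitting rule can be pictured in plain Ruby: links are cut into chunks of at most 50,000 (the per-file URL limit in the sitemaps protocol), each chunk becomes its own file, and an index lists the files. The sketch below only plans file names and index entries. The `plan_sitemaps` helper and the sitemap1.xml.gz naming scheme are assumptions for illustration, not necessarily how the gem names its files, and the protocol’s separate file-size limit is ignored.

```ruby
MAX_LINKS_PER_FILE = 50_000

# Hypothetical planner mirroring the create_index values described above:
# :auto creates an index only when more than one file is needed,
# true always creates one, false never does.
def plan_sitemaps(links, host:, create_index: :auto)
  chunks = links.each_slice(MAX_LINKS_PER_FILE).to_a
  files = chunks.each_with_index.map do |chunk, i|
    { name: "sitemap#{i + 1}.xml.gz", links: chunk.size }
  end
  with_index = create_index == true || (create_index == :auto && files.size > 1)
  index = with_index ? files.map { |f| "#{host}/#{f[:name]}" } : nil
  { files: files, index: index }
end

plan = plan_sitemaps(Array.new(120_000) { |i| "/posts/#{i}" }, host: "http://www.example.com")
p plan[:files].map { |f| f[:links] } # → [50000, 50000, 20000]
p plan[:index]
```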
Multiple Locales
Now suppose our blog supports two languages: English and Russian. Set English as the default locale and also tweak the available_locales setting:
config/application.rb
[...]
config.i18n.default_locale = :en
config.i18n.available_locales = [:en, :ru]
[...]
Now scope the routes:
config/routes.rb
[...]
scope "(:locale)", locale: /#{I18n.available_locales.join("|")}/ do
  resources :categories do
    resources :posts
  end
  root to: 'pages#index'
end
[...]
It is probably a good idea to separate sitemaps for English and Russian locales into different files. This is totally possible, as SitemapGenerator supports groups:
config/sitemap.rb
[...]
{en: :english, ru: :russian}.each_pair do |locale, name|
  group(:sitemaps_path => "sitemaps/#{locale}/", :filename => name) do
    add root_path(locale: locale), :changefreq => 'daily'
    Category.find_each do |category|
      add category_posts_path(category, locale: locale), :changefreq => 'weekly', :lastmod => category.updated_at
      category.posts.each do |post|
        add category_post_path(category, post, locale: locale), :changefreq => 'yearly', :lastmod => post.updated_at
      end
    end
  end
end
[...]
The idea is very simple. We are creating a public/sitemaps directory that contains en and ru folders, holding the english.xml.gz and russian.xml.gz files respectively. I will also instruct the script to always generate the index file:
config/sitemap.rb
[...]
SitemapGenerator::Sitemap.create_index = true
[...]
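To make the resulting layout concrete, the sketch below computes, for each locale, which file the group writes and which paths go into it. It is plain Ruby: the `locale_sitemap_plan` helper and the category/post IDs are hypothetical, paths assume the optional (:locale) scope renders as a /en or /ru prefix, and the .xml.gz suffix assumes compression is left on.

```ruby
LOCALE_FILES = { en: :english, ru: :russian }.freeze

# Hypothetical helper: maps each locale to its sitemap file and the
# locale-prefixed paths the group block adds for it.
# category_ids_to_post_ids looks like { category_id => [post_id, ...] }.
def locale_sitemap_plan(category_ids_to_post_ids, public_dir: "public")
  LOCALE_FILES.to_h do |locale, name|
    paths = ["/#{locale}"] # root_path(locale: locale)
    category_ids_to_post_ids.each do |category_id, post_ids|
      paths << "/#{locale}/categories/#{category_id}/posts"
      post_ids.each do |post_id|
        paths << "/#{locale}/categories/#{category_id}/posts/#{post_id}"
      end
    end
    file = File.join(public_dir, "sitemaps", locale.to_s, "#{name}.xml.gz")
    [locale, { file: file, paths: paths }]
  end
end

plan = locale_sitemap_plan({ 1 => [1, 2], 2 => [3] })
plan.each do |locale, info|
  puts "#{locale}: #{info[:file]} (#{info[:paths].size} links)"
end
```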
Deploying to Heroku
Our site is ready for deployment; however, there is a problem: Heroku’s filesystem is ephemeral, so files generated on a dyno are not persisted. Therefore, we must export the generated sitemap to cloud storage. I will use Amazon S3 for this demo, so add a new gem to the Gemfile:
Gemfile
[...]
gem 'fog-aws'
[...]
Install it:
$ bundle install
Now we need to configure SitemapGenerator to tell it where to export the files:
config/sitemap.rb
[...]
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new(
  fog_provider: 'AWS',
  aws_access_key_id: 'KEY',
  aws_secret_access_key: 'SECRET',
  fog_directory: 'DIR',
  fog_region: 'REGION'
)
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_host = "https://example.s3.amazonaws.com/"
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
[...]
SitemapGenerator::S3Adapter.new contains the configuration for S3. To obtain a key pair, log into aws.amazon.com and create an IAM user with read/write access to the S3 service. Do not publicly expose this key pair! Also create an S3 bucket in your chosen region (the default is us-east-1).
Next, we set the public_path option to tmp/: that’s the directory where the files are initially created before being exported to S3. sitemaps_host should contain the URL of your S3 bucket, and sitemaps_path is a relative path inside your bucket.
More information about this configuration can be found in the sitemap_generator documentation.
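One way to see how these options fit together: the file is written locally under public_path joined with sitemaps_path, uploaded under sitemaps_path inside the bucket, and advertised at sitemaps_host joined with sitemaps_path. The sketch below computes those three values with File.join and URI.join. It is an illustrative model assuming the index is named sitemap.xml.gz; the `sitemap_locations` helper is hypothetical and not code taken from the gem.

```ruby
require "uri"

# Illustrative model of how the three settings combine (not gem code).
def sitemap_locations(public_path:, sitemaps_host:, sitemaps_path:, filename: "sitemap.xml.gz")
  key = File.join(sitemaps_path, filename) # relative path inside the bucket
  {
    local_file: File.join(public_path, key), # where the file is written first
    s3_key: key,
    public_url: URI.join(sitemaps_host, key).to_s # URL search engines would fetch
  }
end

p sitemap_locations(public_path: "tmp/",
                    sitemaps_host: "https://example.s3.amazonaws.com/",
                    sitemaps_path: "sitemaps/")
```

With the values from the configuration above, public_url comes out as https://example.s3.amazonaws.com/sitemaps/sitemap.xml.gz, the same address the redirect action points to.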
Another problem is that some platforms (Bing, for example) require sitemaps to be located under the same domain, so we need to take care of that as well. Let’s add a /sitemap route to our application that simply redirects to S3:
config/routes.rb
[...]
get '/sitemap', to: 'pages#sitemap'
[...]
The corresponding action:
app/controllers/pages_controller.rb
[...]
def sitemap
  redirect_to 'https://example.s3.amazonaws.com/sitemaps/sitemap.xml.gz'
end
[...]
As you remember, SitemapGenerator pings search engines by default, but it would submit the direct link to S3, which is not what we want. Use the ping_search_engines method to override this behavior:
config/sitemap.rb
[...]
SitemapGenerator::Sitemap.ping_search_engines('http://www.example.com/sitemap')
[...]
Note that you now need to generate the sitemap by running
$ rake sitemap:refresh:no_ping
because otherwise SitemapGenerator will ping search engines with both the direct S3 link and http://www.example.com/sitemap.
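A ping is simply an HTTP GET request to an engine-specific URL that carries your sitemap’s address as a query parameter, which is why it matters which address gets sent. The sketch below only builds such URLs and sends nothing. The endpoint templates are assumptions modeled on the Google and Bing ping formats of that era and may have changed or been retired since, so check each engine’s current documentation before relying on them; `ping_urls` is a hypothetical helper.

```ruby
require "uri"

# Assumed endpoint templates; %s is replaced by the escaped sitemap URL.
PING_TEMPLATES = {
  google: "http://www.google.com/webmasters/tools/ping?sitemap=%s",
  bing: "http://www.bing.com/webmaster/ping.aspx?siteMap=%s"
}.freeze

# Builds one ping URL per engine for the given sitemap address.
def ping_urls(sitemap_url, templates = PING_TEMPLATES)
  escaped = URI.encode_www_form_component(sitemap_url)
  templates.transform_values { |template| format(template, escaped) }
end

ping_urls("http://www.example.com/sitemap").each do |engine, url|
  puts "#{engine}: #{url}"
end
```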
Lastly, update robots.txt with the new link:
public/robots.txt
Sitemap: http://www.example.com/sitemap
That’s it: your site is now ready to be deployed to Heroku!
Conclusion
We’ve reached the end of this article! By now you should be familiar with SitemapGenerator’s key features and be able to integrate it into your own application. If you have any questions, don’t hesitate to post them in the comments. Also, browse the gem’s documentation, as it has a number of other features that we haven’t discussed.
Thanks for staying with me and see you soon!