After crafting your website, the next step usually involves taking care of search engine optimization (SEO). With that in mind, creating a sitemap is one of the tasks that you will need to solve. According to the protocol, sitemaps are UTF-8 encoded XML files that describe the structure of your site. They are quite simple, but for large sites creating them by hand is not an option. Therefore, it’s a smart move to automate generating sitemaps.
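To make the format concrete, here is a minimal sketch of what such a file contains, built by hand in plain Ruby. The build_sitemap helper and its hash keys are hypothetical and exist only for illustration; they are not part of any gem.

```ruby
require "time" # provides Time#iso8601 on older Ruby versions

# Hypothetical helper (not part of sitemap_generator): builds a minimal
# sitemap document from an array of { loc:, lastmod:, changefreq: } hashes.
def build_sitemap(entries)
  urls = entries.map do |entry|
    # Escape &, < and > so the URL is valid inside an XML element.
    tags = ["    <loc>#{entry[:loc].encode(xml: :text)}</loc>"]
    tags << "    <lastmod>#{entry[:lastmod].utc.iso8601}</lastmod>" if entry[:lastmod]
    tags << "    <changefreq>#{entry[:changefreq]}</changefreq>" if entry[:changefreq]
    "  <url>\n#{tags.join("\n")}\n  </url>"
  end

  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    #{urls.join("\n")}
    </urlset>
  XML
end

puts build_sitemap([
  { loc: "http://www.example.com/", lastmod: Time.utc(2017, 1, 15), changefreq: "daily" }
])
```

Writing this by hand works for a handful of pages, but it quickly becomes tedious once links have to be pulled from the database.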
A number of sitemap generators are available for Rails, but I prefer a gem called sitemap_generator. It is actively maintained and has a number of useful features:
- It is framework-agnostic, so you may use it without Rails
- It is very flexible
- It has its own configuration file and is not strictly bound to your app’s routes
- It allows you to automatically upload sitemaps to third-party storage
- It automatically pings search engines when a new sitemap is generated
- It supports multiple sitemap files and various types of sitemaps (video, news, images, etc.)
In this article we will see SitemapGenerator in action by integrating it into a sample Rails app and discussing its main features. I will also explain how to export sitemaps to cloud storage so that everything works properly on platforms like Heroku.
The source code for this article is available at GitHub.
Creating a Sample Site
As usual, start off by creating a new Rails application:
$ rails new Sitemapper -T
I will be using Rails 5.0.1 but SitemapGenerator works with virtually any version.
We will need some sample models, routes, and controllers. Views can be omitted for this demo – it does not really matter what content the site actually has.
Suppose we are creating a blog that has posts and categories; one category can have many posts. Run the following commands to generate models and migrations:
$ rails g model Category title:string
$ rails g model Post category:belongs_to title:string body:text
$ rails db:migrate
Make sure that models have the proper associations set up:
models/category.rb
[...]
has_many :posts, dependent: :destroy
[...]
models/post.rb
[...]
belongs_to :category
[...]
Now let’s set up some routes. To make things a bit more interesting, I will make them nested:
config/routes.rb
[...]
resources :categories do
  resources :posts
end
[...]
Also, while we are here, add the root route:
config/routes.rb
[...]
root to: 'pages#index'
[...]
Now create the controllers. We don’t really need any actions inside, so they will be very simple:
categories_controller.rb
class CategoriesController < ApplicationController
end
posts_controller.rb
class PostsController < ApplicationController
end
pages_controller.rb
class PagesController < ApplicationController
end
Great! Before proceeding, however, let’s also take care of sample data inside our application.
Loading Sample Data
To see SitemapGenerator in action we will also require some sample data. I am going to use the Faker gem for this task:
Gemfile
[...]
group :development do
  gem 'faker'
end
[...]
Install it:
$ bundle install
Now modify the db/seeds.rb file:
db/seeds.rb
5.times do
  category = Category.create({
    title: Faker::Book.title
  })
  5.times do
    category.posts.create({
      title: Faker::Book.title,
      body: Faker::Lorem.sentence
    })
  end
end
We are creating five categories each with five posts that have some random content. To run this script use the following command:
$ rails db:seed
Nice! Preparations are done, so let’s proceed to the main part.
Integrating Sitemap Generator
Add the gem we’re using into the Gemfile:
Gemfile
[...]
gem 'sitemap_generator'
[...]
Install it:
$ bundle install
To create a sample config file with useful comments, run this command:
$ rake sitemap:install
Inside the config directory you will find a sitemap.rb file. The first thing to do here is specify the hostname of your site:
config/sitemap.rb
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
Note that this gem also supports multiple host names.
The main instructions for SitemapGenerator should be placed inside the block passed to the SitemapGenerator::Sitemap.create method. For example, let’s add a link to our root path:
config/sitemap.rb
SitemapGenerator::Sitemap.create do
  add root_path
end
The add method accepts a number of options. For instance, specify that the root page is updated daily:
config/sitemap.rb
add root_path, :changefreq => 'daily'
What about the posts and categories? They are being added by the users dynamically so we must query the database and generate links on the fly:
config/sitemap.rb
[...]
Category.find_each do |category|
  add category_posts_path(category), :changefreq => 'weekly', :lastmod => category.updated_at
  category.posts.each do |post|
    add category_post_path(category, post), :changefreq => 'yearly', :lastmod => post.updated_at
  end
end
[...]
Note that here I’ve also provided the :lastmod option to specify when the page was last updated (the default value is Time.now).
Running Generator and Inspecting Sitemap Files
To generate a new sitemap (or update an existing one) run the following command:
$ rails sitemap:refresh
Note that if, for some reason, a sitemap fails to be generated, the old version won’t be removed. Another important thing to remember is that the script will automatically ping Google and Bing search engines to notify that a new version of a sitemap is available. Here is the sample output from the command above:
+ sitemap.xml.gz 1 sitemap / 251 Bytes
Sitemap stats: 62 links / 1 sitemap / 0m01s
Pinging with URL 'http://www.example.com/sitemap.xml.gz':
Successful ping of Google
Successful ping of Bing
If you need to ping additional engines, you may modify the SitemapGenerator::Sitemap.search_engines hash. You may also skip pinging search engines by running:
$ rails sitemap:refresh:no_ping
Generated sitemaps will be placed inside the public directory with the .xml.gz extension. You may extract this file and browse it with any text editor. If for some reason you don’t want files to be compressed with GZip, set the SitemapGenerator::Sitemap.compress option to false.
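If you would rather inspect the compressed file without extracting it by hand, a short script using Ruby's standard Zlib library can do it. This is only a sketch: the read_sitemap helper is hypothetical, and the default path assumes the sitemap was written to the public directory as described above.

```ruby
require "zlib"

# Hypothetical helper: reads a gzip-compressed sitemap and returns its XML.
# The default path is an assumption based on the default output location.
def read_sitemap(path = "public/sitemap.xml.gz")
  Zlib::GzipReader.open(path) { |gz| gz.read }
end

# Print the sitemap only if it has already been generated.
puts read_sitemap if File.exist?("public/sitemap.xml.gz")
```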
Now that you have a sitemap in place, the public/robots.txt file should be modified to provide a link to it:
public/robots.txt
Sitemap: http://www.example.com/sitemap.xml.gz
SitemapGenerator may create an index file depending on how many links your sitemap has. By default (the :auto option) if there are more than 50 000 links, they will be separated into different files and links to them will be added into the index. You can control this behavior by changing the SitemapGenerator::Sitemap.create_index option. Other available options are true (always generate index) and false (never generate index).
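The splitting rule can be illustrated in plain Ruby. The sketch below is not how the gem is implemented; split_links and index_needed? are hypothetical helpers, and the file names are made up for the example.

```ruby
MAX_LINKS_PER_FILE = 50_000 # per-file limit mentioned above

# Hypothetical helper: maps an illustrative file name to the links it would hold.
def split_links(links, per_file: MAX_LINKS_PER_FILE)
  links.each_slice(per_file).each_with_index.map do |chunk, i|
    ["sitemap#{i + 1}.xml.gz", chunk]
  end.to_h
end

# Hypothetical helper: mirrors the :auto rule described above, where an index
# is only needed once the links no longer fit into a single file.
def index_needed?(links, per_file: MAX_LINKS_PER_FILE)
  links.size > per_file
end

links = (1..120_001).map { |n| "/posts/#{n}" }
split_links(links).each { |name, chunk| puts "#{name}: #{chunk.size} links" }
puts "index needed: #{index_needed?(links)}"
```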
If you wish to add a link directly into the index file, use the add_to_index method, which is very similar to the add method.
Multiple Locales
Now suppose our blog supports two languages: English and Russian. Set English as the default locale and also tweak the available_locales setting:
config/application.rb
[...]
config.i18n.default_locale = :en
config.i18n.available_locales = [:en, :ru]
[...]
Now scope the routes:
config/routes.rb
[...]
scope "(:locale)", locale: /#{I18n.available_locales.join("|")}/ do
  resources :categories do
    resources :posts
  end
  root to: 'pages#index'
end
[...]
It is probably a good idea to separate sitemaps for English and Russian locales into different files. This is totally possible, as SitemapGenerator supports groups:
config/sitemap.rb
[...]
{en: :english, ru: :russian}.each_pair do |locale, name|
  group(:sitemaps_path => "sitemaps/#{locale}/", :filename => name) do
    add root_path(locale: locale), :changefreq => 'daily'
    Category.find_each do |category|
      add category_posts_path(category, locale: locale), :changefreq => 'weekly', :lastmod => category.updated_at
      category.posts.each do |post|
        add category_post_path(category, post, locale: locale), :changefreq => 'yearly', :lastmod => post.updated_at
      end
    end
  end
end
[...]
The idea is very simple. We are creating a public/sitemaps directory that contains ru and en folders. Inside there are english.xml.gz and russian.xml.gz files. I will also instruct the script to always generate the index file:
config/sitemap.rb
[...]
SitemapGenerator::Sitemap.create_index = true
[...]
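To double-check where the per-locale files should end up, the expected layout can be computed with a few lines of plain Ruby. The locale_sitemap_files helper is hypothetical and simply mirrors the group options used in the configuration; it does not call the gem.

```ruby
# Hypothetical helper: lists the files the locale groups are expected to
# write, relative to the application root.
def locale_sitemap_files(locales, public_path: "public/")
  locales.map do |locale, name|
    File.join(public_path, "sitemaps", locale.to_s, "#{name}.xml.gz")
  end
end

puts locale_sitemap_files({ en: :english, ru: :russian })
# public/sitemaps/en/english.xml.gz
# public/sitemaps/ru/russian.xml.gz
```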
Deploying to Heroku
Our site is ready for deployment, but there is a problem: Heroku does not allow us to persist custom files. Therefore, we must export the generated sitemap to cloud storage. I will use Amazon S3 for this demo, so add a new gem into the Gemfile:
Gemfile
[...]
gem 'fog-aws'
[...]
Install it:
$ bundle install
Now we need to provide a special configuration for SitemapGenerator explaining where to export the files:
config/sitemap.rb
[...]
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new(
  fog_provider: 'AWS',
  aws_access_key_id: 'KEY',
  aws_secret_access_key: 'SECRET',
  fog_directory: 'DIR',
  fog_region: 'REGION'
)
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_host = "https://example.s3.amazonaws.com/"
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
[...]
SitemapGenerator::S3Adapter.new contains configuration for S3. To obtain a key pair, you need to log into aws.amazon.com and create an account with read/write permission to access the S3 service. Do not publicly expose this key pair! Also create an S3 bucket in a chosen region (default is us-east-1).
Next, we are setting tmp/ for the public_path option: that’s the directory where the file will be initially created before being exported to S3.
sitemaps_host should contain a path to your S3 bucket.
sitemaps_path is a relative path inside your bucket.
Some more information about this configuration can be found on this page.
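Since the key pair must not be exposed, one common approach is to read it from environment variables instead of hardcoding it in config/sitemap.rb. The following is a sketch only: the s3_adapter_options helper and the variable names (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET, AWS_REGION) are assumptions, so adjust them to whatever your deployment defines.

```ruby
# Hypothetical helper: collects the S3 settings from environment variables.
# The variable names are assumptions, not something the gem requires.
def s3_adapter_options(env = ENV)
  {
    fog_provider: "AWS",
    aws_access_key_id: env.fetch("AWS_ACCESS_KEY_ID"),         # raises KeyError if unset
    aws_secret_access_key: env.fetch("AWS_SECRET_ACCESS_KEY"), # raises KeyError if unset
    fog_directory: env.fetch("S3_BUCKET"),
    fog_region: env.fetch("AWS_REGION", "us-east-1")           # falls back to the default region
  }
end

# In config/sitemap.rb the hash could then be passed to the adapter:
#   SitemapGenerator::Sitemap.adapter =
#     SitemapGenerator::S3Adapter.new(**s3_adapter_options)
```

Using fetch without a default makes a missing credential fail loudly at generation time rather than producing a confusing upload error later. On Heroku, such variables can be set with the heroku config:set command.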
Another problem is that some platforms (Bing, for example) require sitemaps to be located under the same domain, so we need to take care of that as well. Let’s add a /sitemap route to our application that will simply perform a redirect to S3:
config/routes.rb
[...]
get '/sitemap', to: 'pages#sitemap'
[...]
The corresponding action:
pages_controller.rb
[...]
def sitemap
  redirect_to 'https://example.s3.amazonaws.com/sitemaps/sitemap.xml.gz'
end
[...]
As you remember, by default SitemapGenerator will ping search engines, but it will provide a direct link to S3, which is not what we want. Utilize the ping_search_engines method to override this behavior:
config/sitemap.rb
[...]
SitemapGenerator::Sitemap.ping_search_engines('http://example.com/sitemap')
[...]
Note that you now need to generate the sitemap by running
$ rake sitemap:refresh:no_ping
because otherwise SitemapGenerator will ping search engines with both the direct link and http://example.com/sitemap.
Lastly, update the robots.txt with a new link:
public/robots.txt
Sitemap: http://www.example.com/sitemap
That’s it! Now your site is ready to be published to Heroku.
Conclusion
We’ve reached the end of this article! By now you should be familiar with SitemapGenerator’s key features and be able to integrate it into your own application. If you have any questions, don’t hesitate to post them into the comments. Also, browse the gem’s documentation, as it has a number of other features that we haven’t discussed.
Thanks for staying with me and see you soon!
Frequently Asked Questions (FAQs) about SEO and Sitemaps on Rails
How do I generate a sitemap for my Rails application?
Generating a sitemap for your Rails application involves several steps. First, you need to add the ‘sitemap_generator’ gem to your Gemfile and run the ‘bundle install’ command. Next, you need to create a configuration file for the sitemap generator. This file will specify the pages you want to include in your sitemap. You can then run the ‘rake sitemap:refresh’ command to generate the sitemap. The sitemap will be created in the public directory of your Rails application.
How can I automate the process of sitemap generation?
Automating the process of sitemap generation can save you a lot of time and effort. You can use the ‘whenever’ gem to schedule the ‘rake sitemap:refresh’ task to run at regular intervals. This will ensure that your sitemap is always up-to-date, even if you forget to manually refresh it.
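As a rough sketch, a schedule for the ‘whenever’ gem might look like the fragment below. It assumes the gem is in your Gemfile and that the schedule lives in config/schedule.rb; the time of day is an arbitrary choice.

```ruby
# config/schedule.rb (read by the whenever gem, not run directly)
# Refresh the sitemap once a day; the time is an arbitrary example.
every 1.day, at: "5:00 am" do
  rake "sitemap:refresh"
end
```

Note that this relies on cron being available on the server, so it suits a regular VPS better than platforms that do not provide cron.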
How do I deploy my sitemap to Heroku?
Deploying your sitemap to Heroku involves a few extra steps. First, configure the sitemap generator to write files to a temporary directory such as tmp/, because Heroku does not persist generated files. Next, add a storage gem such as ‘fog-aws’ to your Gemfile and configure an adapter that uploads the sitemap to an S3 bucket. Finally, run the sitemap refresh task after you deploy your application, or on a schedule, so that the uploaded sitemap stays current.
How can I ensure that search engines find my sitemap?
To ensure that search engines find your sitemap, you need to add a reference to it in your robots.txt file. This file tells search engines where to find your sitemap. You can also submit your sitemap directly to search engines like Google and Bing through their webmaster tools.
What should I include in my sitemap?
Your sitemap should include all the pages on your website that you want search engines to index. This typically includes all your static pages, as well as dynamic pages like blog posts and product pages. You can also include images and videos in your sitemap to help search engines understand your content better.
How do I handle large sitemaps?
If your website has a lot of pages, you may need to split your sitemap into multiple files. The ‘sitemap_generator’ gem supports this out of the box: by default, once there are more than 50,000 links, they are split across several files and an index file linking to them is generated.
How do I handle changes to my website structure?
If you make changes to your website structure, you should regenerate your sitemap to reflect these changes. You can do this manually by running the ‘rake sitemap:refresh’ command, or you can automate the process using the ‘whenever’ gem.
How do I handle multilingual websites?
If your website is available in multiple languages, it is a good idea to create a separate sitemap for each language. With the ‘sitemap_generator’ gem you can do this in a single configuration file by wrapping the links for each locale in a group that has its own path and filename.
How do I handle pagination in my sitemap?
If your website uses pagination, you can include the paginated pages in your sitemap. With the ‘sitemap_generator’ gem, this comes down to calling the ‘add’ method for each page URL in your configuration file, for example inside a loop.
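A minimal sketch of enumerating paginated URLs in plain Ruby follows, so that each resulting path can be passed to the ‘add’ method. The paginated_paths helper is hypothetical, and the "?page=N" query format is an assumption that should be matched to your own routes.

```ruby
# Hypothetical helper: returns the path of every page in a paginated listing.
# The "?page=N" query format is an assumption; match it to your own routes.
def paginated_paths(base_path, total_items, per_page:)
  # Always list at least the first page, even when there are no items.
  page_count = [(total_items.to_f / per_page).ceil, 1].max
  (1..page_count).map do |page|
    page == 1 ? base_path : "#{base_path}?page=#{page}"
  end
end

puts paginated_paths("/categories/1/posts", 45, per_page: 20)
# /categories/1/posts
# /categories/1/posts?page=2
# /categories/1/posts?page=3
```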
How do I handle errors in my sitemap?
If there are errors in your sitemap, search engines may not be able to index your website properly. You can check for errors by validating your sitemap using a sitemap validation tool. If you find any errors, you should fix them and regenerate your sitemap.
Ilya Bodrov is a personal IT teacher, a senior engineer working at Campaigner LLC, an author and teaching assistant at Sitepoint, and a lecturer at Moscow Aviation Institute. His primary programming languages are Ruby (with Rails) and JavaScript. He enjoys coding, teaching people and learning new things. Ilya also has some Cisco and Microsoft certificates and worked as a tutor in an educational center for a couple of years. In his free time he tweets, writes posts for his website, participates in open source projects, goes in for sports and plays music.