By Ilya Bodrov-Krukowski

Start Your SEO Right with Sitemaps on Rails


After crafting your website, the next step usually involves taking care of search engine optimization (SEO). With that in mind, creating a sitemap is one of the tasks you will need to tackle. According to the Sitemaps protocol, sitemaps are UTF-8 encoded XML files that describe the structure of your site. They are quite simple, but for large sites creating them by hand is not an option, so it’s a smart move to automate their generation.
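
To make the format concrete, here is a minimal sketch that assembles such a file by hand using only the Ruby standard library. The page list, URLs, and dates are made up for illustration, and the helper names (xml_escape, build_sitemap, sample_pages) are hypothetical; this is not how SitemapGenerator works internally, only a picture of the kind of document it produces.

```ruby
require "time"

# Escapes the five characters that must be entity-encoded inside XML values.
def xml_escape(text)
  escapes = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;" }
  text.gsub(/[&<>"']/, escapes)
end

# Hypothetical pages; a real site would load these from its routes or database.
def sample_pages
  [
    { loc: "http://www.example.com/", changefreq: "daily", lastmod: Time.utc(2017, 1, 15) },
    { loc: "http://www.example.com/search?q=rails&page=2", changefreq: "weekly", lastmod: Time.utc(2017, 1, 10) }
  ]
end

# Builds the sitemap document as a UTF-8 string, one <url> element per page.
def build_sitemap(pages)
  lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ]
  pages.each do |page|
    lines << "  <url>"
    lines << "    <loc>#{xml_escape(page[:loc])}</loc>"
    lines << "    <lastmod>#{page[:lastmod].utc.iso8601}</lastmod>"
    lines << "    <changefreq>#{page[:changefreq]}</changefreq>"
    lines << "  </url>"
  end
  lines << "</urlset>"
  lines.join("\n").encode("UTF-8")
end

puts build_sitemap(sample_pages)
```

Even this toy version has to deal with escaping and date formatting, which is a good hint as to why a dedicated tool pays off once the site grows.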

Several sitemap solutions exist for Rails, but I prefer a gem called sitemap_generator. It is actively maintained and has a number of cool features:

  • It is framework-agnostic, so you may use it without Rails
  • It is very flexible
  • It has its own configuration file and is not strictly bound to your app’s routes
  • It allows you to automatically upload sitemaps to third-party storage
  • It automatically pings search engines when a new sitemap is generated
  • It supports multiple sitemap files and various types of sitemaps (video, news, images, etc.)

In this article we will see SitemapGenerator in action by integrating it into a sample Rails app and discussing its main features. I will also explain how to export sitemaps to cloud storage so that everything works properly on platforms like Heroku.

The source code for this article is available on GitHub.

Creating a Sample Site

As usual, start off by creating a new Rails application:

$ rails new Sitemapper -T

I will be using Rails 5.0.1 but SitemapGenerator works with virtually any version.

We will need some sample models, routes, and controllers. Views can be omitted for this demo – it does not really matter what content the site actually has.

Suppose we are creating a blog that has posts and categories; one category can have many posts. Run the following commands to generate models and migrations:

$ rails g model Category title:string
$ rails g model Post category:belongs_to title:string body:text
$ rails db:migrate

Make sure that models have the proper associations set up:

models/category.rb

[...]
has_many :posts, dependent: :destroy
[...]

models/post.rb

[...]
belongs_to :category
[...]

Now let’s set up some routes. To make things a bit more interesting, I will make them nested:

config/routes.rb

[...]
resources :categories do
  resources :posts
end
[...]

Also, while we are here, add the root route:

config/routes.rb

[...]
root to: 'pages#index'
[...]

Now create the controllers. We don’t really need any actions inside, so they will be very simple:

categories_controller.rb

class CategoriesController < ApplicationController
end

posts_controller.rb

class PostsController < ApplicationController
end

pages_controller.rb

class PagesController < ApplicationController
end

Great! Before proceeding, however, let’s also take care of sample data inside our application.

Loading Sample Data

To see SitemapGenerator in action we will also require some sample data. I am going to use the Faker gem for this task:

Gemfile

[...]
group :development do
  gem 'faker'
end
[...]

Install it:

$ bundle install

Now modify the db/seeds.rb file:

db/seeds.rb

5.times do
  category = Category.create(title: Faker::Book.title)

  5.times do
    category.posts.create(
      title: Faker::Book.title,
      body: Faker::Lorem.sentence
    )
  end
end

We are creating five categories each with five posts that have some random content. To run this script use the following command:

$ rails db:seed

Nice! Preparations are done, so let’s proceed to the main part.

Integrating Sitemap Generator

Add the gem we’re using into the Gemfile:

Gemfile

[...]
gem 'sitemap_generator'
[...]

Install it:

$ bundle install

To create a sample config file with useful comments, employ this command:

$ rake sitemap:install

Inside the config directory you will find a sitemap.rb file. The first thing to do here is specify the hostname of your site:

config/sitemap.rb

SitemapGenerator::Sitemap.default_host = "http://www.example.com"

Note that this gem also supports multiple host names.

The main instructions for SitemapGenerator should be placed inside the block passed to the SitemapGenerator::Sitemap.create method. For example, let’s add a link to our root path:

config/sitemap.rb

SitemapGenerator::Sitemap.create do
  add root_path
end

The add method accepts a number of options. For example, specify that the root page is updated daily:

config/sitemap.rb

add root_path, :changefreq => 'daily'

What about the posts and categories? They are being added by the users dynamically so we must query the database and generate links on the fly:

config/sitemap.rb

[...]
Category.find_each do |category|
  add category_posts_path(category), :changefreq => 'weekly', :lastmod => category.updated_at

  category.posts.each do |post|
    add category_post_path(category, post), :changefreq => 'yearly', :lastmod => post.updated_at
  end
end
[...]

Note that here I’ve also provided the :lastmod option to specify when the page was last updated (the default value is Time.now).
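
To get a feel for what this configuration produces, the sketch below replays the same loop in plain Ruby without Rails or a database. The two path helpers are hand-written stand-ins that mimic the paths nested resources generate, and sample_categories is a hypothetical substitute for the seeded data (five categories with five posts each).

```ruby
# Stand-ins for the route helpers Rails generates for the nested resources.
def category_posts_path(category_id)
  "/categories/#{category_id}/posts"
end

def category_post_path(category_id, post_id)
  "/categories/#{category_id}/posts/#{post_id}"
end

# Five categories with five posts each, mirroring db/seeds.rb.
def sample_categories
  (1..5).map do |category_id|
    first_post_id = (category_id - 1) * 5 + 1
    { id: category_id, post_ids: (first_post_id...(first_post_id + 5)).to_a }
  end
end

# Collects the same links the sitemap config adds: root, categories, posts.
def sitemap_links(categories)
  links = [{ path: "/", changefreq: "daily" }]
  categories.each do |category|
    links << { path: category_posts_path(category[:id]), changefreq: "weekly" }
    category[:post_ids].each do |post_id|
      links << { path: category_post_path(category[:id], post_id), changefreq: "yearly" }
    end
  end
  links
end

links = sitemap_links(sample_categories)
puts links.size # 1 root + 5 categories + 25 posts = 31
puts links.first(3).map { |link| link[:path] }
```

Note that every post link needs both the category and the post; passing only the category to the post helper would not identify a single post.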

Running Generator and Inspecting Sitemap Files

To generate a new sitemap (or update an existing one) run the following command:

$ rails sitemap:refresh

Note that if, for some reason, a sitemap fails to be generated, the old version won’t be removed. Another important thing to remember is that the script will automatically ping Google and Bing search engines to notify that a new version of a sitemap is available. Here is the sample output from the command above:

+ sitemap.xml.gz          1 sitemap /  251 Bytes
Sitemap stats: 62 links / 1 sitemap / 0m01s

Pinging with URL 'http://www.example.com/sitemap.xml.gz':
    Successful ping of Google
    Successful ping of Bing

If you need to ping additional engines, you may modify the SitemapGenerator::Sitemap.search_engines hash. You can also skip pinging search engines altogether by running:

$ rails sitemap:refresh:no_ping

Generated sitemaps will be placed inside the public directory with the .xml.gz extension. You may extract this file and browse it with any text editor. If for some reason you don’t want files to be compressed with GZip, set the SitemapGenerator::Sitemap.compress option to false.
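
If you would rather inspect the compressed file from Ruby than extract it manually, something along these lines should do. It is a rough sketch using only the standard library: write_gzipped merely fabricates a tiny stand-in file so the example is self-contained, and sitemap_locations is a hypothetical helper that pulls out the <loc> values with a simple regular expression rather than a real XML parser.

```ruby
require "zlib"
require "tmpdir"

# Writes a gzip-compressed file, standing in for the generator's output.
def write_gzipped(path, content)
  Zlib::GzipWriter.open(path) { |gz| gz.write(content) }
end

# Decompresses a .xml.gz file and returns every <loc> value it contains.
def sitemap_locations(path)
  xml = Zlib::GzipReader.open(path) { |gz| gz.read }
  xml.force_encoding(Encoding::UTF_8).scan(%r{<loc>(.*?)</loc>}m).flatten
end

# A tiny hand-written sitemap used only for this demonstration.
def sample_xml
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://www.example.com/</loc></url>
      <url><loc>http://www.example.com/categories/1/posts</loc></url>
    </urlset>
  XML
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sitemap.xml.gz")
  write_gzipped(path, sample_xml)
  puts sitemap_locations(path)
end
```

In a real project you would skip the writing step and call sitemap_locations("public/sitemap.xml.gz") from the application root.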

Now that you have a sitemap in place, the public/robots.txt file should be modified to provide a link to it:

public/robots.txt

Sitemap: http://www.example.com/sitemap.xml.gz

SitemapGenerator may create an index file depending on how many links your sitemap has. By default (the :auto option), if there are more than 50,000 links, they will be split into separate files and links to those files will be added to the index. You can control this behavior by changing the SitemapGenerator::Sitemap.create_index option. The other available values are true (always generate an index) and false (never generate an index).
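
The splitting rule can be pictured with a few lines of plain Ruby. This is only a sketch of the idea, not the gem's implementation: plan_sitemap_files and index_needed? are hypothetical helpers, and the file names they produce are illustrative rather than the exact names SitemapGenerator writes.

```ruby
# Splits links into files of at most `limit` entries and reports how many
# links each file would hold. File names are illustrative only.
def plan_sitemap_files(links, limit: 50_000)
  links.each_slice(limit).each_with_index.map do |chunk, index|
    { filename: "sitemap#{index + 1}.xml.gz", link_count: chunk.size }
  end
end

# Mirrors the :auto rule described above: an index is only needed
# when the links no longer fit into a single file.
def index_needed?(links, limit: 50_000)
  links.size > limit
end

links = Array.new(120_000) { |n| "/posts/#{n + 1}" }
plan_sitemap_files(links).each do |file|
  puts "#{file[:filename]}: #{file[:link_count]} links"
end
puts "index needed: #{index_needed?(links)}"
```

With 120,000 links the sketch plans three files (50,000, 50,000, and 20,000 links), which is the situation where an index becomes necessary.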

If you wish to add a link directly into the index file, use the add_to_index method, which works much like the add method.


Multiple Locales

Now suppose our blog supports two languages: English and Russian. Set English as the default locale and also tweak the available_locales setting:

config/application.rb

[...]
config.i18n.default_locale = :en
config.i18n.available_locales = [:en, :ru]
[...]

Now scope the routes:

config/routes.rb

[...]
scope "(:locale)", locale: /#{I18n.available_locales.join("|")}/ do
  resources :categories do
    resources :posts
  end

  root to: 'pages#index'
end
[...]
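
The locale constraint is just a regular expression built from the list of available locales. The sketch below reproduces it in plain Ruby, with an ordinary array standing in for I18n.available_locales. One assumption to flag: the anchors (\A and \z) are added by hand here, on the understanding that Rails matches a segment constraint against the whole segment on its own, which is why the route file leaves them out. The helper names are hypothetical.

```ruby
# Builds the pattern the route constraint describes, anchored by hand.
def locale_pattern(locales)
  /\A(?:#{locales.join("|")})\z/
end

# Tells whether a URL segment would be accepted as a locale.
def localized?(segment, locales)
  locale_pattern(locales).match?(segment)
end

available = [:en, :ru]
%w[en ru de english].each do |segment|
  puts "#{segment}: #{localized?(segment, available)}"
end
```

Only en and ru pass. Because the scope is written as "(:locale)", the segment is optional, so paths without a locale prefix keep working as well.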

It is probably a good idea to separate sitemaps for English and Russian locales into different files. This is totally possible, as SitemapGenerator supports groups:

config/sitemap.rb

[...]
{en: :english, ru: :russian}.each_pair do |locale, name|
  group(:sitemaps_path => "sitemaps/#{locale}/", :filename => name) do
    add root_path(locale: locale), :changefreq => 'daily'

    Category.find_each do |category|
      add category_posts_path(category, locale: locale), :changefreq => 'weekly', :lastmod => category.updated_at

      category.posts.each do |post|
        add category_post_path(category, post, locale: locale), :changefreq => 'yearly', :lastmod => post.updated_at
      end
    end
  end
end
[...]

The idea is very simple. This creates a public/sitemaps directory containing en and ru folders, which hold the english.xml.gz and russian.xml.gz files respectively. I will also instruct the script to always generate the index file:

config/sitemap.rb

[...]
SitemapGenerator::Sitemap.create_index = true
[...]

Deploying to Heroku

Our site is ready for deployment. However, there is a problem: Heroku’s filesystem is ephemeral, so files generated at runtime are not persisted. Therefore we must export the generated sitemap to cloud storage. I will use Amazon S3 for this demo, so add a new gem to the Gemfile:

Gemfile

[...]
gem 'fog-aws'
[...]

Install it:

$ bundle install

Now we need to provide a special configuration for SitemapGenerator explaining where to export the files:

config/sitemap.rb

[...]
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new(fog_provider: 'AWS',
                                                                    aws_access_key_id: 'KEY',
                                                                    aws_secret_access_key: 'SECRET',
                                                                    fog_directory: 'DIR',
                                                                    fog_region: 'REGION')

SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_host = "https://example.s3.amazonaws.com/"
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
[...]

The arguments passed to SitemapGenerator::S3Adapter.new configure access to S3. To obtain a key pair, log into aws.amazon.com and create a user with read/write permission for the S3 service. Do not publicly expose this key pair! Also create an S3 bucket in a region of your choice (the default is us-east-1).
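
One common way to keep the key pair out of version control is to read it from environment variables. The helper below is a sketch of that approach: s3_adapter_options is a hypothetical method, and the variable names (SITEMAP_AWS_KEY and so on) are invented for this example, so pick whatever names suit your setup. It only builds the option hash, which keeps the example runnable without the gem.

```ruby
# Builds the S3 adapter options from environment variables.
# All variable names are hypothetical. Missing required values raise
# KeyError, so a misconfigured deploy fails early instead of uploading nowhere.
def s3_adapter_options(env = ENV)
  {
    fog_provider: "AWS",
    aws_access_key_id: env.fetch("SITEMAP_AWS_KEY"),
    aws_secret_access_key: env.fetch("SITEMAP_AWS_SECRET"),
    fog_directory: env.fetch("SITEMAP_S3_BUCKET"),
    fog_region: env.fetch("SITEMAP_S3_REGION", "us-east-1")
  }
end

# Demonstration with a fake environment instead of real credentials.
fake_env = {
  "SITEMAP_AWS_KEY" => "KEY",
  "SITEMAP_AWS_SECRET" => "SECRET",
  "SITEMAP_S3_BUCKET" => "DIR"
}
puts s3_adapter_options(fake_env)[:fog_region]
```

Inside config/sitemap.rb the hash could then be passed along as SitemapGenerator::S3Adapter.new(**s3_adapter_options), with the actual values set on Heroku via heroku config:set.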

Next, we are setting tmp/ for the public_path option – that’s the directory where the file will be initially created before being exported to S3.

sitemaps_host should contain the URL of your S3 bucket.

sitemaps_path is a relative path inside your bucket.

Some more information about this configuration can be found on this page.

Another problem is that some platforms (Bing, for example) require sitemaps to be located under the same domain as the site, so we need to take care of that as well. Let’s add a /sitemap route to our application that simply performs a redirect to S3:

config/routes.rb

[...]
get '/sitemap', to: 'pages#sitemap'
[...]

The corresponding action:

pages_controller.rb

[...]
def sitemap
  redirect_to 'https://example.s3.amazonaws.com/sitemaps/sitemap.xml.gz'
end
[...]

As you remember, SitemapGenerator pings search engines by default, but it would give them the direct link to S3, which is not what we want. Use the ping_search_engines method to override this behavior:

config/sitemap.rb

[...]
SitemapGenerator::Sitemap.ping_search_engines('http://www.example.com/sitemap')
[...]

Note that you now need to generate the sitemap by running

$ rake sitemap:refresh:no_ping

because otherwise SitemapGenerator will ping search engines with both the direct link and http://www.example.com/sitemap.

Lastly, update the robots.txt with a new link:

public/robots.txt

Sitemap: http://www.example.com/sitemap

That’s it! Your site is now ready to be published to Heroku.

Conclusion

We’ve reached the end of this article! By now you should be familiar with SitemapGenerator’s key features and be able to integrate it into your own application. If you have any questions, don’t hesitate to post them in the comments. Also, browse the gem’s documentation, as it covers a number of other features that we haven’t discussed.

Thanks for staying with me and see you soon!
