XML Sitemaps: The Beginner’s Guide

This article is part of an SEO series from WooRank. Thank you for supporting the partners who make SitePoint possible.

XML sitemaps are simple text files that list details for every URL on a website: its location, date of last modification, change frequency and page priority. If you’ve got an international or multilingual site, you can also use your sitemap to note the relationship between language versions of a URL. All of these elements give search engine bots information about your site, helping them crawl and index pages more easily and efficiently.

Here is what the sitemap for a multilingual site should look like:

XML sitemap sample

Here’s a quick rundown of what those terms mean:

  • <urlset> – The root tag that opens and closes the sitemap and declares the protocol standard (namespace) it follows.
  • <url> – The parent tag that wraps each URL entry.
  • <loc> – The location of the page. Always use absolute URLs, written uniformly throughout your sitemaps (https://, www, etc.).
  • <lastmod> – The date you last modified the page. Always use the YYYY-MM-DD format here.
  • <changefreq> – How often you make changes to this page. Valid values are always, hourly, daily, weekly, monthly, yearly and never.
  • <priority> – How important this page is relative to the other pages on your site. The value ranges from 0.0 to 1.0, with 0.5 as the default priority.
  • <xhtml:link> – This tag provides URLs to alternate versions of the page. In this case they point to versions of the page in different languages.

The <loc> tag is required for each URL you include in your sitemap, which makes sense: it tells search engine bots where the pages are. The other tags are optional. You might think you can trick search engines into crawling your site more frequently by setting <changefreq> to daily, but don’t lie. If crawlers sense that your sitemaps don’t reflect reality they’ll ignore them, which could result in your site being crawled even less frequently.
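As a rough illustration, the tags described above could be generated with Python's standard xml.etree.ElementTree module. This is a minimal sketch rather than a full generator: the build_sitemap function, the page dictionaries and the example.com URLs are hypothetical, and the xhtml:link element follows the convention search engines document for hreflang annotations in sitemaps.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Register prefixes so the output uses <urlset> and <xhtml:link>
# instead of auto-generated ns0/ns1 prefixes.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(pages):
    """Return sitemap XML (as a string) for a list of page dicts (hypothetical shape)."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # <loc> is the only required child of <url>.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(page[tag])
        # One <xhtml:link> per language version of the page.
        for lang, href in page.get("alternates", {}).items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(urlset, encoding="unicode")


# Example data (assumed): an English and a German version of the same page.
alternates = {
    "en": "https://www.example.com/en/",
    "de": "https://www.example.com/de/",
}
pages = [
    {"loc": href, "lastmod": "2017-01-01", "changefreq": "monthly",
     "priority": 0.8, "alternates": alternates}
    for href in alternates.values()
]
print(build_sitemap(pages))
```

ElementTree escapes text and attribute values for you. tostring() as used here does not emit an XML declaration, so you would add <?xml version="1.0" encoding="UTF-8"?> at the top before saving the file.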

Add the <image:image> extension to give Google and Bing information about your images and help them appear in image searches. Image tags should be added inside the <url> tag, after the <loc> tag:

<url>
    <loc>https://www.example.com</loc>
    <image:image>
        <image:loc>https://www.example.com/image.jpg</image:loc>
    </image:image>
</url>

You can add other optional tags inside <image:image> to give bots a little more detail:

  • <image:caption> – A short caption or description for the image.
  • <image:geo_location> – The geographic location of the image.
  • <image:title> – Your image’s title.
  • <image:license> – Contains the URL pointing to the license of your image.

If you’ve got video content on your pages, make it findable in video search by adding the <video:video> extension to your sitemap. If your page www.example.com/video1 has an embedded video, a video player or a raw video file, add <video:video> elements to its <url> entry:


<url>
    <loc>https://www.example.com/video1</loc>
    <video:video>
        <video:thumbnail_loc>https://www.example.com/thumbnail/vid1.jpg</video:thumbnail_loc>
        <video:title>Sample Video 1</video:title>
        <video:description>This is a short description of your video. Maximum 2048 characters.</video:description>
        <video:content_loc>https://www.example.com/video/sample1.mov</video:content_loc>
        <video:duration>10</video:duration>
    </video:video>
</url>

The example above includes all of the tags required for video elements, plus the recommended <video:duration>. However, there’s a lot more you can tell search engines about your page’s video resources in sitemaps:

  • <video:player_loc> – The URL pointing to the player for the video. If your video is embedded on your page, like from YouTube or Vimeo, you can use this tag instead of <video:content_loc>. You can normally find this URL in the video’s embed code.
  • <video:duration> – The video’s length in seconds, between 0 and 28800 (8 hours). This isn’t technically required, but Google recommends it.
  • <video:expiration_date> – Only include this information if your video will not be available after a certain date. If you do use it, write dates in YYYY-MM-DD format and, if you need a time as well, append it as Thh:mm:ssTZD (YYYY-MM-DDThh:mm:ssTZD in full).
  • <video:rating> – The video’s rating. Only values between 0.0 and 5.0 are valid.
  • <video:view_count> – The number of times the video has been watched.
  • <video:publication_date> – The date the video was first published, not the date you put it on your site.
  • <video:family_friendly> – If No, your video will only appear in search results when the user disables SafeSearch. Otherwise, make this Yes.
  • <video:tag> – A very short description of key concepts related to your video. Create a separate <video:tag> element for each tag you use, up to 32 tags.
  • <video:category> – The broad subject your video covers, such as SEO, Digital Marketing or Advertising.
  • <video:restriction relationship=allow/deny> – A space-delimited list of ISO 3166 country codes. With relationship set to allow, it lists the only countries in which users can access the video; with deny, it lists the countries where the video cannot play. If you don’t use this tag, it will be assumed that your video is available globally.
  • <video:gallery_loc> – The URL where you can find the collection in which your video appears, if there is one. Each video can have only one gallery_loc tag. If your gallery has a title you can add the title attribute.
  • <video:price currency=" "> – The price to download the video. The currency= attribute is required and uses the ISO 4217 currency code. Add the optional type= attribute to specify if the download is to own or rent, and resolution= to specify if the video is in HD or SD. You can use this multiple times for each currency you accept.
  • <video:requires_subscription> – Allowed values are yes and no to indicate whether or not a subscription is required to watch the video.
  • <video:uploader> – The name of the video’s uploader. If your video is embedded from another video site, put the name of the host here. The optional info attribute takes a URL with more information about the uploader; this URL must be on the same domain as the <loc> tag.
  • <video:platform relationship=allow/deny> – A space-delimited list of the platforms (web, mobile and tv) where the video can or cannot be accessed. The relationship= attribute defines whether the list is inclusive or exclusive. You can have only one platform tag per video.
  • <video:live> – Whether or not the video is a live stream. Only yes or no are valid.
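The limits listed above are easy to get wrong by hand, so a small pre-flight check can help. The sketch below is only an illustration: check_video_entry and the dictionary keys are hypothetical names, and the rules it enforces are the ones stated in this article (required tags, a 2048-character description, a duration of at most 28800, ratings from 0.0 to 5.0, up to 32 tags, yes/no flags).

```python
def check_video_entry(video):
    """Return a list of problems found in a dict describing one video (hypothetical shape)."""
    problems = []

    # Required tags: thumbnail, title, description and a content or player URL.
    for field in ("thumbnail_loc", "title", "description"):
        if not video.get(field):
            problems.append(f"missing required field: {field}")
    if not (video.get("content_loc") or video.get("player_loc")):
        problems.append("need content_loc or player_loc")

    if len(video.get("description", "")) > 2048:
        problems.append("description longer than 2048 characters")

    # 28800 seconds is 8 hours.
    duration = video.get("duration")
    if duration is not None and not 0 <= duration <= 28800:
        problems.append("duration outside 0-28800 seconds")

    rating = video.get("rating")
    if rating is not None and not 0.0 <= rating <= 5.0:
        problems.append("rating outside 0.0-5.0")

    if len(video.get("tags", [])) > 32:
        problems.append("more than 32 tags")

    # These tags only accept the literal values yes and no.
    for flag in ("family_friendly", "requires_subscription", "live"):
        if flag in video and video[flag] not in ("yes", "no"):
            problems.append(f"{flag} must be yes or no")

    return problems


sample = {
    "thumbnail_loc": "https://www.example.com/thumbnail/vid1.jpg",
    "title": "Sample Video 1",
    "description": "This is a short description of your video.",
    "content_loc": "https://www.example.com/video/sample1.mov",
    "duration": 600,
}
print(check_video_entry(sample))  # an empty list means no problems were found
```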

XML sitemaps are limited in size, both in the number of URLs you can include and in file size. A single sitemap can have at most 50,000 URL entries, with up to 1,000 images per URL, and a maximum uncompressed size of 50MB (older versions of the protocol capped this at 10MB). If you’ve got a really big site with lots of pages, images and/or videos, you’ll need to split it across multiple sitemaps and list them in a sitemap of sitemaps, known as a Sitemap Index File.

Sample sitemap index file

As you can see, this is basically a version of a normal sitemap without the extra details: it’s got the <loc> and <lastmod> tags for each URL, which in this case are all your different sitemaps. The main differences are the <sitemap> tag, which replaces <url>, and <sitemapindex>, which replaces <urlset>.
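To make the splitting step concrete, here is a hedged sketch that breaks a long URL list into chunks of at most 50,000 and builds an index file pointing at one sitemap per chunk. The function names, the sitemap-N.xml file names and the lastmod date are assumptions made for the example, not a naming scheme that search engines require.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-sitemap URL limit

ET.register_namespace("", SITEMAP_NS)


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls, lastmod):
    """Return sitemap index XML listing each sitemap file's location."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="unicode")


# Example: 120,000 hypothetical pages need three sitemap files.
urls = [f"https://www.example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
names = [f"https://www.example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))]
print(len(chunks))  # 3
print(build_index(names, "2017-01-01"))
```

Chunking by URL count alone does not guarantee each file stays under the file size limit, so a real generator would also check the size of each file it writes.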

Do I Need One?

Is a sitemap strictly necessary? No, not technically. Your website will still work without one, and it can even be crawled and indexed by search engines. Plus, sitemaps aren’t used as a ranking signal, so submitting one won’t make you rank higher.

So why do it? The biggest reason you should create and submit your XML sitemap is indexing. Even though search engines can still technically find your pages without one, adding a sitemap makes it so much easier for them. You might have orphaned pages (pages that got left out of your internal linking) or pages that are otherwise hard to find. Your sitemap is especially important when you’ve recently added pages or created a whole new site that doesn’t have a lot of, or any, links to it yet.

Sitemaps also help search engines crawl your pages more intelligently. They can take <changefreq> and <lastmod> tags into account and adjust their crawl frequency accordingly, so you get to be a little proactive about getting search spiders to visit your pages. Raising the priority of a page makes it more likely that it will be crawled and indexed more frequently and before other, less important parts of your site.

If you’ve got a geo-targeted international site, or a site that has the same page translated into multiple languages, you can use your XML sitemap to your advantage. As we showed in our example above, putting hreflang tags in your sitemap tells crawlers that you’ve got multiple versions of your page. Search engines can use this information to make sure they’re serving the right version to users based on language and/or location.

How Do I Make One?

There are different ways to generate your sitemap. Begin by deciding which pages you want search engines to be able to crawl and index, and make sure you aren’t blocking them with either your robots.txt or a robots meta tag. Then, determine your canonical URLs (protocol, www resolve, capital letters, etc.). This is very important because XML sitemaps require absolute URLs, and none of the URLs in your sitemap should lead to redirects.
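One part of that check can be automated: filtering a candidate URL list against your robots.txt rules. The sketch below uses Python's standard urllib.robotparser module; crawlable_urls and the sample rules are hypothetical. It does not look at robots meta tags or follow redirects, which would need a separate check.

```python
from urllib.robotparser import RobotFileParser


def crawlable_urls(urls, robots_txt, user_agent="*"):
    """Keep only the URLs that the given robots.txt text allows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Assumed robots.txt content for the example.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

candidates = [
    "https://www.example.com/",
    "https://www.example.com/blog/xml-sitemaps",
    "https://www.example.com/private/report",
]
print(crawlable_urls(candidates, robots_txt))
```

Here the /private/report URL is dropped, so it would not end up in a sitemap that asks search engines to index it.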

After you’ve determined which pages you want to include and your canonical URLs, the first option is to do it manually. Of course, this isn’t really recommended, especially if your site has more than just a few pages. You’re more likely to make a mistake and less likely to catch it. Fortunately there are lots of tools out there that will generate your sitemap for you. Many of them are free while others, like Screaming Frog, include it as part of a paid service or product.

Once your sitemap is created, make sure it’s not too big: as we mentioned before, each sitemap is limited in both URL count (50,000) and file size. Make sure your sitemap not only uses your canonical URLs, but also leaves out any extra URL parameters like session IDs, and that the URLs are properly escaped (if you’re using a tool to create your file this has probably already been done for you). Special characters should be replaced with the following entity escape codes:

  • Ampersand (&): &amp;
  • Single quote ('): &apos;
  • Double quote ("): &quot;
  • Greater than (>): &gt;
  • Less than (<): &lt;
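If you assemble sitemap text yourself, the replacements listed above can be applied with xml.sax.saxutils.escape from Python's standard library. This is a small sketch; escape_url and the sample URL are made up for illustration.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are added explicitly.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_url(url):
    """Return the URL with the five special characters entity-escaped."""
    return escape(url, QUOTE_ENTITIES)


print(escape_url("https://www.example.com/search?q=a&lang='en'"))
# https://www.example.com/search?q=a&amp;lang=&apos;en&apos;
```

If you build the file with an XML library such as xml.etree.ElementTree, text is already escaped when it is serialized, so running escape_url on top of that would double-escape the ampersands.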

You have one last step to go through before uploading your sitemap: compression. Compressing your sitemap will lessen the load on your server, which can make a big difference if it’s a large file or it’s being called multiple times by a bot. It’s best to use gzip to compress your sitemap; search engines have problems opening .zip files.
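As a sketch of the compression step, Python's standard gzip module can write a .gz file directly. The write_gzipped_sitemap helper and the sitemap.xml.gz file name are assumptions for the example.

```python
import gzip


def write_gzipped_sitemap(xml_text, path):
    """Gzip the sitemap XML, write it to `path`, and return the compressed size in bytes."""
    data = gzip.compress(xml_text.encode("utf-8"))
    with open(path, "wb") as handle:
        handle.write(data)
    return len(data)


# A tiny stand-in sitemap body, repeated to make the saving visible.
entry = "<url><loc>https://www.example.com/</loc></url>"
xml_text = "<urlset>" + entry * 1000 + "</urlset>"

compressed_size = write_gzipped_sitemap(xml_text, "sitemap.xml.gz")
print(len(xml_text.encode("utf-8")), "bytes before,", compressed_size, "bytes after")
```

Sitemaps are repetitive text, so they tend to compress well. Compression reduces transfer size only; the size limit applies to the uncompressed file.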

When you add the sitemap to your site, store it in the root directory, at something like https://www.example.com/sitemap.xml. Now that it’s added to your site, test it with Google Search Console to find any errors. In Google Search Console, find Sitemaps in the Crawl area, and click on Add/Test Sitemap. Add in your sitemap location URL and click Test.

Test sitemap in Google Search Console

Once you’ve fixed any errors in your sitemap, submit it to Google using that same tool. Submit it to Bing in your Webmaster Tools account by entering its URL, much like you did with Google. Also make sure that you’ve added your XML sitemap location to your robots.txt file using the line Sitemap: https://www.example.com/sitemap.xml.
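To double-check that the Sitemap line in robots.txt is picked up, you could parse the file the way a crawler might. The sketch below relies on RobotFileParser.site_maps(), which is available in Python 3.8 and later; declared_sitemaps and the sample robots.txt text are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


robots_txt = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""
print(declared_sitemaps(robots_txt))
# ['https://www.example.com/sitemap.xml']
```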

Monitor your sitemaps in your Google Search Console account to find the dates they were most recently processed, problems encountered with them by Google and the number of pages indexed vs. submitted.

Conclusion

When done right, XML sitemaps help search engines quickly find, crawl and index websites. Make sure you’ve properly formatted, compressed and submitted your XML sitemap to search engines to get the most out of their advantages:

  • You no longer need to rely on linking to get your pages crawled.
  • Search engines will see new or updated sites and pages more quickly.
  • Bots can crawl pages more intelligently thanks to the meta information available in sitemaps.
  • You can make sure that search engines are finding important information about images and videos, whose content is otherwise hard for crawlers to read.

Have you created and submitted an XML sitemap for your website? What benefits have you noticed? Did you encounter any challenges?