
HTML5’s Microdata, Search, and the Collaboration of the Search Giants

By Alexis Goldstein

One of the most overlooked new features of HTML5 is Microdata. Microdata lets us categorize and label our web content more specifically, in a machine-readable way. This matters because it may positively affect how your site appears in search results.

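When you search for something on Google or elsewhere, you get a list of links, each with a few sentences of description below it. We all use these descriptions, which Google calls “snippets”, to decide which link to click. When a snippet is enhanced with structured details such as an author, a rating or an event date, Google calls it a “Rich Snippet”.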

Wouldn’t you like to be able to influence what is displayed in search result snippets for your site? Wouldn’t it be nice to be able to tell the search engine and the bots that crawl your page, “Hey Google, I know I have twenty images on my page, but this one here is my bio picture.” Or perhaps, “Hi Bing, I know there are tons of links on my page, but this is the link relevant to my event.”

With Microdata, we can mark up our existing HTML with a few new attributes to label and categorize our content in ways search engines can both understand and use in their rich snippets.

Before we add these new attributes, we need to pick a vocabulary for the thing we’re trying to describe. We could write our own vocabularies or use someone else’s. Google maintained a site, http://data-vocabulary.org, with several vocabularies, but it was unclear whether the other search engines supported it. Luckily for us, in June 2011 Google, Microsoft and Yahoo! teamed up to agree on a very large set of vocabularies we can use on our sites.

The Trinity of Search, Together

It’s hard to imagine Google, Microsoft and Yahoo! collaborating on anything, much less something search-related, but that’s exactly what they’re doing at schema.org. Schema.org provides a set of vocabularies we can use with Microdata. By using the vocabularies defined on schema.org for things like Person, Book and Organization, we can be confident our Microdata will be understood by Google, Bing and Yahoo! search.

How it works

To use Microdata with a schema.org vocabulary, we add three new attributes to our existing HTML:

  1. itemscope
  2. itemtype
  3. itemprop

To follow along with this example, please see the demonstration page. View source for the full code.

Itemscope

Itemscope sets the scope of what we are describing with Microdata. You can think of it as marking a parent element that contains the other elements holding the information we want to supply to search engines. All elements nested inside an element with the itemscope attribute will adhere to the vocabulary you specify in #2, the itemtype.

If we want to describe a person on their online resume, we could wrap a section element around the resume and give it the itemscope attribute to begin:

<section itemscope> 
Audre Lorde was an author, academic, activist and poet, known for 
her many contributions to feminist literature and thought. Perhaps 
her most celebrated work is " 
      <a href="http://t.co/8wbANUC"> 
             Sister Outsider
      </a>," a collection of essays and speeches. She passed away on 
November 17th, 1992. 
</section>

Itemtype

The itemtype attribute is where we declare which vocabulary we’re using, and what thing we’re trying to describe. The most basic vocabulary on schema.org is for, well, a Thing. At the time of writing, the Thing vocabulary includes four properties we can set: description, image, name and url. All other things (Books, Restaurants, Places) descend from the Thing vocabulary.

To continue adding Microdata to our online resume, we add an itemtype declaring that the section describes a Person:

<section itemscope itemtype="http://schema.org/Person"> 
Audre Lorde was an author, academic, activist and poet, known for 
her many contributions to feminist literature and thought. Perhaps 
her most celebrated work is " 
      <a href="http://t.co/8wbANUC"> 
             Sister Outsider 
      </a>," a collection of essays and speeches. She passed away on 
November 17th, 1992. 
</section>

Itemprop

The itemprop attribute is how we label the majority of our content. We simply add itemprop to existing elements whose content we want to label. How is the content labeled? That depends on the value we assign to itemprop, which must be one of the properties defined by our vocabulary.

In some cases, you may want to label plain text with itemprops. Since there are no existing HTML elements to add attributes to, span or div elements are often added around the text.

To continue with our example:

<section itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Audre Lorde</span> 
was an 
   <span itemprop="jobTitle">author</span>, 
   <span itemprop="jobTitle">academic</span>, 
   <span itemprop="jobTitle">activist</span> and 
   <span itemprop="jobTitle">poet</span>, known for her 
many contributions to feminist literature and thought. 

Perhaps her most celebrated work is " 
      <a href="http://t.co/8wbANUC"> 
             Sister Outsider 
      </a>," a collection of essays and speeches. She passed away on 
November 17th, 1992. 
</section>

Now we have specified which item properties we care about, but how does the search engine know which content we mean? Does it just take the text content inside the span elements (“Audre Lorde”, “author”, etc.)? And if we added itemprop="url" to the a element, how would it know to take the URL rather than the link text?

The good news is that it generally grabs the value you’d expect. The complete list of rules is in the Microdata spec:

  • The a, area and link elements take the absolute URL from the href attribute
  • A meta element takes the value in the content attribute
  • The audio, embed, iframe, img, source, track and video elements take the absolute URL from the src attribute
  • The time element takes the value from the datetime attribute
  • The object element takes the absolute URL from the data attribute
  • All other elements (for example span, p or div) take the text content inside the element
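To make these rules concrete, here is a sketch of a Person item whose properties each take their value from a different place. The name, image, url and description properties are the ones Person inherits from Thing. The image file name and the description text are hypothetical placeholders, and the comments state the value a Microdata parser should extract under the rules above:

```html
<section itemscope itemtype="http://schema.org/Person">
  <!-- span: the value is the element's text content, "Audre Lorde" -->
  <span itemprop="name">Audre Lorde</span>

  <!-- img: the value is the absolute URL resolved from src.
       Hypothetical file name; this marks which image is the bio picture. -->
  <img itemprop="image" src="audre-lorde.jpg" alt="Portrait of Audre Lorde">

  <!-- a: the value is the absolute URL from href, not the link text -->
  <a itemprop="url" href="http://en.wikipedia.org/wiki/Audre_Lorde">Audre Lorde on Wikipedia</a>

  <!-- meta: the value is the content attribute; nothing is rendered.
       Hypothetical description text. -->
  <meta itemprop="description" content="Author, academic, activist and poet">
</section>
```

Because the img value is resolved to an absolute URL, a relative src such as audre-lorde.jpg is interpreted relative to the address of the page it appears on.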

Speaking of Datetime

We have a date of death in our small biographical page. Can we just put itemprop="deathDate", a property defined in the Person vocabulary, on a span around the date? Unfortunately, that isn’t enough: the value would be the text “November 17th, 1992”, which isn’t a machine-readable date. Instead, we wrap the date in the new HTML5 time element and put the machine-readable form in its datetime attribute:

She passed away on <time itemprop="deathDate" datetime="1992-11-17">November 
17th, 1992</time>.

This all sounds familiar…

You may find these concepts familiar, as they’ve been around for some time. Microformats are one approach that has been used in the past to make HTML content machine-readable. If you look at the HTML of a LinkedIn profile, you’ll find it marked up with the hCard microformat, and Facebook Events use microformats as well.

One issue with microformats is that we’re overloading the class attribute in a non-standard way. It becomes hard to distinguish “is this class being used in my CSS, or is this for microformats?” By using the new, dedicated attributes itemscope, itemtype and itemprop, Microdata avoids this confusion.
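To see the difference, compare the same name and job title marked up both ways. The vcard, fn and title class names come from the hCard microformat; the bio class is a hypothetical styling class. In the microformat version, nothing distinguishes the semantic classes from the one your stylesheet uses:

```html
<!-- hCard microformat: the semantics live in the class attribute,
     mixed in with classes used for styling (hypothetical "bio") -->
<div class="vcard bio">
  <span class="fn">Audre Lorde</span> was a
  <span class="title">poet</span>.
</div>

<!-- Microdata: the semantics live in dedicated attributes,
     leaving the class attribute free for CSS -->
<div class="bio" itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Audre Lorde</span> was a
  <span itemprop="jobTitle">poet</span>.
</div>
```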

Another limitation of microformats is that all the data you want to include must live inside a single parent element. That can be restrictive, especially if some relevant information sits elsewhere, say in the footer of your website.

Including items beyond the parent item’s scope with Itemref

Let’s assume that way down at the bottom of the page, we have a footer with a list of footnotes about the text on Audre Lorde above. Since these footnotes are links, it would be nice to include them as itemprop="url" in our markup.

<section itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Audre Lorde</span> 
was an 
    <span itemprop="jobTitle">author</span>, 
    <span itemprop="jobTitle">academic</span>, 
    <span itemprop="jobTitle">activist</span> and 
    <span itemprop="jobTitle">poet</span>, known for her 
many contributions to feminist literature and 
thought[<a href="#citation1">1</a>]. 
    <!-- snip --> 
</section> 

    <!-- more HTML --> 
<footer> 
    <a id="citation1">[1]</a> 
    <a id="cite1" itemprop="url" 
href="http://en.wikipedia.org/wiki/Audre_Lorde"> 
Audre Lorde on Wikipedia 
    </a> 
</footer>

How can we include the footer link in our Person item? By adding an attribute called itemref to the section’s opening tag:

<section itemscope itemtype="http://schema.org/Person" itemref="cite1">

That tells search engines to also treat the element whose id is cite1, our link in the footer, as part of the Person item, so its itemprop is added to the Person.

What if you want to refer to more than one id elsewhere on the page? Simply specify a space-separated list of ids in itemref, much as you can list multiple classes in a single class attribute.
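For example, with a second footnote link in the footer, the section would reference both ids. The second id (cite2) and its example.com URL are hypothetical, added only to show the space-separated list:

```html
<!-- itemref takes a space-separated list of ids -->
<section itemscope itemtype="http://schema.org/Person" itemref="cite1 cite2">
  <span itemprop="name">Audre Lorde</span>
  <!-- snip -->
</section>

<footer>
  <!-- Pulled into the Person item through itemref -->
  <a id="cite1" itemprop="url" href="http://en.wikipedia.org/wiki/Audre_Lorde">Audre Lorde on Wikipedia</a>
  <!-- Hypothetical second reference, for illustration only -->
  <a id="cite2" itemprop="url" href="http://example.com/audre-lorde">Another source</a>
</footer>
```

A property may appear more than once in an item, so the Person here simply ends up with two url values.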

Summary

Microdata allows us to mark up our existing elements and text with machine-readable labels, so the major search engines can understand our pages more clearly. Google, Yahoo! and Microsoft have come together at http://schema.org to give us a rich vocabulary of things we can describe. And since Microdata is a proper part of the HTML5 spec, with its own dedicated attributes, a little extra work may give you a real edge in your search results.

Further Resources

Getting started guide
Google Rich Snippets
Rich Snippets Testing tool

  • http://twitter.com/patricksamphire Patrick Samphire

    It’s a reasonably attractive concept, but the code bloat in just this simple example is extreme, and I’m not sure it’s really worth it.

    • http://profiles.google.com/axel.miller98 Alex Miller

      I disagree, it will make your code less readable, but it seems that it will help in SEO. I love quick links that are given to me when I search a certain topic, helps me get to what I need quickly. I think this is an extremely useful HTML 5 tool.

  • http://www.itmitica.com/en IT Mitică

    It seems to me like a sort of a double tagging.

    Microdata or an archiving effort? The ones found in old libraries, where no automated system was in place, just a words card system.

  • http://livetotry.com alexisgo

    The screenshot has been updated.

  • http://twitter.com/kliehm Martin Kliehm

    Just a shame this is invalid attribute soup. Why don’t you use the HTML5 “data-” prefix?

  • Paul

    This is much less verbose than RDFa and microformats. Is there any data that shows performs best? Also, is this method as robust as RDFa/microformats?
