HTML5’s Microdata, Search, and the Collaboration of the Search GiantsBy Alexis Goldstein
One of the most overlooked new features of HTML5 is Microdata. Microdata allows us to more specifically categorize and label our web content in a machine-readable way. Why this is important is because it may positively affect your search results.
When you search for something on Google or elsewhere, you get a list of links with a few sentences of description below them. We all use these descriptions—which Google calls “Rich Snippets”—to determine which link to click on.
Wouldn’t you like to be able to influence what is displayed in search results snippets for your site? Wouldn’t it be nice to be able to clarify for the search engine and its bots that crawl your page, “Hey Google, I know I have twenty images on my page, but this image, that one is my bio picture.” Or perhaps, “Hi Bing, I know there are tons of links on my page but this link is the link relevant to my event.”
With Microdata, we can mark-up our existing HTML with a few new attributes in order to label and categorize our content in ways search engines can both understand and make use of in their rich snippets.
Before we add these new attributes, we need to pick a vocabulary of the thing we’re trying to describe. We could write our own vocabularies, or we could use someone else’s. We used to have a site, http://data-vocabulary.org, maintained by Google with several vocabularies. But it was unclear if the other search engines supported it. Lucky for us, in June 2011, Google, Microsoft and Yahoo! have teamed up to agree on a very large set of vocabularies we can use on our sites.
The Trinity of Search, Together
It’s hard to imagine Google, Microsoft and Yahoo! collaborating on anything, much less something search-related, but that’s exactly what they’re doing on the site schema.org. Schema.org provides a set of vocabularies we can make use of with Microdata. By using the vocabularies for things like Person, Book and Organization that are on defined schema.org, we can be sure our Microdata will be understood by Google, Bing and Yahoo! searches
How it works
To use Microdata, we must add at least three new attributes to our existing HTML:
To follow along with this example, please see the demonstration page. View source for the full code.
Itemscope sets the scope of what we are describing with microdata. You can think of it as defining a parent element, inside which will contain other elements with information we are trying to supply to search engines. All elements nested an element with the
itemscope will adhere to the vocabulary you specify in #2, the
If we want to describe a person on their online resume, we could wrap a section element around the resume and give it the
itemscope attribute to begin:
<section itemscope> Audre Lorde was an author, academic, activist and poet, known for her many contributions to feminist literature and thought. Perhaps her most celebrated work is " <a href="http://t.co/8wbANUC"> Sister Outsider </a>," a collection of essays and speeches. She passed away on November 17th, 1992. </section>
itemtype attribute is where we declare what vocabulary we’re using, and what thing we’re trying to describe. The most basic vocabulary on schema.org is for, well, a Thing. The Thing vocabulary includes four properties we can set: a description, an image, a name and a url. All other things (Books, Restaurants, Places) descend from the Thing vocabulary.
To continue adding Microdata to our online resume, we add the itemtype to our Person:
<section itemscope itemtype=”http://schema.org/Person”> Audre Lorde was an author, academic, activist and poet, known for her many contributions to feminist literature and thought. Perhaps her most celebrated work is " <a href="http://t.co/8wbANUC"> Sister Outsider </a>," a collection of essays and speeches. She passed away on November 17th, 1992. </section>
itemprop attribute is how we add label the majority of our content. We simply add the
itemprop attribute to existing elements with content we want to label. How are we labeling the content? That depends on what value we assign to the
itemprop attribute. The value must be one of the properties of our vocabulary.
In some cases, you may want to label pure text content with
itemprops. In these cases, there’s no existing HTML elements to add attributes to, so
divs are often added.
To continue with our example:
<section itemscope itemtype=”http://schema.org/Person”> <span itemprop=”name”>Audre Lorde</span> was an <span itemprop=”jobTitle”>author,</span> <span itemprop=”jobTitle”>academic,</span> <span itemprop=”jobTitle”>activist</span> and <span itemprop=”jobTitle”>poet</span>, known for her many contributions to feminist literature and thought. Perhaps her most celebrated work is " <a href="http://t.co/8wbANUC"> Sister Outsider </a>," a collection of essays and speeches. She passed away on November 17th, 1992. </section>
Now we have specified which item property we care about, but, how does the search engine know what content I mean? Does it just take the text content inside the span elements (“Audre Lorde”, “author”, etc) How would it know where to grab the URL from for the
itemprop=”url” on the
The good news is that it essentially always grabs the value you’d hope it would grab. The complete list is in the Microdata spec:
linkelements take the value in the
metaelement takes the value in the
videoelements take the absolute URL from the
timeelement takes the value from the
objectelement takes the absolute URL from the
- All other elements take the text content inside the element. (example:
We have a date of death in our small biographical page. Can we just add
itemprop=”deathDate”, which is a property defined in the Person vocabulary? Unfortunately, we can’t. We need to first wrap it in the new HTML5
time element, to ensure we have a computer-readable date.
She passed away on <time itemprop="deathDate" datetime="1992-11-17">November 17th, 1992</time>.
This all sounds familiar…
You may find these concepts familiar, as they’ve been around for some time. Microformats is one way that has been used in the past to make HTML content machine-readable. If you take a look at the HTML on a LinkedIn profile, you’ll find that it’s marked up with the hCard microformat. The same is true of Facebook Events.
One issue with microformats is that we’re overloading the
class attribute in a non-standard way. It becomes hard to distinguish “is this class being used in my CSS, or is this for microformats?” By using the new, dedicated attributes
itemprop, Microdata avoids this confusion.
Another limitation with microformats is that any data you want to include must live inside a single parent element—which can be limiting, especially if you have relevant information, say, in the footer of your website.
Including items beyond the parent item’s scope with
Let’s assume way down at the bottom of the page, we had a footer with a list of footnotes about the text about Audre Lorde above. If there footnotes were links, it would be nice to include them as
itemprop=”url” in our mark-up.
<section itemscope itemtype=”http://schema.org/Person”> <span itemprop=”name”>Audre Lorde</span> was an <span itemprop=”jobTitle”>author,</span> <span itemprop=”jobTitle”>academic,</span> <span itemprop=”jobTitle”>activist</span> and <span itemprop=”jobTitle”>poet</span>, known for her many contributions to feminist literature and thought[<a href=”#citation1”>1</a>]. <!-- snip --> </section> <!-- more HTML --> <footer> <a name=”citation1”></a> <a id=”cite1” itemprop=”url” href=”http://en.wikipedia.org/wiki/Audre_Lorde”> Audre Lorde on Wikipedia </a> </footer>
How can we include this link in our footer in the page? By using an attribute called
itemref in the section opening tag:
<section itemscope itemtype=”http://schema.org/Person” itemref=”cite1”>
That will tell the search engines to also grab the relevant content out of an element whose
id is cite1, as our link element in the footer has.
What if you want to refer to more than one
id later on in the page? The way to do this is to simply specify a space-separated list of
ids in the
itemref, much as you can specify multiple
class attributes by using a space-separated list.
Microdata allows us to mark up our existing elements and text with machine-readable labels, allowing our pages to be more clearly seen and understood by the major search engines. Google, Yahoo! And Microsoft have come together to give us a rich vocabulary of things we can describe with http://schema.org. And as a proper part of the HTML5 spec with its own dedicated attributes, a little extra work may give you a real edge in your search results.