I recently signed up for Jeremy Keith’s podcasting web app, Huffduffer. The sign-up form has received a lot of buzz because of its “Mad Lib”-style design, but what interested me was that after I created my account, I saw the following in the lower right-hand corner of my profile page:
Huffduffer took the one web site URL I provided it with (louissimoneau.com) and found just about every profile of mine, from every social network or web application I’ve ever used. It’s not apparent in the figure, but those links point to my profiles on those sites. How is this possible?
Digging around through a number of Jeremy’s blog posts, I found that he’s using the Google Social Graph API to gather this data. My web site points to my Twitter, Last.fm, and FriendFeed profiles, and those in turn (mainly FriendFeed) link out to all my other profiles. Google is able to follow these links and know that they refer to other instances of me through clever use of the XFN microformat (more on this a bit later).
Jeremy’s implementation, though definitely cool, is a little blunt. Some people may object to having every single one of their web identities listed on their profile, and as Google’s spider is relying on cached data, some of the information may be out of date. For example, I deleted interblag.tumblr.com years ago and it has since been picked up by another individual.
I thought it might be a cool idea to try and develop a slightly more sophisticated version of this functionality, and at the same time learn a bit about how the Social Graph API works its magic. I’ll be using Ruby on Rails for this implementation, but as the code is very simple you should be able to adapt it easily to whichever platform you’re comfortable with.
Before we go any further, it’s a good idea to step back and try to figure out how this functionality is possible in the first place. How does Google know that those other sites are also “me”? The answer is microformats. If you’re a real HTML nerd or have a penchant for semantic code, you’ll already know about microformats, but for the rest of us here’s a quick catch-up course.
Microformats are just that: mini formats, which happen to sit inside the larger format of HTML. HTML lacks a way, for instance, of indicating that a link is pointing to a web site of an individual you’ve met in person—that is, there’s no “met” attribute that would let you do:
<a href="http://somesite.com" met="true">Some site</a>
This code, of course, is invalid. Fortunately, HTML has a few attributes that can be co-opted for this purpose. The class attribute is the most obvious, but for links and anchors we also have rel. For example:
<a href="http://somesite.com" rel="met">Some site</a>
This snippet of code, unlike the previous one, is perfectly valid. This in and of itself may be of little value: at best you could use CSS to give a different style to links pointing to people you’ve met. But if web developers across the world agree on a way of representing this information, suddenly the range of possibilities explodes: a web spider can crawl links between people who’ve met each other and construct a map of these relationships.
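To make the spider idea a little more concrete, here’s a minimal Ruby sketch of the first step such a crawler would need: pulling each link’s href and rel values out of a page. This is only an illustration, not how Google does it. The extract_links helper, the sample markup, and its URLs are all made up for this example, and I’m using regular expressions purely to avoid a dependency; real-world HTML should go through a proper parser.

```ruby
# Minimal sketch: list each link's href together with its rel tokens.
# Regexes keep the example dependency-free; they only handle
# double-quoted attributes and are not a substitute for an HTML parser.

ANCHOR_TAG = /<a\b([^>]*)>/i
ATTRIBUTE  = /([a-z-]+)\s*=\s*"([^"]*)"/i

# Returns an array of { href:, rel: } hashes, one per anchor with an href.
# rel is split on whitespace because it may hold several values.
def extract_links(html)
  html.scan(ANCHOR_TAG).filter_map do |(attr_text)|
    attrs = attr_text.scan(ATTRIBUTE).to_h { |name, value| [name.downcase, value] }
    next unless attrs["href"]

    { href: attrs["href"], rel: attrs.fetch("rel", "").split }
  end
end

# Hypothetical sample markup; the URLs are placeholders.
SAMPLE_HTML = <<~HTML
  <p>
    <a href="http://somesite.com" rel="met">Someone I have met</a>
    <a href="http://example.com/colleague" rel="met co-worker">A colleague</a>
    <a href="http://example.org/">An unrelated link</a>
  </p>
HTML

# Keep only the links to people the page author says they have met.
met_links = extract_links(SAMPLE_HTML).select { |link| link[:rel].include?("met") }
met_links.each { |link| puts link[:href] }
```

Run as is, this prints the first two URLs and skips the third, which has no rel attribute. Splitting rel on whitespace matters because a single link can carry several relationship values at once.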
There are currently microformat standards for everything from addresses (hCard) to friendships (XFN) to tags (rel-tag), as well as a number of others currently in the draft stage.
Google is using two standards to crawl networks of links: XFN (XHTML Friends Network), which is a microformat, and FOAF (Friend of a Friend), which is an RDF vocabulary rather than a microformat. These links point to people represented by URIs, and it’s the relationships between these people that Google is attempting to figure out. FOAF is a slightly more complicated format that involves creating a separate file, usually written in RDF/XML, detailing all your friendships; that’s unnecessary for our implementation so we’ll just focus on XFN.
The most basic use of XFN is adding rel="me" to a link to denote that the site being linked to belongs to the same person as the site we’re on.
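Chaining these rel="me" links together is what turns one URL into a whole list of profiles. Here’s a rough Ruby sketch of that idea, assuming a tiny in-memory “web” in place of real HTTP requests. The PAGES hash, the example domains, and the me_links and identity_urls helpers are all hypothetical, and this isn’t a description of Google’s actual crawler. Note that the sketch trusts every rel="me" claim it finds; a real crawler would probably want to check that the links are reciprocated before believing them.

```ruby
require "set"

# Hypothetical stand-in for the web: URL => HTML of that page.
PAGES = {
  "http://mysite.example/" =>
    '<a href="http://twitter.example/me" rel="me">Twitter</a> ' \
    '<a href="http://friendfeed.example/me" rel="me">FriendFeed</a> ' \
    '<a href="http://friend.example/" rel="friend met">A friend</a>',
  "http://twitter.example/me" =>
    '<a href="http://mysite.example/" rel="me">My site</a>',
  "http://friendfeed.example/me" =>
    '<a href="http://lastfm.example/me" rel="me">Last.fm</a> ' \
    '<a href="http://mysite.example/" rel="me">My site</a>',
  "http://friend.example/" =>
    '<a href="http://friend-blog.example/" rel="me">Friend\'s blog</a>'
}.freeze

# Returns the hrefs of all links on a page whose rel includes "me".
# Regex-based to stay dependency-free; handles double-quoted attributes only.
def me_links(html)
  html.scan(/<a\b[^>]*>/i).filter_map do |tag|
    href = tag[/\bhref\s*=\s*"([^"]*)"/i, 1]
    rel  = tag[/\brel\s*=\s*"([^"]*)"/i, 1].to_s.split
    href if href && rel.include?("me")
  end
end

# Breadth-first walk over rel="me" links, starting from one URL.
# Pages missing from the hash are treated as having no links.
def identity_urls(start_url, pages)
  seen  = Set[start_url]
  queue = [start_url]
  until queue.empty?
    url = queue.shift
    me_links(pages.fetch(url, "")).each do |target|
      queue << target if seen.add?(target) # add? returns nil for duplicates
    end
  end
  seen.to_a
end

puts identity_urls("http://mysite.example/", PAGES)
```

Starting from the one site, the walk reaches the Twitter and FriendFeed pages directly and the Last.fm page through FriendFeed. The friend’s site and blog are never visited, because the link to them is marked rel="friend met" rather than rel="me".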
There is, of course, a problem here: this microformat data is only present if a person has bothered to put it there. And only a small percentage of web site owners even know what microformats are, let alone use them on their sites. Well, the good news is that several large sites and applications add XFN to links automatically. Last.fm and FriendFeed, for example, both use this standard when you add links to your other sites. So you may have a social graph without even being aware of it.
WordPress blogs also make it easy to mark up links in this way. Scroll down the page when adding a new link to your WordPress blog and you’ll see the following:
Hence, even without widespread knowledge of microformats, it turns out that there are a significant number of marked-up XFN links out in the wild for the Google Social Graph API to spider.