Why RDFa is the only Web scaleable metadata format for next-generation search engines


by David Peterson

Yahoo! is soon to launch its next-generation Web search system, dubbed SearchMonkey. This gives content developers a powerful new tool in their arsenal, one that makes possible something that was nearly impossible before. Here is a quick preview from Yahoo!

[Images: search 1.0 and next-generation search]

No longer dependent on Google

You no longer have to depend on Google’s good graces (and their smart people) to make sense of the content you have worked hard to create. You can explicitly specify what you meant with no ambiguity.

I was going to write this post in a few days, but a post on OpenBible.info has forced my hand — thanks for that :)

Yahoo! last week announced that it’s going to start indexing semantic data, including support for certain microformats.

Bibleref isn’t one of those microformats. Should Bibleref proponents lobby Yahoo! to index Bibleref, or should Bibleref change its syntax to be compatible with RDFa or another semantic web standard?

….

So what should Bibleref’s proponents do? It’s possible we could convince Yahoo! to index Bibleref, giving it the traction it needs to take off. However, I wouldn’t necessarily expect Yahoo! to do a good job understanding the data, in part because of the looseness of the standard (which I see as a good thing). And if Yahoo! doesn’t understand it well, then search results based on Bibleref won’t be very high quality. But a lot depends on how Yahoo! exposes the data. (And they may not even want to index Bibleref.)

RDFa

Another possibility is to change Bibleref to be compatible with RDFa, an emerging standard that Yahoo! does understand….

They did a better job of explaining it than I would have! The good folks working on Bibleref are now in a situation that I believe many of you will be in soon.

The $64,000 question

How do we publish our intelligent information in a format that will be understood by Yahoo! SearchMonkey and other next-gen search engines? How do you get your valuable metadata out there in the new frontier of the Linked-Data Web/Semantic Web?

Currently there are two main options: RDFa and microformats. But of course it is not as simple as that.

The problem with microformats

The main problem with microformats is that each time a new one is created, the search indexer needs a custom extractor to make sense of it. That is why Yahoo! microsearch indexes only 3 of the most popular formats, and why SearchMonkey will index only 5 when it launches. That is 5 out of the 20 listed on the main wiki page plus the 74 on the Exploratory page.

This means that of the 94 listed microformats, SearchMonkey will see only 5.

Others have noted further problems. It is difficult to mix and match different microformats, which imposes a big limitation on layout flexibility. There is no easy way to validate your work. The use of microformats also raises accessibility concerns.

Therein lies the issue with microformats. Without an underlying abstract data model, validation becomes a bit like standing back looking at a used car, kicking the tyres, concluding "yeah, looks alright", and then handing over the cash - source.

What is a search engine to do?

So, as a search engine company, which are you going to prefer: writing ONE RDFa parser that takes in ALL metadata created with RDFa, or writing a new parser for EVERY microformat that is available now, plus every new one in the future?
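
To make the "one parser" idea concrete, here is a minimal sketch in Python of a generic extractor. It is not a conformant RDFa processor: it only understands the about, property, content and xmlns:* attributes, treats the most recent about value as the subject, and assumes that an element carrying a property contains plain text. The PropertyExtractor class, the extract helper, the example.org URLs and the "bible" vocabulary are all made up for illustration. What it shows is that the extractor knows nothing about either vocabulary, yet it reads both.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collects (subject, property, value) triples from RDFa-style attributes.

    Deliberately simplified; see the assumptions listed above the code.
    """

    def __init__(self, page_url):
        super().__init__()
        self.prefixes = {}        # prefix -> namespace URL, from xmlns:* attributes
        self.subject = page_url   # the page itself is the default subject
        self.triples = []
        self._open_property = None
        self._text = []

    def _expand(self, curie):
        # Turn "dc:title" into a full URL when the prefix has been declared.
        prefix, sep, local = curie.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + local
        return curie

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            prop = self._expand(attrs["property"])
            if "content" in attrs:
                # An explicit content attribute wins over the element text.
                self.triples.append((self.subject, prop, attrs["content"]))
            else:
                self._open_property = prop
                self._text = []

    def handle_data(self, data):
        if self._open_property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open_property is not None:
            value = "".join(self._text).strip()
            self.triples.append((self.subject, self._open_property, value))
            self._open_property = None


def extract(html, page_url):
    parser = PropertyExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.triples


# Two vocabularies on one page: Dublin Core plus a hypothetical "bible" one.
SAMPLE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:bible="http://example.org/bibleref#"
     about="http://example.org/posts/1">
  <h1 property="dc:title">Notes on John 3:16</h1>
  <span property="dc:creator">A. Blogger</span>
  <span property="bible:passage" content="John 3:16">this verse</span>
</div>
"""

triples = extract(SAMPLE, "http://example.org/posts/1")
for triple in triples:
    print(triple)
```

Adding a third vocabulary to the page would need a new xmlns declaration in the markup and no change to the extractor, which is the contrast with writing one extractor per microformat.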

Web scaleable metadata

RDFa is soon to be a W3C standard (or Recommendation, as the W3C calls it). It has taken a while for all the pieces to come together, but anything this important does take time. And with that time comes a very well thought out solution:

  1. Scaleable - use any vocabularies you want. Create your own and go wild!
  2. Mixable - mix and match any vocabulary you want in any layout you want.
  3. W3C Standard - this is the reason that one parser will read any vocabulary, and validation is trivial.
  4. Globally Identifiable - give anything on your page a URL and it becomes a "living" data point on the Web, easily addressable by anyone.
  5. Queryable - your page becomes a stand-alone linked data client that can be queried like a database. This is really cool.
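
As a rough illustration of the "queryable like a database" point, the sketch below treats the metadata from a page as a list of (subject, property, value) triples and answers questions with simple pattern matching. The triples are hand-written stand-ins for what an RDFa parser might produce, the example.org URLs and the match and author_names helpers are hypothetical, and the two vocabularies mixed here are Dublin Core and FOAF. A real deployment would more likely use an RDF library and a query language such as SPARQL; this only shows the idea.

```python
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hand-written example data; every subject is a URL, so anyone can refer to it.
TRIPLES = [
    ("http://example.org/posts/1", DC + "title", "Why RDFa scales"),
    ("http://example.org/posts/1", DC + "creator", "http://example.org/people/author"),
    ("http://example.org/people/author", FOAF + "name", "Example Author"),
    ("http://example.org/posts/2", DC + "title", "Microformats in practice"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def author_names(triples, post):
    """Follow creator links from a post, then look up each creator's name."""
    names = []
    for (_, _, person) in match(triples, subject=post, predicate=DC + "creator"):
        names += [o for (_, _, o) in match(triples, subject=person, predicate=FOAF + "name")]
    return names


# "Give me every title", regardless of which page element it came from.
titles = [o for (_, _, o) in match(TRIPLES, predicate=DC + "title")]
print(titles)

# A join across two vocabularies: Dublin Core for the post, FOAF for the person.
print(author_names(TRIPLES, "http://example.org/posts/1"))
```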

Find out more about RDFa

The RDFa group has just launched a wiki (with a growing body of info) and a mailing list. They can also be contacted on IRC/#swig. Keep checking back as I plan on adding new posts on how to use RDFa in your own web pages.

This post has 15 responses so far

  1. I can think of one other problem.. no one’s worried about it right now, but what happens when metaspam comes?

     
  2. Philip, I wonder about that too. Search engines (in the 1800s I believe) used to use META tags to determine what a page was about. How are these new search techniques going to rank pages that have these semantic relationships?

     
  3. […] Peterson gives a good analysis of what Yahoo’s announcement means for sites wanting to get semantic information into search […]

     
  4. > The main problem with microformats is that each time a new one is created the search indexer needs to develop a custom extractor to make sense of the microformat.

    Boo hoo! When it’s as easy as it is with GRDDL, I’m really, really not fussed.

    Imagine: I make a new microformat, and I take the time to make a profileTransformation - this problem is lessened dramatically.

     
  5. […] Blogs has an article which asks and answers the question » Why RDFa is the only Web scaleable metadata format for next-generation search engines. A quote from the article: How do we publish our intelligent information in a format that will be […]

     
  6. Good topic to be discussing these days. Please do continue writing about how the average designer/developer at SitePoint can start implementing RDFa in their work.

     
  7. I whole-heartedly agree with Dan Grossman. I’d love to hear more about it as well! Thanks for all the information you’ve leaked in thus far!

     
  8. […] Peterson, on his blog at Sitepoint, has had quite a bit to say about the issue and one of the points he makes is that, if you haven’t already, there […]

     
  9. Daniel O’Connor - Boo hoo! When it’s as easy as it is with GRDDL, I’m really, really not fussed.

    Imagine: I make a new microformat, and I take the time to make a profileTransformation - this problem is lessened dramatically.

    It is not *just me* saying this. Yahoo will only support 5 microformats, Firefox 3 [1] will also only support 5 microformats.

    The fundamental problem with microformats is that there is no standard model so no standard way to extract them from pages. There is no way around it. And when big players with a lot of resources put their support only behind a small subset it really makes it clear that microformats don’t work on the scale of the global linked Web.

    [1] http://developer.mozilla.org/en/docs/Using_microformats

     
  10. Philip - I can think of one other problem.. no one’s worried about it right now, but what happens when metaspam comes?

    There are a lot of people thinking about this; here is someone inside Yahoo itself [1].

    [1] http://dubinko.info/blog/2008/03/13/the-lowercase-semantic-web-goes-mainstream/

     
  11. Excellent points on RDFa vs. microformats. I find the arguments on scalability and mixability particularly compelling, as it clearly provides much more flexibility to site owners to tailor their semantic web presence.

    Thanks for the comment on my blog — folks who read my post on marketing implications [1] of this move by Yahoo! will find it very interesting.

    [1] http://www.chiefmartec.com/2008/03/seo-semantic-we.html

     
  12. […] really have time at the moment to get into a full-on Microformats chit-chat, I did however find this article on SitePoint authored by David Peterson that goes into some depth as to why it might be better to […]

     
  13. […] Why RDFa is the only Web scaleable metadata format for next-generation search engines […]

     
  14. […] Why RDFa is the only Web scaleable metadata format for next-generation search engines — David Peterson, sitepoint Blog. […]

     
  15. […] rich search you also get enhanced search results. I have blogged about this previously so take a look. It is really cool stuff and I will be discussing it in much more detail over the course of the […]

     
