Why RDFa is the only Web scaleable metadata format for next-generation search engines
By David Peterson
Yahoo! will soon launch its next-generation Web search system, dubbed SearchMonkey. This gives content developers a powerful new tool in their arsenal, one that makes possible something that was nearly impossible before. Here is a quick preview from Yahoo!
No longer dependent on Google
You no longer have to depend on Google’s good graces (and their smart people) to make sense of the content you have worked hard to create. You can explicitly specify what you meant with no ambiguity.
Yahoo! last week announced that it’s going to start indexing semantic data, including support for certain microformats.
Bibleref isn’t one of those microformats. Should Bibleref proponents lobby Yahoo! to index Bibleref, or should Bibleref change its syntax to be compatible with RDFa or another semantic web standard?
So what should Bibleref’s proponents do? It’s possible we could convince Yahoo! to index Bibleref, giving it the traction it needs to take off. However, I wouldn’t necessarily expect Yahoo! to do a good job understanding the data, in part because of the looseness of the standard (which I see as a good thing). And if Yahoo! doesn’t understand it well, then search results based on Bibleref won’t be very high quality. But a lot depends on how Yahoo! exposes the data. (And they may not even want to index Bibleref.)
Another possibility is to change Bibleref to be compatible with RDFa, an emerging standard that Yahoo! does understand….
They did a better job of explaining it than I would have! The good folks working on Bibleref are now in the situation that I believe many of you will soon be in.
The $64,000 question
How do we publish our intelligent information in a format that will be understood by Yahoo! SearchMonkey and other next-gen search engines? How do you get your valuable metadata out there in the new frontier of the Linked-Data Web/Semantic Web?
The problem with microformats
The main problem with microformats is that each time a new one is created, the search indexer needs a custom extractor to make sense of it. That is why Yahoo! Microsearch indexes only 3 of the most popular formats and why SearchMonkey, when it launches, will index only 5. That is 5 out of the 20 listed on the main wiki page plus the 74 on the Exploratory page.
This means that of the 94 listed microformats, SearchMonkey will see only 5.
Others have noted further problems. It is difficult to mix and match different microformats, which puts a big limitation on layout flexibility. There is no easy way to validate your work. The use of microformats also raises accessibility concerns.
Therein lies the issue with microformats. Without an underlying abstract data model, validation becomes a bit like standing back looking at a used car, kicking the tyres, concluding "yeah, looks alright", and then handing over the cash – source.
What is a search engine to do?
So, as a search engine company, which would you prefer? Write ONE RDFa parser and take in ALL metadata that is created with RDFa? Or write a new parser for EVERY microformat that is available now, plus every new one in the future?
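To make the "one parser" idea concrete, here is a minimal sketch in Python of a generic extractor. It is not a conforming RDFa processor: it assumes only a small subset of the syntax (`xmlns:` prefixes, `about`, `property` with text or `content`, and `rel` with `href`). The names `TinyRDFaExtractor` and `extract`, the example.org URLs, and the `ex:` vocabulary are all made up for illustration. The point is that nothing in the code knows about any particular vocabulary.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class TinyRDFaExtractor(HTMLParser):
    """Collects (subject, predicate, object) triples from a subset of RDFa."""

    def __init__(self, base_url):
        super().__init__()
        self.prefixes = {}
        self.triples = []
        # Stack of [tag, subject, pending predicate, collected text].
        self.stack = [["#root", base_url, None, []]]

    def expand(self, curie):
        """Expand 'dc:title' to a full URI using the declared prefixes."""
        if ":" not in curie:
            return curie
        prefix, _, local = curie.partition(":")
        return self.prefixes.get(prefix, prefix + ":") + local

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        # `about` sets a new subject; otherwise inherit the parent's subject.
        subject = attrs.get("about", self.stack[-1][1])
        if "rel" in attrs and "href" in attrs:
            self.triples.append((subject, self.expand(attrs["rel"]), attrs["href"]))
        pending = None
        if "property" in attrs:
            predicate = self.expand(attrs["property"])
            if "content" in attrs:
                self.triples.append((subject, predicate, attrs["content"]))
            else:
                pending = predicate  # the object is the element's text
        if tag not in VOID_TAGS:
            self.stack.append([tag, subject, pending, []])

    def handle_data(self, data):
        self.stack[-1][3].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or len(self.stack) == 1:
            return
        _, subject, pending, text = self.stack.pop()
        if pending:
            self.triples.append((subject, pending, "".join(text).strip()))


def extract(html, base_url):
    parser = TinyRDFaExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.triples


# Hypothetical page mixing Dublin Core, FOAF and a made-up `ex:` vocabulary.
PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:ex="http://example.org/vocab#"
     about="http://example.org/posts/1">
  <h2 property="dc:title">Why RDFa</h2>
  <a rel="foaf:maker" href="http://example.org/people/david">David</a>
  <meta property="ex:verse" content="John 3:16">
</div>
"""

if __name__ == "__main__":
    for triple in extract(PAGE, "http://example.org/"):
        print(triple)
```

Adding a fourth or fifth vocabulary to the page would not require touching the extractor, which is the contrast with writing one custom extractor per microformat.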
Web scaleable metadata
RDFa is soon to be a W3C standard (or Recommendation, as they call them). It has taken a while for all the pieces to come together, but anything this important takes time. And with that time comes a very well-thought-out solution:
- Scaleable – any vocabularies you want. Create your own and go wild!
- Mixable – mix and match any vocabulary you want in any layout you want.
- W3C Standard – the reason that one parser can read any vocabulary; validation is trivial.
- Globally Identifiable – give any thing on your page a URL and it becomes a "living" data point on the Web; easily addressable by anyone.
- Your page becomes a stand-alone linked data client; queryable like a database. This is really cool.
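As a rough illustration of "queryable like a database", here is a small Python sketch. It assumes the page's RDFa has already been turned into (subject, predicate, object) triples by some parser. The triples below are hand-written, hypothetical data, and `match` is a made-up helper, not part of any RDFa library.

```python
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical triples, as an RDFa parser might produce them from one page.
TRIPLES = [
    ("http://example.org/posts/1", DC + "title", "Why RDFa"),
    ("http://example.org/posts/1", DC + "creator", "David Peterson"),
    ("http://example.org/posts/2", DC + "title", "Using RDFa"),
    ("http://example.org/posts/2", DC + "creator", "David Peterson"),
    ("http://example.org/people/david", FOAF + "name", "David Peterson"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# "Select every title on the page."
titles = [o for (_, _, o) in match(TRIPLES, predicate=DC + "title")]

# "Select everything created by David Peterson."
by_david = {
    s for (s, _, _) in match(TRIPLES, predicate=DC + "creator", obj="David Peterson")
}

if __name__ == "__main__":
    print(titles)
    print(sorted(by_david))
```

Because titles and names come from different vocabularies but share one data model, the same `match` call works across both, which is what the mixing and querying points above amount to in practice.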
Find out more about RDFa
The RDFa group has just launched a wiki (with a growing body of info) and a mailing list. They can also be contacted on IRC/#swig. Keep checking back as I plan on adding new posts on how to use RDFa in your own web pages.