Yahoo! will soon launch its next-generation Web search system, dubbed SearchMonkey. This gives content developers a powerful new tool in their arsenal, something that was nearly impossible before. Here is a quick preview from Yahoo!
No longer dependent on Google
You no longer have to depend on Google’s good graces (and their smart people) to make sense of the content you have worked hard to create. You can explicitly specify what you meant with no ambiguity.
I was going to write this post in a few days, but a post on OpenBible.info has forced my hand — thanks for that :)
Yahoo! last week announced that it’s going to start indexing semantic data, including support for certain microformats.
Bibleref isn’t one of those microformats. Should Bibleref proponents lobby Yahoo! to index Bibleref, or should Bibleref change its syntax to be compatible with RDFa or another semantic web standard?
….
So what should Bibleref’s proponents do? It’s possible we could convince Yahoo! to index Bibleref, giving it the traction it needs to take off. However, I wouldn’t necessarily expect Yahoo! to do a good job understanding the data, in part because of the looseness of the standard (which I see as a good thing). And if Yahoo! doesn’t understand it well, then search results based on Bibleref won’t be very high quality. But a lot depends on how Yahoo! exposes the data. (And they may not even want to index Bibleref.)
RDFa
Another possibility is to change Bibleref to be compatible with RDFa, an emerging standard that Yahoo! does understand….
They did a better job of explaining it than I would have! The good folks working on Bibleref are now in the situation where I believe many, many of you will be in soon.
The $64,000 question
How do we publish our intelligent information in a format that will be understood by Yahoo! SearchMonkey and other next-gen search engines? How do you get your valuable metadata out there in the new frontier of the Linked-Data Web/Semantic Web?
Currently there are two main options: RDFa and microformats. But of course it is not as simple as that.
The problem with microformats
The main problem with microformats is that each time a new one is created, the search indexer needs to develop a custom extractor to make sense of it. That is why Yahoo microsearch is only indexing 3 of the most popular formats and why, when SearchMonkey launches, it will only index 5. That is 5 out of the 20 listed on the main wiki page plus the 74 on the Exploratory page.
In other words, of the 94 listed microformats, SearchMonkey will only see 5. If you publish in any of the other 89, your metadata is invisible to it.
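To make the "custom extractor per format" point concrete, here is a minimal Python sketch of how such an indexer might be structured. Everything in it is an assumption for illustration: the `KNOWN_FORMATS` table, the `MicroformatExtractor` class and the sample markup are hypothetical, the Bibleref-style span only approximates that format's syntax, and real microformat parsing rules are considerably more involved.

```python
from html.parser import HTMLParser

# Hypothetical rule table: root class name -> property class names.
# Each supported microformat needs its own hand-written entry.
KNOWN_FORMATS = {
    "vcard": {"fn", "org", "tel"},       # hCard
    "vevent": {"summary", "location"},   # hCalendar
}


class MicroformatExtractor(HTMLParser):
    """Toy extractor: only sees formats listed in KNOWN_FORMATS.

    Assumes well-nested markup without void elements such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.items = []     # finished items
        self._depth = 0     # current element nesting depth
        self._item = None   # (root depth, format name, props dict)
        self._prop = None   # (depth, property name, text parts)

    def handle_starttag(self, tag, attrs):
        self._depth += 1
        classes = set((dict(attrs).get("class") or "").split())
        if self._item is None:
            roots = sorted(classes & set(KNOWN_FORMATS))
            if roots:
                self._item = (self._depth, roots[0], {})
        elif self._prop is None:
            names = sorted(classes & KNOWN_FORMATS[self._item[1]])
            if names:
                self._prop = (self._depth, names[0], [])

    def handle_data(self, data):
        if self._prop is not None:
            self._prop[2].append(data)

    def handle_endtag(self, tag):
        if self._prop is not None and self._prop[0] == self._depth:
            _, name, parts = self._prop
            self._item[2][name] = "".join(parts).strip()
            self._prop = None
        if self._item is not None and self._item[0] == self._depth:
            _, fmt, props = self._item
            self.items.append({"format": fmt, "props": props})
            self._item = None
        self._depth -= 1


def extract_microformats(html):
    parser = MicroformatExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Sample page: one hCard plus a Bibleref-style span (syntax approximated).
PAGE = """
<div class="vcard"><span class="fn">Jane Doe</span> works at
<span class="org">Example Corp</span></div>
<p>Read <span class="bibleref" title="John 3:16">John 3:16</span> today.</p>
"""

if __name__ == "__main__":
    for item in extract_microformats(PAGE):
        print(item)
```

With this sample page only the hCard item comes back. The Bibleref-style span is skipped, not because the markup is wrong, but because nobody added a rule for it to the table.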
There are also other problems that others have noted previously. It is difficult to mix and match different microformats, which imposes a big limitation on layout flexibility. There is no easy way to validate your work. The use of microformats also raises accessibility concerns.
Therein lies the issue with microformats. Without an underlying abstract data model, validation becomes a bit like standing back looking at a used car, kicking the tyres, concluding "yeah, looks alright", and then handing over the cash - source.
What is a search engine to do?
So as a search engine company what are you going to prefer? Write ONE RDFa parser and take in ALL metadata that is created with RDFa. Or write a new parser for EVERY microformat that is now available plus every new one in the future?
Web scaleable metadata
RDFa is soon to be a W3C standard (or Recommendation as they call them). It has taken a while for all the pieces to come together but anything this important does take time. And with that time comes a very well thought out solution:
- Scaleable - any vocabularies you want. Create your own and go wild!
- Mixable - mix and match any vocabulary you want in any layout you want.
- W3C Standard - this is the reason that one parser will read any vocabulary. Validation is trivial.
- Globally Identifiable - give any thing on your page a URL and it becomes a "living" data point on the Web, easily addressable by anyone.
- Queryable - your page becomes a stand-alone linked data client, queryable like a database. This is really cool.
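As a rough illustration of the list above, the following Python sketch reads a page that mixes two vocabularies (Dublin Core and FOAF) with one generic routine, then filters the result like a tiny database. It is a toy that handles only a small subset of RDFa (`xmlns:` prefixes, `about`, `property`, `content`), not a conformant parser. The class and function names, the example.org URLs and the sample markup are all made up for the example.

```python
from html.parser import HTMLParser


class TinyRDFaExtractor(HTMLParser):
    """Toy reader for a small RDFa subset; not a conformant RDFa parser.

    Simplifications (assumptions of this sketch): prefixes are treated as
    global rather than scoped to a subtree, and the markup must be
    well nested without void elements such as <br>.
    """

    def __init__(self, base):
        super().__init__()
        self.triples = []        # (subject, predicate, literal)
        self._prefixes = {}      # prefix -> namespace URI
        self._subjects = [base]  # stack of current subjects
        self._open = []          # per open element: pending literal or None

    def _expand(self, curie):
        # Turn "dc:title" into a full URI using the declared prefixes.
        if ":" not in curie:
            return curie
        prefix, _, local = curie.partition(":")
        return self._prefixes.get(prefix, prefix + ":") + local

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self._prefixes[name[len("xmlns:"):]] = value
        subject = attrs.get("about", self._subjects[-1])
        self._subjects.append(subject)
        pending = None
        if "property" in attrs:
            predicate = self._expand(attrs["property"])
            if "content" in attrs:
                # An explicit content attribute wins over the visible text.
                self.triples.append((subject, predicate, attrs["content"]))
            else:
                pending = (subject, predicate, [])
        self._open.append(pending)

    def handle_data(self, data):
        for pending in self._open:
            if pending is not None:
                pending[2].append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        pending = self._open.pop()
        self._subjects.pop()
        if pending is not None:
            subject, predicate, parts = pending
            self.triples.append((subject, predicate, "".join(parts).strip()))


def extract_triples(html, base):
    parser = TinyRDFaExtractor(base)
    parser.feed(html)
    parser.close()
    return parser.triples


def query(triples, subject=None, predicate=None):
    # None acts as a wildcard, like an unbound variable in a query.
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)]


PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     about="http://example.org/posts/1">
  <h1 property="dc:title">Why RDFa</h1>
  <p>By <span about="http://example.org/people/jane"
              property="foaf:name">Jane Doe</span></p>
  <span property="dc:date" content="2008-03-16">March 16th, 2008</span>
</div>
"""

if __name__ == "__main__":
    triples = extract_triples(PAGE, base="http://example.org/page")
    for triple in query(triples, subject="http://example.org/posts/1"):
        print(triple)
```

Nothing in the extractor mentions Dublin Core or FOAF by name. Adding a third vocabulary to the page would mean one more `xmlns:` declaration in the markup and no change to the code, which is the scaling argument in a nutshell.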
Find out more about RDFa
The RDFa group has just launched a wiki (with a growing body of info) and a mailing list. They can also be contacted on IRC/#swig. Keep checking back as I plan on adding new posts on how to use RDFa in your own web pages.





March 16th, 2008 at 10:20 pm
I can think of one other problem.. no one’s worried about it right now, but what happens when metaspam comes?
March 17th, 2008 at 8:18 am
Philip, I wonder about that too. Search engines (in the 1800s I believe) used to use META tags to determine what a page was about. How are these new search techniques going to rank pages that have these semantic relationships?
March 17th, 2008 at 8:02 pm
[…] Peterson gives a good analysis of what Yahoo’s announcement means for sites wanting to get semantic information into search […]
March 17th, 2008 at 9:13 pm
> The main problem with microformats is that each time a new one is created the search indexer needs to develop a custom extractor to make sense of the microformat.
Boo hoo! When it’s as easy as it is with GRDDL, I’m really, really not fussed.
Imagine: I make a new microformat, and I take the time to make a profileTransformation - this problem is lessened dramatically.
March 18th, 2008 at 12:53 am
[…] Blogs has an article which asks and answers the question » Why RDFa is the only Web scaleable metadata format for next-generation search engines. A quote from the article: How do we publish our intelligent information in a format that will be […]
March 18th, 2008 at 3:33 am
Good topic to be discussing these days. Please do continue writing about how the average designer/developer at SitePoint can start implementing RDFa in their work.
March 19th, 2008 at 8:48 am
I whole-heartedly agree with Dan Grossman. I’d love to hear more about it as well! Thanks for all the information you’ve leaked in thus far!
March 19th, 2008 at 6:23 pm
[…] Peterson, on his blog at Sitepoint, has had quite a bit to say about the issue and one of the points he makes is that, if you haven’t already, there […]
March 22nd, 2008 at 12:03 am
It is not *just me* saying this. Yahoo will only support 5 microformats, and Firefox 3 [1] will also only support 5.
The fundamental problem with microformats is that there is no standard model so no standard way to extract them from pages. There is no way around it. And when big players with a lot of resources put their support only behind a small subset it really makes it clear that microformats don’t work on the scale of the global linked Web.
[1] http://developer.mozilla.org/en/docs/Using_microformats
March 22nd, 2008 at 11:07 pm
There are a lot of people thinking about this; here is someone inside Yahoo itself [1].
[1] http://dubinko.info/blog/2008/03/13/the-lowercase-semantic-web-goes-mainstream/
March 24th, 2008 at 8:45 pm
Excellent points on RDFa vs. microformats. I find the arguments on scalability and mixability particularly compelling, as it clearly provides much more flexibility to site owners to tailor their semantic web presence.
Thanks for the comment on my blog — folks who read my post on marketing implications [1] of this move by Yahoo! will find it very interesting.
[1] http://www.chiefmartec.com/2008/03/seo-semantic-we.html
April 1st, 2008 at 7:03 am
[…] really have time at the moment to get into a full-on Microformats chit-chat, I did however find this article on SitePoint authored by David Peterson that goes into some depth as to why it might be better to […]
April 18th, 2008 at 3:07 am
[…] Why RDFa is the only Web scaleable metadata format for next-generation search engines […]
May 7th, 2008 at 6:27 am
[…] Why RDFa is the only Web scaleable metadata format for next-generation search engines — David Peterson, sitepoint Blog. […]
August 29th, 2008 at 2:11 am
[…] rich search you also get enhanced search results. I have blogged about this previously so take a look. It is really cool stuff and I will be discussing it in much more detail over the course of the […]