Adrian Holovaty on Mashups and Microformats

At the Web Directions South conference, I sat down with Adrian Holovaty, co-creator of the Django framework and creator of chicagocrime.org, one of the first Google Maps mashups. I was keen to get his perspective on privacy online, the potential for structured data on the Web, and, of course, his latest venture.

SitePoint: Let me start by quoting you. In your presentation, you said, “Structure is all around us; you don’t need to be a freak like me to find it.” Could you elaborate on that?

Basically I think that, as I said in my talk, we’re beginning to expect more browsability. We’re beginning to expect that, if you look at Random Fact X, which is in a sidebar on a given web page, you’ll be able to click that and find more value.

SP: You seem to record a lot of information about yourself — every flight you’ve taken, all the hotels you’ve stayed in. Should we be documenting our lives for the greater good, so that one day somebody can mash it up?

No, no, I definitely wouldn’t say that. I’m not suggesting that we should go about structuring information that we weren’t already collecting. But we should definitely be analysing the information that we are already collecting, and adding structure to that. Because if we’re not collecting information at this point, chances are it’s not being collected for a good reason. Like my hotel stays — there’s a good reason that people don’t collect this information, and that’s because it’s a waste of time.

But if your organisation publishes a newsletter, or your hospital publishes information, and you’re already publishing it, then you should find the structure in it, and encourage browsability.

SP: I would guess that there are 100 times more sets of data behind closed doors than are open. Do you think we’ll ever reach the point where those numbers are reversed, and there’s information that’s available to be played with and combined? Is the world opening up more?

Definitely. And the best way to realise the answer to that question is to look at Facebook and MySpace, and the younger generation, and see how open they are in sharing everything.

SP: So is privacy out the door? Does it exist anymore?

I don’t think it has completely left, but I think its definition has changed, and people’s comfort with their data — personal data that they give out — is going way up. They’re more comfortable with saying “I’m okay with these pictures of me, drunk, off my rocker, being seen by future employers.” And I think that, down the road, it’ll become a non-issue.

We’ll see that, maybe in 10 or 20 years, the person running for President of the United States might have had a Facebook account in the year 2007 with some incriminating photos, but I think that, as opposed to what we might see today, where you’d get criticised for that, in the future, everyone will say, you know, “I did that too.” It’s those very deep implications of having a permanent record of everything online.

SP: So, tell me a little bit about EveryBlock. Why do we need another news service?

I can’t talk much about it, because we haven’t launched yet, and we want to launch with a bang. But essentially we’re trying to solve a real problem, a problem that real people have: with so much information out there, how do I find the information that applies only to my very specific location? Specifically, news, and information that is news.

I have a very liberal definition of news in this context — it’s not only the traditional types of things that you might find in a newspaper, but also stuff that’s new. For instance, Flickr photos: if someone posted a photo in your neighbourhood, you’d probably want to see it, because you live around the area and you’re curious. So we aim to answer the question: “How can I easily find out what’s happening around me?” Both what is happening, and what has happened.

SP: Coming from the journalism industry, how do you find working in a small startup?

Oh, man! It is unbelievably fun, and I’m having a blast. There are four of us: we sleep in, we work hard, it’s a ton of fun.

SP: Let’s talk about Django a little bit. Who is the framework targeted at? Do I need to know much Python to use it?

Well, Django’s a framework for rapid web app development. You do need to know Python in order to use it. I guess you could use Django never having used Python … but you’ll have to learn it as you learn Django. It’s just like Ruby on Rails: you really do have to learn Ruby in order to use it. It’s a tool for that language.

SP: You mentioned in your bio that you enjoy reverse engineering things. What’s next? Are you going to pick your iPhone apart?

I guess I’m motivated by identifying things that people haven’t done yet, and doing those. If things are easy, or if other people have done them, it’s not as interesting to me. In terms of what’s next, I love identifying information that’s out there and is very hard to use — poorly navigable web sites, often government web sites — and then making that open, just like on chicagocrime.org, and some of the stuff that I did on washingtonpost.com. So I love opening the data like that.

I also love reverse engineering on a technical level. Like when Google Maps came out, I was in the group of people that was hacking that, and figured out how to put maps on other pages before the API came out, so … I guess it’s a little naughty, but what the heck?

SP: Is that part of the allure?

Yeah, I have to say it is.

SP: Your first mashup, chicagocrime.org, mapped the crimes that were occurring in Chicago. Then in your presentation you mentioned some stuff you were doing with data about soldier fatalities in Iraq … the data you’re mashing up seems to be getting more and more violent. Do you seek out violent data on purpose?

Ha ha! No, no. That’s purely coincidental — I’m not a violent person, or someone who likes violence in movies or anything. That’s just coincidental.

SP: The concept of “structured blogging” was popular a while back. Where do microformats, structured blogging and other structured data fit in?

There’s all sorts of great structure with blogs that people don’t take advantage of, like where you’re posting from. But it depends on the blog. For example, three or four years ago on my own blog I did site reviews of newspaper web sites, where I sort of bashed what they were doing poorly in terms of accessibility and usability and how their markup was really bad.

So a couple of times a week I would post an entry about a particular news web site from across the world, and it would always have the same structure: it would always be about a particular site, the rating, what they did well, what they did poorly … And I think that bloggers tend to have those little recurring features. The problem with adding structure to blogs is that it’s just a big box: with any of the blogging software out there, you have a headline, and then a big blob, and that’s where you put your entry.

I’d like to see more people take advantage of adding more meta-data, rather than there just being a big blob. It’s great that the tools have come to a point where people can easily post to blogs, like Blogspot (Blogger), and anyone can post without having to know code or have FTP access or whatever. That’s one step, but we need to go to the next step and have more content-specific publishing systems.
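
To make the contrast with the “big blob” concrete, here is a minimal Python sketch of what a content-specific entry type for the recurring site reviews might look like. It is an illustration only, not code from the interview: the `SiteReview` class, its field names, the 1 to 5 rating scale, and the example sites are all assumptions.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json


@dataclass
class SiteReview:
    """One recurring 'site review' entry, stored as fields rather than a blob."""
    site_name: str
    site_url: str
    rating: int  # hypothetical 1-5 scale; the interview names no scale
    did_well: List[str] = field(default_factory=list)
    did_poorly: List[str] = field(default_factory=list)


def reviews_rated_at_least(reviews, minimum):
    """Filter by rating, a question a free-text blob cannot answer directly."""
    return [review for review in reviews if review.rating >= minimum]


# Made-up entries for illustration only.
reviews = [
    SiteReview("Example Gazette", "http://gazette.example/", 2,
               did_well=["clear headlines"],
               did_poorly=["invalid markup", "missing alt text"]),
    SiteReview("Example Herald", "http://herald.example/", 4,
               did_well=["accessible navigation"]),
]

top = reviews_rated_at_least(reviews, 3)
print([review.site_name for review in top])

# The same record can be serialised for a feed, an API, or a template.
print(json.dumps(asdict(reviews[0]), indent=2))
```

Once the rating and the lists of strengths and weaknesses are separate fields, the entries can be sorted, filtered, and browsed, which is hard to do with a headline and a block of free text.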

SP: Do you think that it’s the role of HTML to evolve so that it contains more tags and expresses more semantic meaning?

Hmm, no, I think that the structure needs to exist at a lower level. It needs to be in a database or some custom XML language, and then HTML can be used as the delivery mechanism. But I do like microformats a lot … then again I do have my reservations about microformats.

SP: What are they?

Well, take a look at a format that’s used all over the Web, like hReview or hCalendar. If you want to do something in aggregate, with all of that data, you essentially have to re-implement Googlebot in order to do that. If you want to say “Oh, I’m going to make the definitive site for events, and I’m going to do that by scraping people’s event microformat data”, you have to manually go out to all these places and scrape it yourself. So while it’s cool that it’s decentralised, it’s also prohibitive to people who want to do stuff with it.

On the other hand, if you have an isolated domain, like you just wanted to scrape upcoming.org or one particular site, obviously that’s a lot easier. But if you want to do stuff on a large level, you essentially have to be Technorati or Google.
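
As a rough illustration of the single-site case, the following Python sketch pulls hCalendar events out of one page using only the standard library. It is not code from the interview. The class names `vevent`, `summary`, `dtstart` and `location` come from the hCalendar format, while `HCalendarParser`, `extract_events`, and the sample page are hypothetical. It also assumes the date is carried in a `title` attribute, and it ignores most of the format's other properties.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
TEXT_PROPERTIES = ("summary", "location")


class HCalendarParser(HTMLParser):
    """Collects summary, dtstart and location from each class="vevent" element."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._event = None    # dict for the vevent being read, if any
        self._depth = 0       # open-tag depth inside the current vevent
        self._prop = None     # text property currently being captured
        self._prop_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self._event is None:
            if "vevent" not in classes:
                return
            self._event = {}
            self._depth = 0
        self._depth += 1
        if "dtstart" in classes and attr_map.get("title"):
            # Assumed pattern: <abbr class="dtstart" title="2007-10-01">
            self._event["dtstart"] = attr_map["title"]
        if self._prop is None:
            for name in TEXT_PROPERTIES:
                if name in classes:
                    self._prop, self._prop_depth = name, self._depth
                    self._event[name] = ""
                    break

    def handle_data(self, data):
        if self._event is not None and self._prop is not None:
            self._event[self._prop] += data

    def handle_endtag(self, tag):
        if self._event is None or tag in VOID_TAGS:
            return
        if self._prop is not None and self._depth == self._prop_depth:
            self._event[self._prop] = self._event[self._prop].strip()
            self._prop = None
        self._depth -= 1
        if self._depth == 0:  # the vevent element itself just closed
            self.events.append(self._event)
            self._event = None


def extract_events(html):
    """Return a list of event dicts found in one HTML document."""
    parser = HCalendarParser()
    parser.feed(html)
    parser.close()
    return parser.events


# Made-up page for illustration only.
SAMPLE_PAGE = """
<div class="vevent">
  <span class="summary">Example Meetup</span>,
  <abbr class="dtstart" title="2007-10-01">1 October</abbr>,
  <span class="location">Example Hall</span>
</div>
<p>Unrelated text</p>
"""

events = extract_events(SAMPLE_PAGE)
print(events)
```

Parsing a page that is already in hand is the easy part. The sketch says nothing about how to discover and fetch every page on the Web that carries this markup, which is the crawling problem described above.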

SP: Thanks very much for your time, I appreciate it. Enjoy the rest of your time in Australia!

Sure, thanks Matthew!