Let’s kick things off with something a bit unusual: a virtual scavenger hunt.
At some point, nearly every web geek gets a chance to hack on some open data, usually from a government source. The buzzword here is “mashup,” but knowing how to find and consume openly available data will remain a valuable skill long after its faddishness ends.
Unfortunately, governments, and especially the US government, are often incredibly awful at providing this data. Sure, it’s available — but you’ve got to find it first.
So this question is all about finding that data. Since I’m most familiar with the USA, this question is USA-specific (but I’d love to see answers covering the equivalent data for other nations).
In each case, the answer should be a URL where you can either download the data in question, or at least find a direct link to the data. There may be multiple sources for each, including ones that could be screen-scraped for the data. I’m not looking for those sources, however. I only want the ones with easily downloadable data in a format that can be easily parsed by a computer (e.g. CSV, XML, plain text). “Friendly” formats, in other words.
So, where can I download data to:
- Analyze the nutritional content of foods?
- Find the population (and other basic demographics) of my city?
- Analyze the latest SEC filings by public companies?
- Look at historical gas prices?
- Look for trends in juvenile arrest rates?
Post your answers in the comments. For extra brownie points, tell us how you located each piece of data: did The Google serve you well, or were you forced to turn elsewhere?
If you really want to stretch your brain, try to write a tool to import each chunk of data into your favorite relational database. There will be a related question in a couple of weeks involving modeling one of these pieces of data, so you overachievers can start thinking about it now…
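If you want a starting point for that import tool, here is a minimal sketch, assuming the data arrives as a CSV file with a header row and that SQLite is an acceptable stand-in for your favorite relational database. The function name `import_csv`, the table name, and the sample rows are all made up for illustration; none of them come from the data sources in this hunt, and real files will likely need proper column types and some cleanup.

```python
import csv
import io
import sqlite3


def quote_identifier(name):
    """Quote a table or column name so spaces and odd characters are safe."""
    return '"{}"'.format(name.replace('"', '""'))


def import_csv(conn, table, csv_file):
    """Load an open CSV file (with a header row) into a new table.

    Every column is stored as TEXT; this is a simplification, not a
    real schema. Returns the number of rows inserted.
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # first row supplies the column names
    columns = ", ".join(quote_identifier(h) + " TEXT" for h in header)
    conn.execute("CREATE TABLE {} ({})".format(quote_identifier(table), columns))
    placeholders = ", ".join("?" for _ in header)
    cursor = conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_identifier(table), placeholders),
        reader,  # remaining rows; each must match the header length
    )
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    # Placeholder rows only, standing in for a downloaded file.
    sample = io.StringIO("name,value\nitem_a,1\nitem_b,2\n")
    connection = sqlite3.connect(":memory:")
    print(import_csv(connection, "sample", sample))
    print(connection.execute("SELECT * FROM sample").fetchall())
```

Storing everything as TEXT keeps the sketch short; a real importer would inspect the columns and pick numeric or date types where they apply.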
Good luck, and check back this weekend for the answers.
November 15th, 2006 at 1:06 am
Nutritional Content DB http://www.ars.usda.gov/Services/docs.htm?docid=13746 found via google for ‘nutritional content of foods’
Census Information (for Orlando Florida, but you can choose any other city, I just like Florida) http://factfinder.census.gov/servlet/QTTable?_bm=y&-context=qt&-qr_name=DEC_1990_STF1_DP1&-ds_name=DEC_1990_STF1_&-CONTEXT=qt&-tree_id=100&-all_geo_types=N&-redoLog=true&-_caller=geoselect&-currentselections=DEC_1990_STF1_DP1&-geo_id=label&-geo_id=16000US121600&-search_results=16000US121600&-format=&-_lang=en found via google for ‘census data download’
Juvenile Arrest Rates http://ojjdp.ncjrs.org/ojstatbb/ezaucr/asp/ucr_display.asp found via google for ‘juvenile arrest rates’
November 15th, 2006 at 1:20 am
I feel so slow now. :( I only just found the Nutritional Content. :P Same URL as you have. There are other sources, however. I took time to read through some things (which slowed me down), and even the USDA get their information from 3+ sources originally.
November 15th, 2006 at 1:46 am
1. http://www.ars.usda.gov/Services/docs.htm?docid=13746
2. http://www.census.gov/popest/datasets.html (Note: the “Subcounty population dataset” lists population by city.)
3. ftp://ftp.sec.gov/edgar/daily-index/
4. http://www.eia.doe.gov/oil_gas/petroleum/data_publications/wrgp/mogas_history.html
5. http://ojjdp.ncjrs.gov/ojstatbb/dat.html#downloadable
November 15th, 2006 at 3:10 am
Historical Gas Prices:
http://www.eia.doe.gov/oil_gas/petroleum/data_publications/wrgp/mogas_history.html
Select how you want the info grouped and it is delivered in xls.
Found Via Google using: “Historical gas prices”
November 15th, 2006 at 3:41 am
SEC Filings
http://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent
Good times.
November 15th, 2006 at 3:42 am
Whoops, forgot to mention that The Google helped with “latest SEC filings.”
November 15th, 2006 at 3:48 am
I can only have a go at these once every couple hours since I’m at work. As far as the SEC Filings (google “SEC filings”, traversed first hit subcategory…) are concerned, this is what I’ve found thus far:
http://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent
Allows a search of most recent filings, about as real-time as you can get. I have not found any way to gather a full listing of this data without scraping. The following links may assist with this, but my 5 min search break is over. :)
- Link 1
- Link 2
I think these are in the right direction, but again, my time’s up.
http://www.sec.gov/Archives/edgar/xbrlrss.xml
An RSS feed which is updated daily.
http://www.sec.gov/edgar/searchedgar/webusers.htm
Main page that delivered me to the two “main” links from above.
As an aside, when using a search engine, using advanced filters can help quite a bit when you know what form of information you’re looking for, especially if a government organization is most likely involved. With google, for instance, you can specify in the search terms: site:.gov “sec filings” or site:.org “sec filings” — limiting your search results goes a long way in removing unimportant data.
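For anyone who wants to script that kind of filtered search, here is a rough sketch of building the query URL. It assumes Google's standard `https://www.google.com/search?q=...` URL form; the helper name `site_search_url` is invented for this example.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def site_search_url(phrase, site):
    """Build a search URL that restricts an exact phrase to one site or domain."""
    # The site: operator narrows results; the quotes request an exact phrase.
    query = 'site:{} "{}"'.format(site, phrase)
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    url = site_search_url("sec filings", ".gov")
    print(url)
    # Decode the query again to confirm what the search engine will receive.
    print(parse_qs(urlparse(url).query)["q"][0])
```

Swapping `.gov` for `.org`, or for a full host name such as `sec.gov`, narrows the results the same way.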
November 15th, 2006 at 7:35 am
Juvenile arrest rates:
http://www.ojp.usdoj.gov/bjs/data/violarr.wk1
Linked from http://www.ojp.usdoj.gov/bjs/dtdata.htm
Probably not the most friendly format though!
November 15th, 2006 at 10:58 am
http://www.sitepoint.com/blogs/2006/11/14/scavenger-hunt/
This page has direct links to most of the resources required for the hunt.
November 16th, 2006 at 12:44 am
You’re good, cranial-bore! :D
November 16th, 2006 at 7:14 am
1. Analyze the nutritional content of foods?
http://www.ars.usda.gov/SP2UserFiles/Place/12355000/apps/fndds1_ascii.exe
This one was the hardest for me to find ’cause I kind of geek out at Food & Nutrition sites.
2. Find the population (and other basic demographics) of my city?
Here’s all of Wisconsin.
http://www.doa.state.wi.us/dir/wisconsin/WIsf3_demo_profiles.xls
My girlfriend works for the city, so finding this one was easy. No The Google necessary.
3. Analyze the latest SEC filings by public companies?
http://www.sec.gov/Archives/edgar/xbrlrss.xml
4. Look at historical gas prices?
http://tonto.eia.doe.gov/oog/ftparea/wogirs/xls/pswrgvwnus.xls
First thing in The Google.
5. Look for trends in juvenile arrest rates?
http://ojjdp.ncjrs.org/ojstatbb/ezaucr/asp/ucr_export.asp?Select_State=0&Select_County=0&rdoData=1c&rdoYear=99&Print=no
Thanks for doing this - I had a blast!
Ruth
November 17th, 2006 at 6:37 am
I didn’t look for each item specifically, but this site has always helped when searching for US Government information: http://www.firstgov.gov/