Episode 4: What’s “normal,” really?
By Jacob Kaplan-Moss
Sorry about the missed week, fellow puzzlers — real life, and all that — I’ll try not to let it happen again.
Of course, finding a viable source of data is only the first step; once you’ve figured out what to use, you have to figure out how to use it. Since I’m a certified database geek, the first thing I do once I’ve got some sweet data in hand is start thinking about database design.
When we talk about database design, we’re usually talking about formal database normalization, and specifically first, second, and third normal forms. Although I’ll be the first to admit that formal normalization often needs to take a back seat to pragmatic design or performance requirements, we’ll ignore that big caveat this week and plunge ahead.
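If the normal forms are a little hazy, here’s a rough sketch of what the end of the road (3NF) looks like. It deliberately has nothing to do with the five sources below: the employee/department data, the table names, and the `build_normalized` helper are all made up for illustration. In the flat rows, the floor depends on the department rather than on the employee, and that transitive dependency is exactly what third normal form rules out, so the sketch moves it into its own table.

```python
import sqlite3

# Hypothetical flat rows: (employee, department, department_floor).
# The floor is a fact about the department, not the employee, so it is
# repeated on every row for that department.
FLAT_ROWS = [
    ("alice", "engineering", 3),
    ("bob", "engineering", 3),
    ("carol", "sales", 1),
]


def build_normalized(rows):
    """Load the flat rows into a two-table, in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE department (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            floor INTEGER NOT NULL
        );
        CREATE TABLE employee (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            department_id INTEGER NOT NULL REFERENCES department(id)
        );
        """
    )
    for name, dept, floor in rows:
        # Each department is stored once; repeats are skipped via UNIQUE.
        conn.execute(
            "INSERT OR IGNORE INTO department (name, floor) VALUES (?, ?)",
            (dept, floor),
        )
        dept_id = conn.execute(
            "SELECT id FROM department WHERE name = ?", (dept,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO employee (name, department_id) VALUES (?, ?)",
            (name, dept_id),
        )
    conn.commit()
    return conn


conn = build_normalized(FLAT_ROWS)
```

The payoff is that moving a department to a new floor now means updating one row in `department`, instead of hunting down every employee row that happens to mention it.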
Here, again, are the five data sources we located in the scavenger hunt:
- Nutritional content of food from the USDA.
- (Links to) population demographics of every major city in the US, courtesy of the US Census Bureau.
- The latest SEC filings (in RSS, no less) straight from the horse’s mouth.
- Historical gas prices, from the Energy Information Administration (which I had never heard of until writing this quiz).
- Juvenile arrest rates from the Office of Juvenile Justice and Delinquency Prevention (part of the Department of Justice).
So, which normal form is each of these sources in (and why)?
We’ll discuss the answers and a bit more about the implications of database normalization this weekend.
For an extra challenge, pick one of the sources and define a fully normalized (i.e., 3NF) schema for it. There’s no single “right” answer here, but if anyone’s brave enough to post their schemas, I’ll critique ’em when we go over the answers.
Got a question of your own?
As always, if you’ve got a question, puzzle, or challenge that you think would make a good question for this quiz, email me at jacob -at- jacobian.org.