<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>SitePoint &#187; Web Developer Quiz</title>
	<atom:link href="http://www.sitepoint.com/blogs/category/tech/web-developer-quiz/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sitepoint.com/blogs</link>
	<description>News, opinion, and fresh thinking for web developers and designers. The official podcast of sitepoint.com.</description>
	<pubDate>Sun, 05 Jul 2009 11:48:35 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Answers to Episode 4 (&#8220;What&#8217;s &#8216;normal&#8217;, really?&#8221;)</title>
		<link>http://www.sitepoint.com/blogs/2006/12/21/answers-to-episode-4-whats-normal-really/</link>
		<comments>http://www.sitepoint.com/blogs/2006/12/21/answers-to-episode-4-whats-normal-really/#comments</comments>
		<pubDate>Wed, 20 Dec 2006 16:41:26 +0000</pubDate>
		<dc:creator>jacob-kaplan-moss</dc:creator>
		
		<category><![CDATA[Web Developer Quiz]]></category>

		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/12/21/answers-to-episode-4-whats-normal-really/</guid>
		<description><![CDATA[Well that was a raging success&#8230; not! 
Apparently database normalization isn&#8217;t something that web developers find all that interesting. (But thanks to malikyte and xhtmlcoder for keeping the question from being a complete ghost town!)
That&#8217;s a shame, though &#8212; there are all sorts of pragmatic reasons behind good data design. To name just a few: [...]]]></description>
			<content:encoded><![CDATA[<p>Well <a href="http://www.sitepoint.com/blogs/2006/12/14/episode-4-whats-normal-really/">that</a> was a raging success&#8230; <strong>not!</strong> </p>
<p>Apparently database normalization isn&#8217;t something that web developers find all that interesting. (But thanks to <cite>malikyte</cite> and <cite>xhtmlcoder</cite> for keeping the question from being a complete ghost town!)</p>
<p>That&#8217;s a shame, though &#8212; there are all sorts of pragmatic reasons behind good data design. To name just a few: properly designed tables often perform better than their de-normalized brethren, normalized data is <em>much</em> easier to aggregate successfully, and (most importantly) properly designed tables are much easier to understand.</p>
<p>That last one&#8217;s really the crux behind normalizing tables. Remember &#8212; computers don&#8217;t care if we write good code; when we write good code, it&#8217;s so that future developers won&#8217;t curse our names. Data normalization falls into the same future-proofing category.</p>
<div id="adz" class="vertical"></div><p>Anyway, though: on to the answers. I&#8217;ll be brief, I promise.</p>
<ol>
<li>The USDA&#8217;s nutritional content database is &#8212; to my utter surprise &#8212; actually 3NF (everything has a primary key, and every piece of data appears to be atomic).  I have some quibbles with a couple of the design choices, but they&#8217;re pretty minor.  It&#8217;s pretty remarkable when you come across data this clean out of the box.</li>
<li>Although the population demographic data is fairly well designed (and easy to munge into better forms), it doesn&#8217;t even achieve 1NF: records lack primary keys.  This is usually the case with public data, and it stinks; it makes tracking changes from version to version extremely difficult.</li>
<li>
<p>The SEC filings were a trick question. They&#8217;re in an XML dialect, so normal forms don&#8217;t apply.</p>
<p>I think it&#8217;s important to notice how different formats change the way we can produce and consume data; the SEC data is a pretty good example of well done XML, but it would be pretty difficult trying to push this data into a database in any sort of structured way.</p>
<p>If I wanted to build a site around this stuff, I&#8217;d likely use something like Berkeley DB for XML instead of a relational database.</p>
</li>
<li>The gas price data, though crammed into an Excel sheet used more for presentation than data management, is actually in 3NF (if only because it&#8217;s pre-aggregated data).  The date of the measurement is the primary key, and every column depends solely on the primary key (i.e. price is a function of date, and nothing more).</li>
<li>The Juvenile Arrest Rate data is, like the gas prices, nominally 3NF data crammed into Excel.</li>
</ol>
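<p>To make the gas price case concrete, here&#8217;s a minimal sketch of how that data might look as a table, using Python&#8217;s built-in <code>sqlite3</code> module. The table name, column names, and sample values are all made up for illustration; they are not taken from the EIA spreadsheet.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE gas_price (
        measured_on TEXT PRIMARY KEY,  -- date of the measurement (hypothetical column name)
        price_usd   REAL NOT NULL      -- price for that date (hypothetical column name)
    )
    """
)
conn.executemany(
    "INSERT INTO gas_price VALUES (?, ?)",
    [("2006-12-04", 2.30), ("2006-12-11", 2.29)],  # invented sample rows
)

# Price is a function of date and nothing more, so a second
# price for the same date violates the primary key.
try:
    conn.execute("INSERT INTO gas_price VALUES (?, ?)", ("2006-12-04", 9.99))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM gas_price").fetchone()[0]
print(duplicate_rejected, row_count)  # prints: True 2
```

<p>With only one non-key column, there&#8217;s nothing for that column to depend on except the key, which is why pre-aggregated data like this lands in 3NF almost by accident.</p>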
<p></p>
<h4>Coming up&#8230;</h4>
<p>Tune in tomorrow for a special super-difficult (I hope) challenge to keep us all occupied over the holidays.</p>
<p>As always, if you&#8217;ve got a question, puzzle, or challenge that you think would make a good question for this quiz, email me at <i>jacob -at- jacobian.org</i>.</p>
<script src="http://ads.aws.sitepoint.com/adjs.php?region=136&amp;did=adz&amp;adtype=vertical" type="text/javascript"></script>]]></content:encoded>
			<wfw:commentRss>http://www.sitepoint.com/blogs/2006/12/21/answers-to-episode-4-whats-normal-really/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Episode 4: What&#8217;s &#8220;normal,&#8221; really?</title>
		<link>http://www.sitepoint.com/blogs/2006/12/14/episode-4-whats-normal-really/</link>
		<comments>http://www.sitepoint.com/blogs/2006/12/14/episode-4-whats-normal-really/#comments</comments>
		<pubDate>Thu, 14 Dec 2006 13:25:55 +0000</pubDate>
		<dc:creator>jacob-kaplan-moss</dc:creator>
		
		<category><![CDATA[Web Developer Quiz]]></category>

		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/12/14/episode-4-whats-normal-really/</guid>
		<description><![CDATA[A few weeks ago, I posted a <a href="http://www.sitepoint.com/blogs/2006/11/14/scavenger-hunt/">scavenger hunt</a> for public data (<a href="http://www.sitepoint.com/blogs/2006/11/20/episode-1-answers/">answers</a>); today we'll return to dealing with that data.]]></description>
			<content:encoded><![CDATA[<p>Sorry about the missed week, fellow puzzlers &#8212; real life, and all that &#8212; I&#8217;ll try not to let it happen again.</p>
<p>Anyway, let&#8217;s get right to this week&#8217;s question. A few weeks ago, I posted a <a href="http://www.sitepoint.com/blogs/2006/11/14/scavenger-hunt/">scavenger hunt</a> for public data (<a href="http://www.sitepoint.com/blogs/2006/11/20/episode-1-answers/">answers</a>); today we&#8217;ll return to dealing with that data.</p>
<p>Of course, finding a viable source of data is only the first step; once you&#8217;ve figured out what to use, you have to figure out how to use it.  Since I&#8217;m a certified database geek, the first thing I do once I&#8217;ve got some sweet data in hand is start thinking about database design.</p>
<p>When we talk database design, we&#8217;re usually talking about <a href="http://en.wikipedia.org/wiki/Database_normalization">formal database normalization</a>, and specifically <a href="http://en.wikipedia.org/wiki/1NF">first</a>, <a href="http://en.wikipedia.org/wiki/2NF">second</a>, and <a href="http://en.wikipedia.org/wiki/3NF">third</a> normal forms.  Although I&#8217;ll be the first to admit that often formal normalization needs to take a back seat to pragmatic design or performance requirements, we&#8217;ll ignore that big caveat this week and plunge ahead.</p>
<div id="adz" class="vertical"></div><p>Here, again, are the five data sources we located in the scavenger hunt:</p>
<ol>
<li><a href="http://www.ars.usda.gov/Services/docs.htm?docid=13746">Nutritional content of food</a> from the USDA.</li>
<li>(Links to) <a href="http://www2.census.gov/census_2000/datasets/100_and_sample_profile/">population demographics of every major city in the US</a>, courtesy of the US Census Bureau.</li>
<li>The latest <a href="http://sec.gov/Archives/edgar/xbrlrss.xml">SEC filings</a> (in RSS, no less) straight from the horse’s mouth.</li>
<li><a href="http://tonto.eia.doe.gov/dnav/pet/pet_pri_gnd_dcus_nus_w.htm">Historical gas prices</a>, from the Energy Information Administration (which I had never heard of until writing this quiz).</li>
<li><a href="http://www.ojjdp.ncjrs.org/ojstatbb/crime/JAR.asp">Juvenile arrest rates</a> from the Office of Juvenile Justice and Delinquency Prevention (part of the Department of Justice).</li>
</ol>
<p>So, <strong>which normal form is each of these sources in (and why)</strong>?</p>
<p>We&#8217;ll discuss the answers and a bit more about the implications of database normalization this weekend.  </p>
<h4>Bonus challenge!</h4>
<p>For an extra challenge, pick one of the sources and define a fully normalized (i.e. 3NF) schema for it.   There&#8217;s not in any way a &#8220;right&#8221; answer here, but if anyone&#8217;s brave enough to post their schemas, I&#8217;ll critique &#8216;em when we go over the answers.</p>
<h4>Got a question of your own?</h4>
<p>As always, if you&#8217;ve got a question, puzzle, or challenge that you think would make a good question for this quiz, email me at <i>jacob -at- jacobian.org</i>.</p>
<script src="http://ads.aws.sitepoint.com/adjs.php?region=136&amp;did=adz&amp;adtype=vertical" type="text/javascript"></script>]]></content:encoded>
			<wfw:commentRss>http://www.sitepoint.com/blogs/2006/12/14/episode-4-whats-normal-really/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Answers to Episode 3 (&#8220;One of these things&#8230;&#8221;)</title>
		<link>http://www.sitepoint.com/blogs/2006/12/05/answers-to-episode-3-one-of-these-things/</link>
		<comments>http://www.sitepoint.com/blogs/2006/12/05/answers-to-episode-3-one-of-these-things/#comments</comments>
		<pubDate>Mon, 04 Dec 2006 16:47:11 +0000</pubDate>
		<dc:creator>jacob-kaplan-moss</dc:creator>
		
		<category><![CDATA[Web Developer Quiz]]></category>

		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/12/05/answers-to-episode-3-one-of-these-things/</guid>
		<description><![CDATA[Answers to last week's questions.]]></description>
			<content:encoded><![CDATA[<p>Well, I certainly thought <a href="http://www.sitepoint.com/blogs/2006/11/29/episode-3-one-of-these-things/">last week&#8217;s question</a> was incredibly fun.  If you missed it, I posed five &#8220;find the odd man out&#8221; questions which turned out to be quite difficult.</p>
<p>Nobody who attempted to answer got all five right.  However, I may have left just a little too much ambiguity in some of the questions for the &#8220;right&#8221; answer to be findable&#8230; Either way, I&#8217;ll dive in and explain which answers I was looking for. Let me know in the comments if I&#8217;m totally out of my mind :)</p>
<h4>1. Specifications</h4>
<p>Of the four specifications given (WSDL, APP, RDF, and WS-Policy), only RDF specifies a non-XML-based data representation.  Yes, RDF <em>can</em> be represented in XML, but it also has alternative formats; the other three don&#8217;t.  </p>
<h4>2. HTTP methods</h4>
<p><cite>kasimir</cite> <a href="http://www.sitepoint.com/blogs/2006/11/29/episode-3-one-of-these-things/#comment-110705">got it</a>: &#8220;POST is not idempotent&#8221;.</p>
<div id="adz" class="vertical"></div><p>Idempotence is actually an incredibly important concept in web development, but rather than shoot my mouth off about it, let me point you to Wikipedia&#8217;s <a href="http://en.wikipedia.org/wiki/Idempotence_%28computer_science%29#WWW_behavior">words on the subject</a>, which are quite good.</p>
<h4>3. MD5 hashes</h4>
<p><cite>Mindaugas</cite> <a href="http://www.sitepoint.com/blogs/2006/11/29/episode-3-one-of-these-things/#comment-110777">got it</a>: B is an MD5 hash of the empty string.</p>
<p>This one was a little bit of a &#8220;gotcha&#8221; question; I usually try to avoid trick questions, but I just couldn&#8217;t resist here.</p>
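<p>If you&#8217;d rather verify that than take anyone&#8217;s word for it, a few lines of Python with the standard <code>hashlib</code> module will do. This is just a quick check; the dictionary keys are the list letters from the original question.</p>

```python
import hashlib

# The four candidate hashes from the quiz, keyed by their list letter.
candidates = {
    "A": "f97c5d29941bfb1b2fdab0874906ab82",
    "B": "d41d8cd98f00b204e9800998ecf8427e",
    "C": "35d6d33467aae9a2e3dccb4b6b027878",
    "D": "0495651fa03c897470784990f33d86cd",
}

# MD5 of zero bytes of input.
empty_digest = hashlib.md5(b"").hexdigest()

odd_one_out = [letter for letter, digest in candidates.items() if digest == empty_digest]
print(odd_one_out)  # prints: ['B']
```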
<h4>4. Programming languages</h4>
<p>Also a bit of a trick question: the answer is based on some knowledge of the history of the languages, not on features of the languages themselves.</p>
<p>The answer? Python is the only language not named after a (real) person.</p>
<h4>5. HTML 4 elements</h4>
<p><cite>boomsb</cite> <a href="http://www.sitepoint.com/blogs/2006/11/29/episode-3-one-of-these-things/#comment-110053">got it first</a>: <code>&lt;U&gt;</code> is deprecated.</p>
<p>There&#8217;s a little bit of a joke in this one, though: even though <code>&lt;U&gt;</code> is officially deprecated, it&#8217;s nevertheless supported by every browser under the (virtual) sun, while <code>&lt;Q&gt;</code> &#8212; officially part of HTML &#8212; is not supported by Internet Explorer.</p>
<p>Isn&#8217;t this a fun world to code for?</p>
<h4>Got a question of your own?</h4>
<p>If you&#8217;ve got a question, puzzle, or challenge that you think would make a good question for this quiz, email me at <i>jacob -at- jacobian.org</i>. If I use your question in a future quiz, I&#8217;ll even send you a nice little present&#8230;</p>
<h4>Tomorrow</h4>
<p>Tune in tomorrow for the next question.  I think I&#8217;m going to try something a bit more open-ended this time: a question about data modeling.  Be sure to check it out.</p>
<script src="http://ads.aws.sitepoint.com/adjs.php?region=136&amp;did=adz&amp;adtype=vertical" type="text/javascript"></script>]]></content:encoded>
			<wfw:commentRss>http://www.sitepoint.com/blogs/2006/12/05/answers-to-episode-3-one-of-these-things/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Episode 3: &#8220;One of these things&#8230;&#8221;</title>
		<link>http://www.sitepoint.com/blogs/2006/11/29/episode-3-one-of-these-things/</link>
		<comments>http://www.sitepoint.com/blogs/2006/11/29/episode-3-one-of-these-things/#comments</comments>
		<pubDate>Tue, 28 Nov 2006 21:02:24 +0000</pubDate>
		<dc:creator>jacob-kaplan-moss</dc:creator>
		
		<category><![CDATA[Web Developer Quiz]]></category>

		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1797</guid>
		<description><![CDATA[This week's question: "one of these things is not like the other..."]]></description>
			<content:encoded><![CDATA[<p>This week&#8217;s question is a bit more straightforward than the previous ones. There&#8217;s less of a &#8220;point&#8221; to these questions; they&#8217;re more of a trivia contest.  I think they&#8217;re fun &#8212; and tricky &#8212; so let&#8217;s see how it goes.</p>
<p>In &#8220;one of these things is not like the other&#8221; style, for each group below tell me which item doesn&#8217;t belong, and (more importantly) why:</p>
<h4>1. Specifications:</h4>
<ol style="list-style-type: upper-alpha; margin: 1em 0;">
<li><a href="http://en.wikipedia.org/wiki/WSDL">WSDL</a></li>
<li><a href="http://en.wikipedia.org/wiki/Atom_Publishing_Protocol">Atom Publishing Protocol</a></li>
<li><a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a></li>
<li><a href="http://en.wikipedia.org/wiki/Ws-policy">WS-Policy</a></li>
</ol>
<div id="adz" class="horizontal"></div><h4>2. HTTP methods:</h4>
<ol style="list-style-type: upper-alpha; margin: 1em 0;">
<li><code>GET</code></li>
<li><code>PUT</code></li>
<li><code>POST</code></li>
<li><code>HEAD</code></li>
</ol>
<h4>3. MD5 hashes:</h4>
<ol style="list-style-type: upper-alpha; margin: 1em 0;">
<li><code>f97c5d29941bfb1b2fdab0874906ab82</code></li>
<li><code>d41d8cd98f00b204e9800998ecf8427e</code></li>
<li><code>35d6d33467aae9a2e3dccb4b6b027878</code></li>
<li><code>0495651fa03c897470784990f33d86cd</code></li>
</ol>
<h4>4. Programming languages:</h4>
<ol style="list-style-type: upper-alpha; margin: 1em 0;">
<li><a href="http://en.wikipedia.org/wiki/Erlang_programming_language">Erlang</a></li>
<li><a href="http://en.wikipedia.org/wiki/Ada_%28programming_language%29">Ada</a></li>
<li><a href="http://en.wikipedia.org/wiki/Haskell_%28programming_language%29">Haskell</a></li>
<li><a href="http://en.wikipedia.org/wiki/Python_%28programming_language%29">Python</a></li>
</ol>
<h4>5. HTML 4 elements:</h4>
<ol style="list-style-type: upper-alpha; margin: 1em 0;">
<li><code>&lt;Q&gt;</code></li>
<li><code>&lt;U&gt;</code></li>
<li><code>&lt;I&gt;</code></li>
<li><code>&lt;A&gt;</code></li>
</ol>
<p>As usual, tune in this weekend for the answers.</p>
<h4>Got a question of your own?</h4>
<p>If you&#8217;ve got a question, puzzle, or challenge that you think would make a good question for this quiz, email me at <i>jacob -at- jacobian.org</i>.  If I use your question in a future quiz, I&#8217;ll even send you a nice little present&#8230;</p>
<script src="http://ads.aws.sitepoint.com/adjs.php?region=137&amp;did=adz&amp;adtype=horizontal" type="text/javascript"></script>]]></content:encoded>
			<wfw:commentRss>http://www.sitepoint.com/blogs/2006/11/29/episode-3-one-of-these-things/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Answers to Episode 2 (Real-life regular expressions)</title>
		<link>http://www.sitepoint.com/blogs/2006/11/28/answers-to-episode-2-real-life-regular-expressions/</link>
		<comments>http://www.sitepoint.com/blogs/2006/11/28/answers-to-episode-2-real-life-regular-expressions/#comments</comments>
		<pubDate>Mon, 27 Nov 2006 21:03:40 +0000</pubDate>
		<dc:creator>jacob-kaplan-moss</dc:creator>
		
		<category><![CDATA[Web Developer Quiz]]></category>

		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1794</guid>
		<description><![CDATA[If you missed it, last week's challenge dealt with <a href="http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/">deciphering regular expressions</a> and finding subtle bugs within 'em.

As with last week, before getting to the actual answers please indulge me while I pontificate a bit.]]></description>
			<content:encoded><![CDATA[<p>Yeah, I&#8217;m a little late getting these answers posted.  Sorry!</p>
<p>If you missed it, last week&#8217;s challenge dealt with <a href="http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/">deciphering regular expressions</a> and finding subtle bugs within &#8216;em.</p>
<p>As with last week, before getting to the actual answers please indulge me while I pontificate a bit:</p>
<p>Hopefully it&#8217;s pretty obvious that regular expressions are a double-edged sword. Sure, deciphering them makes a fun quiz, but imagine running across these monsters in code and trying to figure out what they do&#8230; not fun.</p>
<p>Fortunately, nearly every regex implementation has a &#8220;verbose&#8221; mode that allows you to embed comments inside regular expressions (in most languages this is the <code>x</code> flag). For the sake of those who must read your code, please use the verbose mode!</p>
<p>OK, on to the answers:</p>
<h4>1. <code>[A-PR-Y0-9]{3}-[A-PR-Y0-9]{3}-[A-PR-Y0-9]{4}</code></h4>
<p>This is a <a href="http://en.wikipedia.org/wiki/Phone_number">US phone number</a>, including ones that use letters (e.g. <code>831-555-CODE</code>).  Rewritten in verbose mode, it makes a lot more sense:</p>
<pre>
  [A-PR-Y0-9]{3}  # Area code prefix
  -
  [A-PR-Y0-9]{3}  # 3-digit exchange
  -
  [A-PR-Y0-9]{4}  # 4-digit suffix
</pre>
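<p>As one concrete illustration, here&#8217;s roughly how the verbose form above could be compiled in Python, where the <code>x</code> flag is spelled <code>re.VERBOSE</code>. The name <code>PHONE</code> is my own; other languages spell the flag differently.</p>

```python
import re

# Whitespace and "#" comments inside the pattern are ignored under re.VERBOSE.
PHONE = re.compile(
    r"""
    [A-PR-Y0-9]{3}  # Area code prefix
    -
    [A-PR-Y0-9]{3}  # 3-digit exchange
    -
    [A-PR-Y0-9]{4}  # 4-digit suffix
    """,
    re.VERBOSE | re.IGNORECASE,  # the quiz assumed case-insensitive matching
)

print(bool(PHONE.fullmatch("831-555-CODE")))  # prints: True
```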
<p><cite>birman</cite> had a nice <a href="http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comment-101984">roundup of the problems with this pattern</a>:</p>
<blockquote><p>[It] doesn&#8217;t account for a preceding 1, if the area code is in parenthesis, if the digit groups are separated by a dot or space instead of a dash, or the fact that cell phones have Q and Z on them. It also doesn&#8217;t make sure the group is isolated, and not part of something like 1234888-234-123456123.</p></blockquote>
<p>That last point &#8212; the isolation error &#8212; is a <em>very</em> common error when writing regular expressions.</p>
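<p>Here&#8217;s a small Python sketch of the isolation problem, using the string from the comment above. Wrapping the pattern in <code>\b</code> word boundaries is only one possible fix (and my own choice for this example); it won&#8217;t help if the neighbouring character is a hyphen or other punctuation.</p>

```python
import re

CORE = r"[A-PR-Y0-9]{3}-[A-PR-Y0-9]{3}-[A-PR-Y0-9]{4}"

# The original pattern happily matches in the middle of a longer run.
loose = re.compile(CORE, re.IGNORECASE)

# \b requires a word boundary on each side, so the match must stand alone.
isolated = re.compile(r"\b" + CORE + r"\b", re.IGNORECASE)

junk = "1234888-234-123456123"
print(loose.search(junk) is not None)     # prints: True
print(isolated.search(junk) is not None)  # prints: False

real = "call 831-555-CODE today"
print(isolated.search(real).group(0))     # prints: 831-555-CODE
```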
<h4>2. <code>&amp;(?!(\w+|#\d+);)</code></h4>
<div id="adz" class="vertical"></div><p>This is not, as most people thought, a mistaken attempt to match <a href="http://en.wikipedia.org/wiki/HTML_entity">HTML entities</a>. It&#8217;s actually a pattern that will match ampersands in HTML that are <em>not</em> part of entities (it&#8217;s taken from Django&#8217;s <a href="http://www.djangoproject.com/documentation/templates/#fix_ampersands">fix_ampersands</a> template filter).</p>
<p>Here&#8217;s the verbose mode:</p>
<pre>
  &amp;     # Match an ampersand...
  (?!       # ... that is *not* followed by...
    (
      \w+   # ... word characters...
      |     # ... or...
      \#\d+ # ... numeric entity symbols...
    )
    ;       # ... and a semi-colon.
  )
</pre>
<p>The &#8220;problem&#8221; with this pattern is pretty subtle: it treats anything shaped like an entity as one, including sequences that are well-formed but still invalid (e.g. <code>&amp;bogus;</code>, which names no real entity).  So as a way of finding unencoded ampersands it&#8217;s just fine, but if you wanted to use it as part of an HTML validator, it would be unacceptable.</p>
<h4>3. <code>(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?</code></h4>
<p>Most readers got this one; it&#8217;s an <a href="http://en.wikipedia.org/wiki/Scientific_notation">IEEE floating point number</a>, with optional exponent. In verbose mode:</p>
<pre>
  (             # The non-fractional part of the base
    -?            # could be a leading negative sign
    (?:           # Non-capturing group...
      0|[1-9]\d*  # 0, or multiple digits
    )
  )
  (\.\d+)?      # Decimal point and fractional part of the base
  (             # Exponent
    [eE]          # \
    [-+]?         #  > "e", plus or minus, exponent.
    \d+           # /
  )?
</pre>
<p>Some readers thought the <code>\d</code> in the base part was a bug; it&#8217;s not, actually &#8212; that expression matches either <code>0</code>, or a number that starts with 1-9 and then contains any digits.</p>
<p>The actual bug is that this pattern matches non-normalized numbers (e.g. <code>123.45e3</code>, which should more properly be written <code>1.2345e5</code>).</p>
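<p>A quick Python sketch shows both behaviours. <code>NUMBER</code> is the quiz pattern unchanged; <code>STRICT</code> is a hypothetical variant of my own that allows only a single digit before the decimal point, which is one reading of &#8220;normalized&#8221; and not the only one.</p>

```python
import re

# The pattern from the quiz, unchanged.
NUMBER = re.compile(r"(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?")

# Hypothetical stricter variant: exactly one digit before the decimal point.
STRICT = re.compile(r"-?(?:0|[1-9])(\.\d+)?([eE][-+]?\d+)?")

for candidate in ["0", "-12.5", "123.45e3", "1.2345e5", "007"]:
    print(
        candidate,
        bool(NUMBER.fullmatch(candidate)),
        bool(STRICT.fullmatch(candidate)),
    )
```

<p>The quiz pattern accepts everything in that list except <code>007</code> (a leading zero can&#8217;t be followed by more digits), while the stricter variant also turns away <code>-12.5</code> and <code>123.45e3</code>.</p>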
<h4>4. <code>([\da-f]{2}:){5}([\da-f]{2})</code></h4>
<p>Nearly everyone got this one: it&#8217;s a <a href="http://en.wikipedia.org/wiki/MAC_Address">MAC address</a>:</p>
<pre>
  ([\da-f]{2}:){5}  # Two hex digits followed by a colon, x5
  ([\da-f]{2})      # Two hex digits to end.
</pre>
<p>As <cite>birman</cite> <a href="http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comment-101984">noted</a>, this pattern fails to match a few other forms allowed for MAC addresses; they can be written with hyphens (<code>12-34-56-78-9A-BC</code>), or as dotted quads (<code>1234.5678.9ABC</code>).</p>
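<p>If you wanted to accept all three written forms, one way to sketch it in Python is a verbose pattern with one alternative per form. The name <code>MAC</code> and the decision to reject mixed separators are my own assumptions, not part of the original quiz.</p>

```python
import re

MAC = re.compile(
    r"""
      (?:[\da-f]{2}:){5}[\da-f]{2}     # 12:34:56:78:9A:BC
    | (?:[\da-f]{2}-){5}[\da-f]{2}     # 12-34-56-78-9A-BC
    | [\da-f]{4}(?:\.[\da-f]{4}){2}    # 1234.5678.9ABC
    """,
    re.VERBOSE | re.IGNORECASE,
)

samples = [
    "12:34:56:78:9a:bc",
    "12-34-56-78-9A-BC",
    "1234.5678.9ABC",
    "12:34-56:78:9a:bc",  # mixed separators: rejected by this sketch
]
for sample in samples:
    print(sample, bool(MAC.fullmatch(sample)))
```

<p>Using <code>fullmatch</code> here also sidesteps the isolation problem from question 1, at the cost of only working on strings that contain nothing but the address.</p>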
<h4>5. <code>&lt;[^&gt;]*?&gt;</code></h4>
<p>This one also seemed to be easy for most readers; it matches any <a href="http://en.wikipedia.org/wiki/SGML">SGML</a> tag.  In verbose syntax:</p>
<pre>
  &lt;        # Start the tag
  [^&gt;]*?   # Any non-gt character
  &gt;        # End the tag
</pre>
<p>The &#8220;bug&#8221; in this one is a little more abstract: malformed SGML/HTML will severely muck it up.  I&#8217;ll leave finding such code as an exercise for the reader, though.</p>
<h4>Next time</h4>
<p>Tune in tomorrow for the next installment of the quiz.  This week&#8217;s question will be a &#8220;things that every web developer should know&#8221; quiz; I think it&#8217;s a lot of fun.</p>
<p>See you tomorrow!</p>
<script src="http://ads.aws.sitepoint.com/adjs.php?region=136&amp;did=adz&amp;adtype=vertical" type="text/javascript"></script>]]></content:encoded>
			<wfw:commentRss>http://www.sitepoint.com/blogs/2006/11/28/answers-to-episode-2-real-life-regular-expressions/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Episode 2: Real-world regular expressions</title>
		<link>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/</link>
		<comments>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comments</comments>
		<pubDate>Tue, 21 Nov 2006 16:43:04 +0000</pubDate>
		<dc:creator>jacob-kaplan-moss</dc:creator>
		
		<category><![CDATA[Web Developer Quiz]]></category>

		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/</guid>
		<description><![CDATA[If you know how -- and when -- and why -- to use regular expressions, they're indispensable.  So this week, regular expressions will be our theme.]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s get this out there right off the bat: I love regular expressions.  Really, I do &#8212; they&#8217;re the Swiss Army Knife of text processing, and no self-respecting developer can go long without needing &#8216;em.</p>
<p>Of course, we all also know how dangerous they can be.  As always, with great power comes great responsibility.</p>
<p>Still, if you know how &#8212; and when &#8212; and why &#8212; to use regular expressions, they&#8217;re indispensable.  So this week, regular expressions will be our theme.</p>
<p>Below are five regular expressions.  Each one of them matches a real-world string; that is, a semi-structured piece of text you might want to pull out of a greater document.  Here&#8217;s an example question to give you an idea what I mean:</p>
<div id="adz" class="horizontal"></div><ol start="0">
<li><code>[0-9]{5}</code></li>
</ol>
<p>This, of course, is a US ZIP code.</p>
<p>So, what &#8220;things&#8221; do these regular expressions match?  We&#8217;ll assume for this quiz that the regex engine is running in case-insensitive mode:</p>
<ol>
<li><code>[A-PR-Y0-9]{3}-[A-PR-Y0-9]{3}-[A-PR-Y0-9]{4}</code></li>
<li><code>&amp;(?!(\w+|#\d+);)</code></li>
<li><code>(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?</code></li>
<li><code>([\da-f]{2}:){5}([\da-f]{2})</code></li>
<li><code>&lt;[^&gt;]*?&gt;</code></li>
</ol>
<p>Of course, since we&#8217;re dealing with regular expressions here, I&#8217;d be remiss if I didn&#8217;t give you two problems for the price of one.</p>
<p>In each case, the regular expression has something wrong with it.  For example, the ZIP code regex above doesn&#8217;t correctly match the ZIP+4 format (e.g. 66044-0034) that&#8217;s used for many addresses these days.</p>
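<p>To show the kind of answer I&#8217;m after, here&#8217;s one possible repair for the ZIP example, sketched in Python. The optional group for the four-digit extension is my own fix, and it only checks the shape of the string, not whether the ZIP code actually exists.</p>

```python
import re

# Five digits, optionally followed by a hyphen and four more digits.
ZIP = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")

for candidate in ["66044", "66044-0034", "66044-00", "6604"]:
    print(candidate, bool(ZIP.fullmatch(candidate)))
```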
<p>So, for part two, what&#8217;s wrong with the rest of &#8216;em?</p>
<p>Enjoy your Thanksgiving belly-stuffing, and tune in over the weekend for the answers.</p>
<script src="http://ads.aws.sitepoint.com/adjs.php?region=137&amp;did=adz&amp;adtype=horizontal" type="text/javascript"></script>]]></content:encoded>
			<wfw:commentRss>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Answers to Episode 1 (Scavenger Hunt)</title>
		<link>http://www.sitepoint.com/blogs/2006/11/20/episode-1-answers/</link>
		<comments>http://www.sitepoint.com/blogs/2006/11/20/episode-1-answers/#comments</comments>
		<pubDate>Mon, 20 Nov 2006 03:40:03 +0000</pubDate>
		<dc:creator>jacob-kaplan-moss</dc:creator>
		
		<category><![CDATA[Web Developer Quiz]]></category>

		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1789</guid>
		<description><![CDATA[If you missed it, this week's challenge deals with <a href="http://www.sitepoint.com/blogs/2006/11/14/scavenger-hunt/">finding computer-readable public data resources</a>.  Before getting to the answers, though, let's talk a little about technique.
]]></description>
			<content:encoded><![CDATA[<p>Welcome back, scavengers!</p>
<p>If you missed it, this week&#8217;s challenge deals with <a href="http://www.sitepoint.com/blogs/2006/11/14/scavenger-hunt/">finding computer-readable public data resources</a>.  Before getting to the answers, though, let&#8217;s talk a little about technique.</p>
<h4>Finding public data</h4>
<p>By law (in the US), much of the data produced by government agencies must be made available publicly.  As you might expect, however, this is often the last thing an acronym&#8217;d agency wants to think about.  Thus, even when data is made available on the web, it&#8217;s often only provided in formats that are difficult to parse on websites that are difficult to find.</p>
<div id="adz" class="vertical"></div><p>Google does a pretty good job of penetrating this maze of government websites.  Most of those who commented on the original question were able to find at least a few sources using Google. For me, at least, a good deal of poking around and trying searches with different keywords was required.</p>
<p>Once at the right place, most people had no trouble finding data in a form at least nominally parseable.  That&#8217;s a good sign; in this age of Microsoft Office, I often have to fight IT departments to get access to data in a format suitable for parsing into a database.  I&#8217;m glad to see that the people responding to my question have a good grasp of what constitutes a friendly format.</p>
<p>A few readers had some nice tips for finding government data:</p>
<ul>
<li><cite>malikyte</cite> <a href="http://www.sitepoint.com/blogs/2006/11/14/scavenger-hunt/#comment-93895">pointed out</a> that &#8220;advanced filters can help quite a bit when you know what form of information you&#8217;re looking for, especially if a government organization is most likely involved. With [G]oogle, for instance, you can specify in the search terms: <b>site:.gov &#8220;sec filings&#8221;</b> or <b>site:.org &#8220;sec filings&#8221;</b> &#8212; limiting your search results goes a long way in removing unimportant data.&#8221;  I hadn&#8217;t realized that Google&#8217;s <b>site:</b> operator could be used on TLDs; thanks!</li>
<li><cite>WindUpDoll</cite> <a href="http://www.sitepoint.com/blogs/2006/11/14/scavenger-hunt/#comment-95309">easily found demographics</a> for Wisconsin through her girlfriend who works for the city.  I don&#8217;t in any way consider this cheating; nearly all of the cool work that we do <a href="http://ljworld.com/">at work</a> is made possible by an inside connection. If you&#8217;re in the business of dealing with public data, friends on the inside are key.</li>
<li><cite>dmbfansim</cite> <a href="http://www.sitepoint.com/blogs/2006/11/14/scavenger-hunt/#comment-96401">mentioned</a> the redundantly-named-yet-useful <a href="http://www.firstgov.gov/">FirstGov.gov</a>, &#8220;The U.S. Government&#8217;s Official Web Portal&#8221;.  More specifically, the <a href="http://www.firstgov.gov/Topics/Reference_Shelf.shtml">reference center</a> is an invaluable resource.</li>
</ul>
<p>Finally, a wonderful clearinghouse for government data is <a href="http://www.fedstats.gov/">FedStats.gov</a>; I found the questions for this quiz starting at that site.</p>
<h4>The answers</h4>
<p>Right, enough dallying; here are the answers.  In some cases there were multiple sources found (by readers or by me); I&#8217;ve only provided one below:</p>
<ol>
<li><a href="http://www.ars.usda.gov/Services/docs.htm?docid=13746">Nutritional content of food</a> from the USDA.</li>
<li>(Links to) <a href="http://www2.census.gov/census_2000/datasets/100_and_sample_profile/">population demographics of every major city in the US</a>, courtesy of the US Census Bureau.</li>
<li>The latest <a href="http://sec.gov/Archives/edgar/xbrlrss.xml">SEC filings</a> (in RSS, no less) straight from the horse&#8217;s mouth.</li>
<li><a href="http://tonto.eia.doe.gov/dnav/pet/pet_pri_gnd_dcus_nus_w.htm">Historical gas prices</a>, from the Energy Information Administration (which I had never heard of until writing this quiz).</li>
<li><a href="http://www.ojjdp.ncjrs.org/ojstatbb/crime/JAR.asp">Juvenile arrest rates</a> from the Office of Juvenile Justice and Delinquency Prevention (part of the Department of Justice).</li>
</ol>
<p>Was it good for you, too?</p>
<h4>Next time&#8230;</h4>
<p>Come Tuesday, we&#8217;ll tackle a tool that&#8217;s perhaps the most powerful text-processing engine known to man: regular expressions.  Now you have two problems.</p>
<p>See you then.</p>
<script src="http://ads.aws.sitepoint.com/adjs.php?region=136&amp;did=adz&amp;adtype=vertical" type="text/javascript"></script>]]></content:encoded>
			<wfw:commentRss>http://www.sitepoint.com/blogs/2006/11/20/episode-1-answers/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Episode 1: Scavenger hunt!</title>
		<link>http://www.sitepoint.com/blogs/2006/11/14/scavenger-hunt/</link>
		<comments>http://www.sitepoint.com/blogs/2006/11/14/scavenger-hunt/#comments</comments>
		<pubDate>Tue, 14 Nov 2006 00:52:00 +0000</pubDate>
		<dc:creator>jacob-kaplan-moss</dc:creator>
		
		<category><![CDATA[Web Developer Quiz]]></category>

		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1780</guid>
		<description><![CDATA[Let's kick things off with something a bit unusual: a virtual scavenger hunt.

At some point, nearly every web geek gets a chance to hack on some open data, usually from a government source. The buzzword here is "mashup," but knowing how to find and consume openly available data will remain a valuable skill long after its faddishness ends.]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s kick things off with something a bit unusual: a virtual scavenger hunt.</p>
<p>At some point, nearly every web geek gets a chance to hack on some open data, usually from a government source. The buzzword here is &#8220;mashup,&#8221; but knowing how to find and consume openly available data will remain a valuable skill long after its faddishness ends.</p>
<p>Unfortunately, governments, and especially the US government, are often incredibly awful at providing this data. Sure, it&#8217;s available &#8212; but you&#8217;ve got to find it first.</p>
<p>So this question is all about finding that data. Since I&#8217;m most familiar with the USA, this question is USA-specific (but I&#8217;d love to see answers to any questions that apply to other nations).</p>
<div id="adz" class="vertical"></div><p>In each case, the answer should be a URL where you can either download the data in question, or at least find a direct link to the data. There may be multiple sources for each, including ones that could be screen-scraped for the data. I&#8217;m <strong>not</strong> looking for those sources, however &#8212; just the ones with easily downloadable data in a format that a computer can readily parse (e.g. CSV, XML, plain text). &#8220;Friendly&#8221; formats, in other words.</p>
<p>So, where can I download data to:</p>
<ol>
<li>Analyze the nutritional content of foods?</li>
<li>Find the population (and other basic demographics) of my city?</li>
<li>Analyze the latest SEC filings by public companies?</li>
<li>Look at historical gas prices?</li>
<li>Look for trends in juvenile arrest rates?</li>
</ol>
<p>Post your answers into the comments. For extra brownie points, tell us how you located each piece of data &#8212; did The Google serve you well, or were you forced to turn elsewhere?</p>
<p>If you really want to stretch your brain, try to write a tool to import each chunk of data into your favorite relational database. There will be a related question in a couple of weeks involving modeling one of these pieces of data, so you overachievers can start thinking about it now&#8230;</p>
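<p>If you want a starting point for that import tool, here&#8217;s a minimal sketch in Python, assuming the data arrives as CSV and that SQLite is your database of choice. The table name, column names, and sample rows are made up for illustration; they don&#8217;t come from any of the real datasets, and a real import would want proper column types.</p>

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a downloaded CSV file.
SAMPLE_CSV = """food,calories,protein_g
apple,52,0.3
cheddar,403,24.9
lentils,116,9.0
"""


def import_csv(conn, table, csv_file):
    """Load every row of a CSV file object into a new SQLite table.

    Column names come from the CSV header row. Columns are created
    without a declared type, so every value is stored as text.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote identifiers so odd header names cannot break the SQL.
    columns = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    conn.execute("CREATE TABLE {} ({})".format(quoted_table, columns))
    rows = [row for row in reader if row]  # skip blank lines
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders), rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = import_csv(conn, "foods", io.StringIO(SAMPLE_CSV))
print(count, "rows imported")
```

<p>To try it on a real download, swap the <code>io.StringIO</code> object for a file opened with <code>open("yourfile.csv", newline="")</code> and point <code>sqlite3.connect</code> at a file on disk.</p>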
<p>Good luck, and check back this weekend for the answers.</p>
<script src="http://ads.aws.sitepoint.com/adjs.php?region=136&amp;did=adz&amp;adtype=vertical" type="text/javascript"></script>]]></content:encoded>
			<wfw:commentRss>http://www.sitepoint.com/blogs/2006/11/14/scavenger-hunt/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Welcome to the quiz!</title>
		<link>http://www.sitepoint.com/blogs/2006/11/14/welcome-2/</link>
		<comments>http://www.sitepoint.com/blogs/2006/11/14/welcome-2/#comments</comments>
		<pubDate>Mon, 13 Nov 2006 22:45:14 +0000</pubDate>
		<dc:creator>jacob-kaplan-moss</dc:creator>
		
		<category><![CDATA[Web Developer Quiz]]></category>

		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1776</guid>
		<description><![CDATA[Sharpen your pencils, dust off your keyboards, and put on your thinking caps -- the Web Developer's Quiz is coming.]]></description>
			<content:encoded><![CDATA[<p>Sharpen your pencils, dust off your keyboards, and put on your thinking caps &#8212; the Web Developer&#8217;s Quiz is coming.</p>
<h4>So what&#8217;s this all about?</h4>
<p>There&#8217;s no question that being a web developer is incredibly complicated.  Most of us know the ins and outs of enough acronyms to choke a government agency: HTML, CSS, AJAX, SQL, XML, JSON, HTTP&#8230;  Being a jack-of-all-trades is in the job description.</p>
<p>Piecing all these disparate technologies together often feels like doing a complicated jigsaw puzzle. For many of us, the best part of being a web geek is that &#8220;aha!&#8221; moment when the puzzle is finally solved.</p>
<div id="adz" class="vertical"></div><p>But the brain&#8217;s a muscle: stop exercising it and it&#8217;ll atrophy. This quiz is designed to keep your web development brain strong; each week, I&#8217;ll post a question that only a web developer should be able to solve. </p>
<p>Since we all have such a disparate set of skills, I&#8217;ll aim to make these questions accessible to any web developer by focusing on the tools and technologies that we all know. I&#8217;ll usually post questions on Tuesday, and then post a follow-up with the answer over the weekend.</p>
<h4>Meet your quizmaster</h4>
<p>My name is Jacob Kaplan-Moss; I&#8217;m the lead developer at the Lawrence Journal-World, a family-owned newspaper in Lawrence, KS.  If my name looks familiar, it&#8217;s probably because I&#8217;m also one of the lead developers of Django, an open-source web framework (which was just <a href="http://www.sitepoint.com/article/build-to-do-list-30-minutes">featured in a SitePoint tutorial</a> written by a coworker of mine).</p>
<p>This is my first gig hosting a quiz show (yeah, a virtual one, but still&#8230;), but that doesn&#8217;t mean I&#8217;ll go easy; expect these questions to be as hard as I can make &#8217;em.</p>
<h4>Back after these messages&#8230;</h4>
<p>So tune in tomorrow for the first question &#8212; a bit of a warm up to set the tone for questions to come.</p>
<p>See you there!</p>
<script src="http://ads.aws.sitepoint.com/adjs.php?region=136&amp;did=adz&amp;adtype=vertical" type="text/javascript"></script>]]></content:encoded>
			<wfw:commentRss>http://www.sitepoint.com/blogs/2006/11/14/welcome-2/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
