
Episode 2: Real-world regular expressions

by Jacob Kaplan-Moss

Let’s get this out there right off the bat: I love regular expressions. Really, I do. They’re the Swiss Army Knife of text processing, and no self-respecting developer can go long without needing ’em.

Of course, we all also know how dangerous they can be. As always, with great power comes great responsibility.

Still, if you know how — and when — and why — to use regular expressions, they’re indispensable. So this week, regular expressions will be our theme.

Below are five regular expressions. Each one matches a real-world string; that is, a semi-structured piece of text you might want to pull out of a larger document. Here’s an example question to give you an idea of what I mean:

  1. [0-9]{5}

This, of course, is a US ZIP code.

So, what “things” do these regular expressions match? We’ll assume for this quiz that the regex engine is running in case-insensitive mode:

  1. [A-PR-Y0-9]{3}-[A-PR-Y0-9]{3}-[A-PR-Y0-9]{4}
  2. &(?!(\w+|#\d+);)
  3. (-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?
  4. ([\da-f]{2}:){5}([\da-f]{2})
  5. <[^>]*?>
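If you’d like to check your guesses, here is a minimal Python sketch that compiles the five patterns with `re.IGNORECASE` (mirroring the case-insensitive assumption above) and reports which of them find a match anywhere in the strings you pass on the command line. Python is my choice here; the quiz doesn’t name a regex flavor, though Python’s `re` module accepts all five patterns as written. The names `QUIZ_PATTERNS`, `COMPILED`, and `which_match` are illustrative, not from the post.

```python
import re
import sys

# The five quiz patterns, copied verbatim from the list above.
QUIZ_PATTERNS = [
    r"[A-PR-Y0-9]{3}-[A-PR-Y0-9]{3}-[A-PR-Y0-9]{4}",
    r"&(?!(\w+|#\d+);)",
    r"(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?",
    r"([\da-f]{2}:){5}([\da-f]{2})",
    r"<[^>]*?>",
]

# The quiz assumes case-insensitive matching, hence re.IGNORECASE.
COMPILED = [re.compile(p, re.IGNORECASE) for p in QUIZ_PATTERNS]


def which_match(text):
    """Return the 1-based numbers of the quiz patterns that match somewhere in text."""
    # search() scans the whole string, so a pattern may match just a fragment of it.
    return [i for i, rx in enumerate(COMPILED, start=1) if rx.search(text)]


if __name__ == "__main__":
    # Each command-line argument is treated as one candidate string.
    for candidate in sys.argv[1:]:
        print(repr(candidate), "->", which_match(candidate))
```

Saved as, say, `quiz.py`, running `python quiz.py 'some text'` prints the numbers of the patterns that match. Because `search` looks anywhere in the string rather than requiring the whole string to match, this behaves like pulling a value out of a larger document.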

Of course, since we’re dealing with regular expressions here, I’d be remiss if I didn’t give you two problems for the price of one.

In each case, the regular expression has something wrong with it. For example, the ZIP code regex above doesn’t correctly match the ZIP+4 format (e.g., 66044-0034) that’s used for many addresses these days.
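To see the ZIP+4 problem concretely, here’s a short Python sketch. `ZIP5` is the example pattern from the post; `ZIP5_OR_PLUS4` is one possible fix of my own (an assumption, not an official answer) that makes a hyphen-plus-four-digits suffix optional. `find_zip` is a hypothetical helper that returns the first match in a string.

```python
import re

# The example pattern from the post: exactly five digits.
ZIP5 = re.compile(r"[0-9]{5}")

# A hypothetical extension (not from the post): five digits,
# optionally followed by a hyphen and four more digits (ZIP+4).
ZIP5_OR_PLUS4 = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")


def find_zip(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


text = "ZIP: 66044-0034"
print(find_zip(ZIP5, text))           # 66044 (the +4 part is silently dropped)
print(find_zip(ZIP5_OR_PLUS4, text))  # 66044-0034
```

Neither pattern is a complete validator: both will happily match the first five digits of a longer number such as `1234567`, so in practice you would probably want word boundaries (`\b`) around them.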

So, for part two, what’s wrong with the rest of ’em?

Enjoy your Thanksgiving belly-stuffing, and tune in over the weekend for the answers.
