<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Episode 2: Real-world regular expressions</title>
	<atom:link href="http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/</link>
	<description>News, opinion, and fresh thinking for web developers and designers. The official podcast of sitepoint.com.</description>
	<lastBuildDate>Mon, 23 Nov 2009 01:39:24 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Anonymous</title>
		<link>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/comment-page-1/#comment-918914</link>
		<dc:creator>Anonymous</dc:creator>
		<pubDate>Fri, 17 Apr 2009 08:17:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comment-918914</guid>
		<description>Nice collection!
Here is a good example that shows you how to implement regular expressions in .NET for U.S. Social Security numbers.

&lt;code&gt;^((?!000)([0-6]\d{2}&#124;[0-7]{2}[0-2]))-((?!00)\d{2})-((?!0000)\d{4})$&lt;/code&gt;


Also get complete code for ASP.NET, VB.NET and C#.NET

Please check:- http://www.tipsntracks.com/98/regular-expressions-with-net-us-social-security-numbers.html</description>
		<content:encoded><![CDATA[<p>Nice collection!<br />
Here is a good example that shows you how to implement regular expressions in .NET for U.S. Social Security numbers.</p>
<code>^((?!000)([0-6]\d{2}|[0-7]{2}[0-2]))-((?!00)\d{2})-((?!0000)\d{4})$</code>
<p>Also get complete code for ASP.NET, VB.NET and C#.NET</p>
<p>Please check:- <a href="http://www.tipsntracks.com/98/regular-expressions-with-net-us-social-security-numbers.html" rel="nofollow">http://www.tipsntracks.com/98/regular-expressions-with-net-us-social-security-numbers.html</a></p>]]></content:encoded>
	</item>
	<item>
		<title>By: mmj</title>
		<link>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/comment-page-1/#comment-103229</link>
		<dc:creator>mmj</dc:creator>
		<pubDate>Wed, 22 Nov 2006 23:11:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comment-103229</guid>
		<description>Number 2 searches for any occurrence of an ampersand (&amp;) that does NOT appear to be the beginning of a named or numeric entity.

It may be useful if you need to find &lt;em&gt;what appear to be&lt;/em&gt; unescaped ampersands in a string.

&quot;This &amp; that&quot; would match
&quot;This &amp;amp; that&quot; would not match

It doesn&#039;t take into account the validity of the entity reference, and doesn&#039;t account for numeric character entities in hexadecimal form.</description>
		<content:encoded><![CDATA[<p>Number 2 searches for any occurrence of an ampersand (&amp;) that does NOT appear to be the beginning of a named or numeric entity.</p>
<p>It may be useful if you need to find <em>what appear to be</em> unescaped ampersands in a string.</p>
<p>&#8220;This &amp; that&#8221; would match<br />
&#8220;This &amp;amp; that&#8221; would not match</p>
<p>It doesn&#8217;t take into account the validity of the entity reference, and doesn&#8217;t account for numeric character entities in hexadecimal form.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Stormrider</title>
		<link>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/comment-page-1/#comment-102707</link>
		<dc:creator>Stormrider</dc:creator>
		<pubDate>Wed, 22 Nov 2006 12:47:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comment-102707</guid>
		<description>bah. Another US specific one :(</description>
		<content:encoded><![CDATA[<p>bah. Another US specific one :(</p>]]></content:encoded>
	</item>
	<item>
		<title>By: larryp</title>
		<link>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/comment-page-1/#comment-102164</link>
		<dc:creator>larryp</dc:creator>
		<pubDate>Wed, 22 Nov 2006 01:54:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comment-102164</guid>
		<description>Hi birnam,

I agree with your disagreement with me on item 4. :) I jumped the gun on the number of groups. Good job on the explanations/corrections, too.</description>
		<content:encoded><![CDATA[<p>Hi birnam,</p>
<p>I agree with your disagreement with me on item 4. :) I jumped the gun on the number of groups. Good job on the explanations/corrections, too.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: birnam</title>
		<link>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/comment-page-1/#comment-101984</link>
		<dc:creator>birnam</dc:creator>
		<pubDate>Tue, 21 Nov 2006 22:44:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comment-101984</guid>
		<description>I agree with Larry, except I think #4 is a MAC address, not an IP address (like mmanders suggested)

As for the &quot;errors&quot;:

1. US phone number -- doesn&#039;t account for a preceding 1, an area code in parentheses, digit groups separated by a dot or space instead of a dash, or the fact that cell phones have Q and Z on them. It also doesn&#039;t make sure the group is isolated, and not part of something like 1234888-234-123456123.

&lt;pre&gt;&lt;code&gt;\b((1[-. ])?\(?[a-z0-9]{3}\)?[-. ][a-z0-9]{3}[-. ][a-z0-9]{4})\b&lt;/code&gt;&lt;/pre&gt;

2. HTML character and entity references -- should be a positive look-ahead, not a negative one, and instead of a word character it should be a-z since entity references don&#039;t have &#039;_&#039;

&lt;pre&gt;&lt;code class=&#039;html&#039;&gt;&amp;(?=([a-z]+&#124;\#\d+);)&lt;/code&gt;&lt;/pre&gt;

3. numbers in exponential notation (with &#039;1.2354 e10&#039; style exponent) -- I believe that exponential notation only has one digit before the decimal, so the \d* should be dropped and a negative lookbehind added to ensure a single digit. Also, there could be a space between the digits and the exponent.

&lt;pre&gt;&lt;code class=&#039;html&#039;&gt;\b(?&lt;!\d)(-?[0-9])(\.\d+)?\s?([eE][-+ ]?\d+)\b&lt;/code&gt;&lt;/pre&gt;

4. MAC address -- doesn&#039;t allow for digit groups to be delineated by hyphens. Because : counts as a non-word character it&#039;s not as easy as putting a word boundary on either side.

&lt;pre&gt;&lt;code class=&#039;html&#039;&gt;(?&lt;![-0-9a-f:])([\da-f]{2}[-:]){5}([\da-f]{2})(?![-0-9a-f:])&lt;/code&gt;&lt;/pre&gt;

There&#039;s also a type of MAC address format like 0123.4567.89ab, so you could more precisely do this:

&lt;pre&gt;&lt;code class=&#039;html&#039;&gt;(?&lt;![-0-9a-f:])(([\da-f]{2}[-:]){5}([\da-f]{2})&#124;([\da-f]{4}\.){2}([\da-f]{4}))(?!\.?[-0-9a-f:])&lt;/code&gt;&lt;/pre&gt;

5. XML style markup tags -- shouldn&#039;t have an asterisk for the contents, since that could also match an empty &lt;&gt;. And having the non-greedy * followed by a &gt;, and a match for any character but &gt; were accomplishing the same thing so I dropped one.

&lt;pre&gt;&lt;code class=&#039;html&#039;&gt;(&lt;.+?&gt;)&lt;/code&gt;&lt;/pre&gt;

Note: I&#039;m assuming these will be processed as case-insensitive, or else there&#039;s a whole new set of problems...  There are a million different ways to do regex, so these are just my suggestions -- I&#039;m sure there are better ways.

This was fun!</description>
		<content:encoded><![CDATA[<p>I agree with Larry, except I think #4 is a MAC address, not an IP address (like mmanders suggested)</p>
<p>As for the &#8220;errors&#8221;:</p>
<p>1. US phone number &#8212; doesn&#8217;t account for a preceding 1, an area code in parentheses, digit groups separated by a dot or space instead of a dash, or the fact that cell phones have Q and Z on them. It also doesn&#8217;t make sure the group is isolated, and not part of something like 1234888-234-123456123.</p>
<pre><code>\b((1[-. ])?\(?[a-z0-9]{3}\)?[-. ][a-z0-9]{3}[-. ][a-z0-9]{4})\b</code></pre>
<p>2. HTML character and entity references &#8212; should be a positive look-ahead, not a negative one, and instead of a word character it should be a-z since entity references don&#8217;t have &#8216;_&#8217;</p>
<pre><code class='html'>&#038;(?=([a-z]+|\#\d+);)</code></pre>
<p>3. numbers in exponential notation (with &#8216;1.2354 e10&#8217; style exponent) &#8212; I believe that exponential notation only has one digit before the decimal, so the \d* should be dropped and a negative lookbehind added to ensure a single digit. Also, there could be a space between the digits and the exponent.</p>
<pre><code class='html'>\b(?&lt;!\d)(-?[0-9])(\.\d+)?\s?([eE][-+ ]?\d+)\b</code></pre>
<p>4. MAC address &#8212; doesn&#8217;t allow for digit groups to be delineated by hyphens. Because : counts as a non-word character it&#8217;s not as easy as putting a word boundary on either side.</p>
<pre><code class='html'>(?&lt;![-0-9a-f:])([\da-f]{2}[-:]){5}([\da-f]{2})(?![-0-9a-f:])</code></pre>
<p>There&#8217;s also a type of MAC address format like 0123.4567.89ab, so you could more precisely do this:</p>
<pre><code class='html'>(?&lt;![-0-9a-f:])(([\da-f]{2}[-:]){5}([\da-f]{2})|([\da-f]{4}\.){2}([\da-f]{4}))(?!\.?[-0-9a-f:])</code></pre>
<p>5. XML style markup tags &#8212; shouldn&#8217;t have an asterisk for the contents, since that could also match an empty &lt;&gt;. And having the non-greedy * followed by a &gt;, and a match for any character but &gt; were accomplishing the same thing so I dropped one.</p>
<pre><code class='html'>(&lt;.+?&gt;)</code></pre>
<p>Note: I&#8217;m assuming these will be processed as case-insensitive, or else there&#8217;s a whole new set of problems&#8230;  There are a million different ways to do regex, so these are just my suggestions &#8212; I&#8217;m sure there are better ways.</p>
<p>This was fun!</p>]]></content:encoded>
	</item>
	<item>
		<title>By: dev_cw</title>
		<link>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/comment-page-1/#comment-101892</link>
		<dc:creator>dev_cw</dc:creator>
		<pubDate>Tue, 21 Nov 2006 20:47:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comment-101892</guid>
		<description>This is a fun way to learn a bit more about regex.</description>
		<content:encoded><![CDATA[<p>This is a fun way to learn a bit more about regex.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Larry</title>
		<link>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/comment-page-1/#comment-101795</link>
		<dc:creator>Larry</dc:creator>
		<pubDate>Tue, 21 Nov 2006 19:17:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comment-101795</guid>
		<description>1.  Telephone numbers, including those represented as letters.
2.  HTML character entity encodings in hexadecimal form.
3.  Numbers in scientific/engineering notation.
4.  IPv6 IP addresses
5.  HTML/XML/XHTML Tags. Any markup, essentially, where the tags use open/closing angle brackets.</description>
		<content:encoded><![CDATA[<p>1.  Telephone numbers, including those represented as letters.<br />
2.  HTML character entity encodings in hexadecimal form.<br />
3.  Numbers in scientific/engineering notation.<br />
4.  IPv6 IP addresses<br />
5.  HTML/XML/XHTML Tags. Any markup, essentially, where the tags use open/closing angle brackets.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: dix</title>
		<link>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/comment-page-1/#comment-101789</link>
		<dc:creator>dix</dc:creator>
		<pubDate>Tue, 21 Nov 2006 19:13:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comment-101789</guid>
		<description>Number 2 appears to be matching HTML character codes (e.g. &amp;nbsp; or &amp;#169;). The ! is a negative lookahead assertion, which should be removed to have it work correctly.</description>
		<content:encoded><![CDATA[<p>Number 2 appears to be matching HTML character codes (e.g. &amp;nbsp; or &amp;#169;). The ! is a negative lookahead assertion, which should be removed to have it work correctly.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: mmanders</title>
		<link>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/comment-page-1/#comment-101697</link>
		<dc:creator>mmanders</dc:creator>
		<pubDate>Tue, 21 Nov 2006 17:45:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comment-101697</guid>
		<description>Edit... 5. should read &quot;... anything starting with a &#039;\</description>
		<content:encoded><![CDATA[<p>Edit&#8230; 5. should read &#8220;&#8230; anything starting with a &#8216;\</p>]]></content:encoded>
	</item>
	<item>
		<title>By: mmanders</title>
		<link>http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/comment-page-1/#comment-101696</link>
		<dc:creator>mmanders</dc:creator>
		<pubDate>Tue, 21 Nov 2006 17:42:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/2006/11/22/episode-2-real-world-regular-expressions/#comment-101696</guid>
		<description>1. I&#039;m not from the States so am unsure, but is it a social security number?  If so, then the second sequence should only contain a repetition of 2.
&lt;strong&gt;[A-PR-Y0-9]{3}-[A-PR-Y0-9]{2}-[A-PR-Y0-9]{4}&lt;/strong&gt;

2. Not a clue about this one! An optional ampersand, followed by an exclamation mark, followed by either one or more words (alphanumerics) or a hash followed by one or more digits, terminated with a semi-colon.

3. I think this is a number expressed in scientific notation, e.g. 1.3e10 - However, I don&#039;t think the &lt;strong&gt;\d&lt;/strong&gt; is necessary in &lt;strong&gt;&quot;[1-9]\d*&quot;&lt;/strong&gt;

4. This looks like a MAC address.  Six hex numbers separated by colons.  However, I can&#039;t see anything wrong with it so I&#039;m probably wrong!

5. This looks like it would match an SGML tag of some sort, although it&#039;s not very specific.  It will match anything starting with a &#039;&lt;&#039; followed by a closing &#039;&gt;&#039;.</description>
		<content:encoded><![CDATA[<p>1. I&#8217;m not from the States so am unsure, but is it a social security number?  If so, then the second sequence should only contain a repetition of 2.<br />
<strong>[A-PR-Y0-9]{3}-[A-PR-Y0-9]{2}-[A-PR-Y0-9]{4}</strong></p>
<p>2. Not a clue about this one! An optional ampersand, followed by an exclamation mark, followed by either one or more words (alphanumerics) or a hash followed by one or more digits, terminated with a semi-colon.</p>
<p>3. I think this is a number expressed in scientific notation, e.g. 1.3e10 &#8211; However, I don&#8217;t think the <strong>\d</strong> is necessary in <strong>&#8220;[1-9]\d*&#8221;</strong></p>
<p>4. This looks like a MAC address.  Six hex numbers separated by colons.  However, I can&#8217;t see anything wrong with it so I&#8217;m probably wrong!</p>
<p>5. This looks like it would match an SGML tag of some sort, although it&#8217;s not very specific.  It will match anything starting with a &#8216;&lt;&#8217; followed by a closing &#8216;&gt;&#8217;.</p>]]></content:encoded>
	</item>
</channel>
</rss>
