Content Formatting with Regular Expressions - Matching Tags

Book: Build Your Own Database Driven Web Site Using PHP & MySQL
Author: Kevin Yank
Edition: 4th

pg 255, last two paragraphs, “One weakness…nested tags…will fail to work correctly with this code…+ and * are…greedy…”

Problem: “This code” seems to refer to the block of code listed in the middle of page 255, but running this code shows that it works without a problem. On the following page (256), a different reg. exp. example is given and in this context the + symbol is, in fact, greedy. So, is the wording misleading, am I missing something, or is there a difference between ‘/([^[]+)\[/’ and ‘/(.+)\[/’?

pg 256, last code block

Problem 1: Given the subject matter of Matching Tags, why doesn’t the [U_R_L]link[/U_R_L] code illustrate the non-greedy treatment? i.e. …([…]+?)… instead of …([…]+)…

Problem 2: Continuing in the same code block at the top of page 257, the use of the non-greedy (.+?) in the [U_R_L=url]link[/U_R_L] code makes no difference, that I can tell, when compared to using the greedy (.+). The HTML results are the same and the HTML is correct for both. Shouldn’t I see a difference in the results?

Thanks,
Steve

P.S. I had to use underscores when listing the URL tags.

Windows XP 32-bit running…
Apache 2.2.14
PHP 5.3.1
MySQL 5.1.41



If you try the code on p.255 with a document containing nested tags, you’ll see the difference. Try applying it to this content (replacing {braces} with [square brackets]):

This text contains {B}{I}bold, italic text{/I}{/B}

The code on p.255 can’t handle the bold tags in this example, because their content contains square brackets (the italic tags).

The code on p.256, however, handles this example just fine.
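To try this out, here is a minimal sketch. The two patterns are simplified stand-ins written for this post, not the book's exact code: $strict assumes the p.255 approach (content may not contain an opening square bracket), and $lazy assumes the p.256 approach (any content, matched non-greedily).

```php
<?php
// Sample content from above, with the braces replaced by square brackets.
$text = 'This text contains [B][I]bold, italic text[/I][/B]';

// Assumed p.255-style pattern: the content may not contain "[".
$strict = '/\[B\]([^\[]+)\[\/B\]/i';

// Assumed p.256-style pattern: any content, as little as possible.
$lazy = '/\[B\](.+?)\[\/B\]/i';

$strictResult = preg_replace($strict, '<strong>$1</strong>', $text);
$lazyResult   = preg_replace($lazy, '<strong>$1</strong>', $text);

// The strict pattern finds no match, so the text comes back unchanged.
echo $strictResult, "\n";

// The lazy pattern converts the bold tags and leaves the italic tags
// inside for a later replacement to handle:
// This text contains <strong>[I]bold, italic text[/I]</strong>
echo $lazyResult, "\n";
```

The strict pattern fails because the first character after the opening bold tag is a square bracket, which the negated character class refuses to match.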

pg 256, last code block

Problem 1: Given the subject matter of Matching Tags, why doesn’t the [U_R_L]link[/U_R_L] code illustrate the non-greedy treatment? i.e. …([…]+?)… instead of …([…]+)…

Because this form of the URL tag requires the tags to contain a valid URL and nothing else. Since a valid URL will not contain square brackets, we needn’t go out of our way to handle square brackets within the content of the tag in this case.
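A rough sketch of why a greedy + is safe here. The character class below is a simplified assumption (anything except square brackets and whitespace), not the book's actual list of valid URL characters.

```php
<?php
// Two plain URL tags on one line.
$text = 'See [URL]http://example.com/a[/URL] and [URL]http://example.com/b[/URL]';

// Greedy +, but the class cannot match "[", so each match has to stop
// at the start of the nearest closing tag.
$pattern = '/\[URL\]([^\[\]\s]+)\[\/URL\]/i';

$html = preg_replace($pattern, '<a href="$1">$1</a>', $text);

// See <a href="http://example.com/a">http://example.com/a</a> and <a href="http://example.com/b">http://example.com/b</a>
echo $html, "\n";
```

Because the class itself excludes square brackets, the greedy quantifier has nowhere to run past the first closing tag, so there is nothing for +? to fix.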

Problem 2: Continuing in the same code block at the top of page 257, the use of the non-greedy (.+?) in the [U_R_L=url]link[/U_R_L] code makes no difference, that I can tell, when compared to using the greedy (.+). The HTML results are the same and the HTML is correct for both. Shouldn’t I see a difference in the results?

Again, try an example with nested tags to see the difference:

This {URL=http://example.com}link contains {B}bold text{/B}{/URL}
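For experimenting, here is a small sketch comparing the two quantifiers. Both patterns are simplified assumptions rather than the book's code. The second sample, with two links on one line, is also an addition of mine: a greedy and a non-greedy quantifier only diverge when more than one closing tag is available to match against.

```php
<?php
// Assumed, simplified patterns: the URL part is anything except "]" or whitespace.
$greedy = '/\[URL=([^\]\s]+)\](.+)\[\/URL\]/i';
$lazy   = '/\[URL=([^\]\s]+)\](.+?)\[\/URL\]/i';
$replacement = '<a href="$1">$2</a>';

// The nested sample from above, with square brackets.
$nested = 'This [URL=http://example.com]link contains [B]bold text[/B][/URL]';
$nestedGreedy = preg_replace($greedy, $replacement, $nested);
$nestedLazy   = preg_replace($lazy, $replacement, $nested);

// With only one closing tag in the text, both patterns give:
// This <a href="http://example.com">link contains [B]bold text[/B]</a>
echo $nestedGreedy, "\n", $nestedLazy, "\n";

// Hypothetical second sample: two links in the same text.
$two = '[URL=http://example.com/a]one[/URL] and [URL=http://example.com/b]two[/URL]';
$twoGreedy = preg_replace($greedy, $replacement, $two);
$twoLazy   = preg_replace($lazy, $replacement, $two);

// Greedy runs on to the LAST closing tag and swallows the second link:
// <a href="http://example.com/a">one[/URL] and [URL=http://example.com/b]two</a>
echo $twoGreedy, "\n";

// Lazy stops at the FIRST closing tag, so each link is converted separately:
// <a href="http://example.com/a">one</a> and <a href="http://example.com/b">two</a>
echo $twoLazy, "\n";
```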
