Regex works in text editor, but not in PHP

I’m trying to run a regex to find all HTML <h?> and matching </h?> tags and the header text in between. The regex


works in my text editor (UltraEdit, set for Perl syntax), but when I try to run it in PHP:

$str = "<html><head>	</head><body>
	<h1>Test File Main Heading</h1>
	<h2>Heading 1</h2>
	<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Vestibulum tristique. Curabitur quis metus ac purus fermentum sollicitudin.</p>
</body></html>";
$re = "<h([1-6])>([^<]*)</h([1-6])>";
preg_match($re, $str, $matches);

I am getting the following warning:

Warning: preg_match() [function.preg-match]: Unknown modifier '(' in D:\websites\hmicom\test\createTextFile.php on line 31

Line 31 is the ‘preg_match()’ line.

I’ve tried several variants, but without success so far. The round brackets ‘(’ I’ve used are paired, and normally used without problem for creating back-references. I can’t see why one should be treated as an ‘unknown modifier’ by PHP.
Any suggestions, please?

You need a start and end delimiter, and to escape the > and / characters:

$re = "~<h([1-6])\\>([^<]*)<\\/h([1-6])\\>~";
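For anyone following along, here is a minimal sketch of that delimited pattern in use. The shortened $str is only a stand-in for the test file in the question, and the pattern is written in single quotes (my choice, not part of the suggestion above) so that the backslashes reach the regex engine unchanged.

```php
<?php
// Stand-in for the HTML from the question, shortened for the example.
$str = "<html><head></head><body>\n"
     . "<h1>Test File Main Heading</h1>\n"
     . "<h2>Heading 1</h2>\n"
     . "</body></html>";

// The tildes are the delimiters; nothing follows the closing ~, so no modifiers are set.
$re = '~<h([1-6])\>([^<]*)<\/h([1-6])\>~';

// preg_match() returns 1 on a match, 0 on no match, false on error.
$found = preg_match($re, $str, $matches);

if ($found === 1) {
    // $matches[1] = opening level, $matches[2] = heading text, $matches[3] = closing level
    echo "Level {$matches[1]}: {$matches[2]}\n";
}
```

Note that preg_match() stops at the first heading; preg_match_all() would be the call to reach for if every heading is wanted.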

Thank you, Spike7. I had forgotten about the need for delimiters. The expression certainly works to the point of dealing with the warning message, so now I can develop it further.

Can you explain why you suggest the ‘>’ needs escaping, please? It seems to work OK without, as a literal.

Later: Ah, I think I see why: it’s because ‘<>’ are legal options as delimiters, and the ‘\’ makes sure they are not treated as such?

Yep :)

Thanks for your help Mike, that’s undone the log jam for me (and probably saved me from buying a book or application that I only need twice a year).

None of the escaped characters in that regex need escaping. It’s usually a “just to be safe” practice when you’re not really sure. (:

The following would work identically:

$re = "~<h([1-6])>([^<]*)</h([1-6])>~";
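A quick way to check that claim for yourself is to run both versions over the same input and compare the results. This is only a sketch: the sample $str and the variable names are made up for the comparison.

```php
<?php
// Made-up sample input with two headings and one non-heading element.
$str = "<h1>Test File Main Heading</h1>\n<h2>Heading 1</h2>\n<p>Lorem ipsum</p>";

// Same pattern, with and without the "just to be safe" backslashes.
$escaped   = '~<h([1-6])\>([^<]*)<\/h([1-6])\>~';
$unescaped = '~<h([1-6])>([^<]*)</h([1-6])>~';

// PREG_SET_ORDER gives one array per match: full match, level, text, closing level.
$countEscaped   = preg_match_all($escaped, $str, $a, PREG_SET_ORDER);
$countUnescaped = preg_match_all($unescaped, $str, $b, PREG_SET_ORDER);

// True when both patterns found the same headings with the same captures.
var_dump($countEscaped === $countUnescaped && $a === $b);
```

With tilde delimiters neither ‘>’ nor ‘/’ has any special meaning, which is why the backslashes make no difference here.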

Thanks, Salathe, I thought that was the case.

FWIW: Using the ‘-’ delimiters gives me a PHP warning “Unknown modifier ‘6’ in …”, whereas ‘/’ works fine. This with or without escapes. I’ve not seen ‘-’ used for delimiters before today (but then I don’t use Regexes all that often).

The character that spikeZ and I used was the tilde (~), not a hyphen (-). If you used the latter, a regex like -<h([1-6])>([^<]*)</h([1-6])>- would see - as the starting delimiter, then <h([1 as the regex, followed by - as the closing delimiter. Since the closing delimiter has (mistakenly) been hit, the rest of the pattern must be modifiers, but 6 is not a valid modifier, so the warning is issued.
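To see the mechanism in action, the sketch below captures the warning text instead of letting PHP print it. patternWarning() is a hypothetical helper written just for this post, not a built-in. The third call uses the pattern from the original question with no delimiters added; as far as I can tell PHP then treats < and > as a delimiter pair, which would account for the original ‘(’ warning.

```php
<?php
// Hypothetical helper: runs preg_match() and returns the warning text it raised, or null if none.
function patternWarning(string $pattern, string $subject): ?string
{
    $warning = null;
    // Temporarily route PHP warnings into $warning instead of the output.
    set_error_handler(function (int $errno, string $errstr) use (&$warning): bool {
        $warning = $errstr;
        return true; // mark as handled so PHP prints nothing
    });
    preg_match($pattern, $subject);
    restore_error_handler();
    return $warning;
}

$subject = '<h2>Heading 1</h2>';

// Hyphen delimiters: the hyphen inside [1-6] ends the pattern early.
$hyphen = patternWarning('-<h([1-6])>([^<]*)</h([1-6])>-', $subject);

// Tilde delimiters: no tilde inside the pattern, so no warning.
$tilde = patternWarning('~<h([1-6])>([^<]*)</h([1-6])>~', $subject);

// No delimiters added: the leading < and the first > are taken as the delimiters.
$none = patternWarning('<h([1-6])>([^<]*)</h([1-6])>', $subject);

var_dump($hyphen, $tilde, $none);
```

The general rule that falls out of this: pick a delimiter that does not appear anywhere in the pattern, or escape every occurrence of it with a backslash.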

Yes, I hadn’t figured out the precise mechanism, but I can see now that is exactly it. I wondered whether it was my eyes or my screen that stopped me telling a tilde from a hyphen. Now that I know it’s a tilde, I can just about see that it is. The difference is clear in the narrative (~-), but in the ‘Code’ it’s very hard to tell. (My screen is a 19" CRT that refuses to die.)
Thanks for your help.