Double backslash in regex?

I’m reading “Build Your Own Database Driven Website Using PHP & MySQL” by Kevin Yank and am having trouble understanding the use of double backslashes in regular expressions, like in this book segment:

$joketext = eregi_replace('\\[b]','<strong>',$joketext);
$joketext = eregi_replace('\\[eb]','</strong>',$joketext);

“Notice that, because [ normally indicates the start of a set of acceptable characters in a regular expression, we put a backslash before it in order to remove its special meaning. As backslashes are also used to escape special characters in PHP strings, we must use a double backslash for each single backslash that we wish to have in the regular expression.”
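For reference, a minimal sketch of those two lines ported to preg_replace(), assuming PHP 7 or later (the ereg/eregi_* functions were deprecated in PHP 5.3 and removed in PHP 7.0). The function name `bbcodeBold` is hypothetical, and the `/i` modifier stands in for eregi's case-insensitivity:

```php
<?php
// Hypothetical helper: converts the book's [b]...[eb] tags to <strong> tags.
function bbcodeBold(string $joketext): string
{
    // In the single-quoted literal '/\[b]/i', PHP leaves "\[" untouched,
    // so PCRE receives \[b] and matches the literal text "[b]".
    $joketext = preg_replace('/\[b]/i', '<strong>', $joketext);
    $joketext = preg_replace('/\[eb]/i', '</strong>', $joketext);
    return $joketext;
}

echo bbcodeBold('This is [b]bold[eb] and [B]so is this[EB].'), "\n";
// prints: This is <strong>bold</strong> and <strong>so is this</strong>.
```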

I understand that the second backslash is used to escape the first square bracket. Otherwise the regex would search for the “b” character. It says that the first backslash is used because backslashes are used to escape special characters in PHP, but isn’t that what the second backslash just did?

So why is this wrong?

$joketext = eregi_replace('\[b]','<strong>',$joketext);

If I were PHP I would read it like this: there’s an escape backslash, so take the first square bracket literally, so search for “[b]”.

If I were PHP I would read this:

$joketext = eregi_replace('\\[b]','<strong>',$joketext);

like this: there’s an escape character, so take the second backslash literally; after that, search for a b character (“[b]” as a character set). So search for “\b”.

Can someone explain the logic behind this and where I am thinking wrong?

For starters, I’m astounded he’s using ereg. It sucks. preg_ functions are far quicker.

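The first backslash is there to escape the second backslash in PHP itself. However, that’s not needed in single-quoted string literals. The only things you need to escape in single-quoted literals are a single quote itself and a backslash that comes right before a single quote or at the very end of the string.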
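A small sketch of those single-quote rules, checked with strlen(); the variable names are only illustrative:

```php
<?php
// In single quotes a backslash only escapes another backslash or a single
// quote; before any other character it stays in the string.
$quote    = '\'';    // one character: '
$trailing = 'C:\\';  // a literal ending in a backslash must double it: C:\
$newline  = '\n';    // NOT a newline: backslash + n, two characters
$pattern  = '\[b]';  // backslash kept: four characters \ [ b ]

echo strlen($quote), ' ', strlen($trailing), ' ', strlen($newline), ' ', strlen($pattern), "\n";
// prints: 1 3 2 4
```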

I haven’t read his book, and now seeing some examples of his shoddy coding, I never intend to.

In the 2nd print it used to say:

$joketext = eregi_replace('\[b]','<strong>',$joketext);

and there was no mention that a second backslash ought to be used because of special characters in PHP, etc.

In the 3rd print he changed it, so I assumed there was a reason: that the old version was not correct… I understand how backslashes work a bit, but this use puzzled me… So is it wrong?

Kevin Yank is an idiot with inconsistent coding practices.


echo '\\[b]';
// output: \[b]

echo '\[b]';
// output: \[b]

It’s exactly the same output.


$joketext = eregi_replace('\[b]','<strong>',$joketext);

Actually, this is by no means wrong and works just fine. The reason why you sometimes need to double backslashes is that they are stripped twice: once by PHP (at compile time) and once by the regexp engine.
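A sketch of a case where the doubling really is required, assuming PHP's preg_match() (PCRE) rather than ereg: matching one literal backslash needs `\\` in the regex, and writing `\\` inside a single-quoted PHP literal takes four backslashes. The names are illustrative:

```php
<?php
// Stage 1: PHP turns the literal into a string. Stage 2: PCRE parses that string.
$literalPattern = '/\\\\/';   // after stage 1 this is /\\/ (4 characters)
$subject        = 'C:\temp';  // single quotes: contains one real backslash

echo $literalPattern, "\n";                       // prints: /\\/
echo preg_match($literalPattern, $subject), "\n"; // prints: 1

// With only '/\\/', stage 1 leaves /\/ : the backslash escapes the closing
// delimiter, and preg_match() warns that no ending delimiter was found.
```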

Let’s say I want to check the following sentence for the presence of “[b]”

“I am trying to understand more about [b]backslashes[eb] and regular expressions”

When I use:

$joketext = eregi_replace('\[b]','<strong>',$joketext);

I understand that it works as it should. The first square bracket loses its special meaning because of the one escape backslash. As the first square bracket no longer has a special meaning, the closing bracket loses its special meaning as a result. You could write:

$joketext = eregi_replace('\[b\]','<strong>',$joketext);

But the second backslash isn’t necessary, for that reason.
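A quick check of that reasoning, assuming PCRE via preg_match() (ereg itself is gone from current PHP): once `[` is escaped no character class is opened, so a lone `]` is an ordinary character and escaping it is optional:

```php
<?php
$text = 'I am trying to understand more about [b]backslashes[eb] and regular expressions';

$a = preg_match('/\[b]/', $text);   // closing ] left unescaped
$b = preg_match('/\[b\]/', $text);  // closing ] escaped as well

echo "$a $b\n"; // prints: 1 1
```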

So how does:

$joketext = eregi_replace('\\[b]','<strong>',$joketext);

work the same or differently?

and why does

echo '\[b]';

output

\[b]

:confused:
and not

[b]

Doesn’t php read it like "a backslash so ignore the “[” special meaning (and the backslash itself for that matter), making it literally “[” followed by “b]” "

Because of the single, not double, quotation marks.

In a PHP string, a [ doesn’t have any special meaning anyway. None. Zero. Nada.

Try this


echo "\[";
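A sketch showing that even double quotes, which recognise more escapes (`\n`, `\t`, `\$`, `\\`, `\"` and others), leave `\[` alone; the variable names are illustrative:

```php
<?php
$double  = "\[";  // not an escape sequence: two characters, \ and [
$single  = '\[';  // also two characters
$newline = "\n";  // a real escape: one character (line feed)

var_dump($double === $single);                    // bool(true)
echo strlen($double), ' ', strlen($newline), "\n"; // prints: 2 1
```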

[ doesn’t have a special meaning? :confused:

I still have problems understanding (amateur me), I’m sorry :goof: but if I wanted to check for the characters a, b, and c in a sentence I would use the regex “[abc]”, matching a sentence like “this is A BAd exAmple of ChArACters” (the matching characters capitalized). But if I wanted to match the literal string “[abc]” I should escape the “[”, like “\[abc]”, matching a sentence like “in this example the code [abc] is used”.
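The class-versus-literal distinction can be sketched with preg_match() and preg_match_all() (PCRE, assumed here in place of ereg). Note that writing the literal pattern as '/\\[abc]/' would give PCRE exactly the same string:

```php
<?php
$class   = '/[abc]/';   // character class: any single a, b or c
$literal = '/\[abc]/';  // PCRE receives \[abc]: the literal text "[abc]"

echo preg_match_all($class, 'cab'), "\n";                                  // prints: 3
echo preg_match($literal, 'cab'), "\n";                                    // prints: 0
echo preg_match($literal, 'in this example the code [abc] is used'), "\n"; // prints: 1
```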

But according to Kevin I should use this? “\\[abc]”

I’ve also tried this code:

<?php
$string = "this is a [b] test";
$test1 = ereg ('\\[b]', $string);
$test2 = ereg ('\[b]', $string);
echo $test1."<br>".$test2;
?>

Both $test1 and $test2 equal true… why? :confused:
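The same experiment, sketched with preg_match() on the assumption of PHP 7 or later, where ereg no longer exists. It also shows that both literals produce the identical pattern string:

```php
<?php
$string = "this is a [b] test";

$p1 = '/\\[b]/';  // "\\" collapses to one "\" in single quotes
$p2 = '/\[b]/';   // "\[" is kept as-is in single quotes

var_dump($p1 === $p2); // bool(true): PCRE receives the same pattern
echo preg_match($p1, $string), ' ', preg_match($p2, $string), "\n"; // prints: 1 1
```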

Perhaps I could understand it better if you were to read it for me like php would read this regex code?

Are you trying to prove something here? If so I’m missing it. It outputs \[… as expected…


digitalecartoons

You are confusing PHP string literals and regular expressions.

digitalecartoons


$test1 = ereg ('\\[b]', $string);

  1. at compile time, php reads ‘\\[b]’, converts it to four characters \ [ b ] (because \\ denotes one single character – a backslash)
  2. at run time this string (of four chars) is passed to regex module. regex reads \ then [ and realises that [ has in this case no special meaning and should be matched literally.

$test1 = ereg ('\[b]', $string);

  1. at compile time, php parses ‘\[b]’ and stores… surprise… the same four characters as before: \ [ b ]. Why? Because php strips backslashes only if the next character has a special meaning in php strings. In single quotes, these are only ’ (single quote) and \. That is

'\\hello' translates to \hello (backslash removed)
'\'hey' translates to 'hey (backslash removed)
but
'\foo' --> \foo (backslash NOT removed)
'\[b]' --> \[b] (backslash NOT removed)
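Each row of that table can be checked mechanically. In this sketch the expected values are built with chr(92) (the backslash character) so the expectations do not themselves depend on escaping rules; the variable names are illustrative:

```php
<?php
$bs = chr(92); // a single backslash

$rows = [
    ['\\hello', $bs . 'hello'], // \\ -> one backslash
    ['\'hey',   "'" . 'hey'],   // \' -> one single quote
    ['\foo',    $bs . 'foo'],   // \f: not an escape in single quotes, kept
    ['\[b]',    $bs . '[b]'],   // \[: not an escape in single quotes, kept
];

foreach ($rows as [$literal, $expected]) {
    echo $literal, ' (', strlen($literal), ' chars) ',
        ($literal === $expected ? 'ok' : 'MISMATCH'), "\n";
}
```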

So I had already read about single- and double-quoted strings on php.net:

Single Quotes. To specify a literal single quote, you will need to escape it with a backslash (\), like in many other languages. If a backslash needs to occur before a single quote or at the end of the string, you need to double it. Note that if you try to escape any other character, the backslash will also be printed! So usually there is no need to escape the backslash itself.

And that things like this output the same (just like you said):

// Outputs: You deleted C:\*.*?
echo 'You deleted C:\\*.*?';
// Outputs: You deleted C:\*.*?
echo 'You deleted C:\*.*?';

So now I understand that ‘\\[b]’ and ‘\[b]’ both output ‘\[b]’
(please correct me if I’m wrong).

So that’s part one. Part two: what’s left in both cases is ‘\[b]’, which PHP’s ereg function interprets as the literal text “[b]”.

Is this how it works and am I beginning to understand it? :lol:

So basically

$test1 = ereg ('\\[b]', $string);
is the same as
$test1 = ereg ('\[b]', $string);
because the first is converted to the last anyway? Having the same result before being processed by ereg?

If so, why’s Kevin making such a point out of it that I should use double backslashes since both output the same ‘\[b]’ before going through ereg?

Because Kevin is an idiot :slight_smile:

I think that’s what got me confused: that Kevin stated so explicitly that I should use double backslashes in this case, making me think the code would not be correct if I were to use only one, and making me feel like an idiot for not understanding it :slight_smile:
I just thought maybe there was some hidden purpose for specifically choosing double backslashes in this particular case, one that only an expert PHP-er knows about :slight_smile:

But thanks for making it clear to me now and for being so patient with me haha

I didn’t read Kevin’s book, but he is absolutely right here. You should always use double backslashes for the sake of code clarity and portability.
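One way to read the portability point, sketched here as an assumption about what is meant: if a pattern written with single backslashes is later moved into double quotes, an escape such as `\$` gets consumed by PHP, whereas consistently doubled backslashes give PCRE the same pattern either way. The variable names are illustrative:

```php
<?php
$fromSingle        = '/\$/';    // PCRE gets /\$/ : literal dollar sign
$fromDouble        = "/\$/";    // PHP eats the backslash; PCRE gets /$/ : end anchor
$fromDoubleEscaped = "/\\\$/";  // PCRE gets /\$/ : literal dollar sign again

$subject = 'price: 10';         // contains no "$"
echo preg_match($fromSingle, $subject), ' ',
     preg_match($fromDouble, $subject), ' ',
     preg_match($fromDoubleEscaped, $subject), "\n"; // prints: 0 1 0
```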