Preg_match issues: expression works on some servers but not others

bruin03 · April 16, 2010, 8:13pm

I have the following string:


$txt = "Company:       My Company
Contact Name:  John Smith";

Then I use the following functions to match the data I want until a new line/carriage return.

preg_match('/Company(.*?)\n/',$txt,$company);
	print '<td>'.prep($company[1]).'</td><br>';
preg_match('/Contact Name(.*?)\n/',$txt,$contact);
	print '<td>'.prep($contact[1]).'</td><br>';

Outputs:
My Company
John Smith

But when I change the input $txt to a database row value, the above works on one server, but if I call the same database (different table) on another server I get the following output:

,
John Smith

What might be the reason for this?

bruin03 · April 16, 2010, 10:48pm

Oh, I am not outputting to an HTML document. I am building a parser which will put the values back into another table.

I will inspect it again. thx.

crmalibu · April 16, 2010, 8:41pm

The value of $txt probably varies. Some characters are not possible to see visually, so you might want to look at the length of the string, or even do a hex dump in order to be able to find the difference.

It could also be your prep() function.
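The inspection suggested here could be sketched roughly as follows. The helper name describe_bytes() and the sample strings are assumptions made for illustration and are not from the thread; the point is only that a length plus a hex dump makes an invisible \r (0d) or \n (0a) visible.

```php
<?php

// Hypothetical helper (not from the thread): report the length and a hex
// dump of a string so that invisible bytes such as \r (0d) or \n (0a)
// stand out.
function describe_bytes($s)
{
    return strlen($s) . ' bytes: ' . implode(' ', str_split(bin2hex($s), 2));
}

// Two values that look the same when printed but differ in their line ending.
$unix    = "Company: X\nContact Name: Y";
$windows = "Company: X\r\nContact Name: Y";

echo describe_bytes($unix), "\n";
echo describe_bytes($windows), "\n";
```

Running this against the value read from each server's database row, rather than the sample strings, should show whether the two inputs really are byte-for-byte identical.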

bruin03 · April 16, 2010, 9:22pm

this is what prep does.

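function prep($string) {
	// Remove colons (including the one after the label), then surrounding whitespace.
	$string = str_replace(':', '', $string);
	$string = trim($string);
	// Escape for use in a MySQL query. Note: mysql_escape_string() was
	// removed in PHP 7.0, so this only runs on older PHP versions.
	$string = mysql_escape_string($string);
	return $string;
}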

Could this be changing something?

Why would this work on one server but not another?

crmalibu · April 16, 2010, 9:37pm

Seems like a very inappropriate function to use when you want to output something into an HTML document. But it wouldn’t cause your problem.

The problem is very likely to be that the value of $txt is different. You should closely inspect it.

dyer85 · April 17, 2010, 5:18am

You should probably manually specify the newline. A literal line break inside a quoted string takes whatever EOL sequence the source file was saved with (\n, \r\n, or \r), so it can differ from one platform to the next.

$txt = "Company:       My Company\n"
  . 'Contact Name:  John Smith';

Then I use the following functions to match the data I want until a new line/carriage return.

preg_match('/Company(.*?)\n/',$txt,$company);
	print '<td>'.prep($company[1]).'</td><br>';

You boldly assume [preg_match](http://php.net/preg_match)() will succeed before probing the matches array. As noted above, variations in EOL terminators may be the issue here. Also, your regex could be improved, since you don’t seem to want the whitespace:

if (preg_match('/Company \W* (\w [^\r\n]*)/x', $txt, $company)) {
  print_r($company);
}

The [\W*](http://php.net/manual/en/regexp.reference.backslash.php) predefined character class looks for non-word characters zero or more times (so, it would include the colon and whitespace after “Company”). The \w character class is the opposite: it requires word characters (i.e., characters part of a Perl “word”) and is locale-sensitive. I explicitly test for this once to act as a boundary between non-word and word characters. From there, [^\r\n]* slurps everything that isn’t a newline character. I didn’t match the ending EOL, since you don’t seem to need it. With regexes, you should match only what you really need.
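As a rough check of the EOL point, the sketch below runs that pattern against the same record written with three different line endings. The wrapper name company_of() and the sample strings are assumptions for illustration only; the thread does not show what the database values actually contain.

```php
<?php

// Hypothetical wrapper around the pattern discussed above; returns the
// captured value, or null when the label is not found.
function company_of($txt)
{
    if (preg_match('/Company \W* (\w [^\r\n]*)/x', $txt, $m)) {
        return rtrim($m[1]);
    }
    return null;
}

// The same record with Unix (\n), Windows (\r\n) and old Mac (\r) line endings.
$variants = array(
    "Company:       My Company\nContact Name:  John Smith",
    "Company:       My Company\r\nContact Name:  John Smith",
    "Company:       My Company\rContact Name:  John Smith",
);

foreach ($variants as $txt) {
    var_dump(company_of($txt));
}
```

Because the character class excludes both \r and \n, the capture stops at whichever terminator comes first. A pattern that ends in a literal \n, by contrast, would keep a trailing \r in the \r\n case and would not match at all in the \r-only case.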

Lastly, the [/x](http://php.net/manual/en/reference.pcre.pattern.modifiers.php) modifier allows ignoring whitespace and easily inserting comments in your patterns. If you want to explicitly match whitespace characters, you’d have to escape them (including blanks, or ASCII 32). With the free-form syntax, it’s generally best to stick with single-quoted strings*. Using the more free-form syntax in regexes helps keep them readable; it’s a good idea to use /x often.
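To make the point about escaping whitespace concrete, here is a small sketch applied to the "Contact Name" label, whose space has to be escaped under /x. The sample $txt and the pattern layout are assumptions for illustration, not code from the original post.

```php
<?php

$txt = "Company:       My Company\nContact Name:  John Smith";

// Under /x, unescaped whitespace in the pattern is ignored and # starts a
// comment, so the space in "Contact Name" is escaped to match literally.
$re = '/
    Contact\ Name   # the label, with its space escaped
    \W*             # colon and padding
    (\w [^\r\n]*)   # the value, up to the end of the line
/x';

if (preg_match($re, $txt, $contact)) {
    echo $contact[1], "\n";
}

// Without the escape, the pattern is effectively "ContactName" and fails.
var_dump(preg_match('/Contact Name/x', $txt));
```

Note that the comments inside the pattern must not contain the delimiter (/) or a single quote, since either would end the pattern or the PHP string early.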

A similar approach could be taken for your other regex. You might also consider putting your patterns in an array and looping through it, especially if you anticipate adding more.
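One way the array-and-loop idea might look is sketched below. The $labels map, its keys, and the $fields result array are hypothetical names chosen for the example; /x is deliberately left out here so that the space in "Contact Name" matches without extra escaping.

```php
<?php

$txt = "Company:       My Company\nContact Name:  John Smith";

// Hypothetical field list: result key => label as it appears in the text.
$labels = array(
    'company' => 'Company',
    'contact' => 'Contact Name',
);

$fields = array();
foreach ($labels as $key => $label) {
    // preg_quote() escapes regex metacharacters in the label. The /x modifier
    // is not used, so the space in "Contact Name" is matched literally.
    $re = '/' . preg_quote($label, '/') . '\W*(\w[^\r\n]*)/';
    // Store null when a label is missing instead of reading $m[1] blindly.
    $fields[$key] = preg_match($re, $txt, $m) ? rtrim($m[1]) : null;
}

print_r($fields);
```

Adding another field then only means adding an entry to $labels, and a missing label shows up as null rather than as an undefined-index notice.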

Also, nitpicking, but the arbitrary indent for the print statement here is odd, considering it’s not enclosed within a block.

* Compare the following regexes, which differ only in how the pattern string is quoted:

<?php

$data = 'foo         bar';

/* In a double-quoted string, PHP turns \n into an actual
   newline before libpcre compiles the pattern. So, with
   the /x modifier in use, that newline is just ignored. */
$re = "/foo \n \W* bar/x";

/* '\n' in PHP single-quoted strings is NOT an escape
   sequence, so libpcre will get '\n' and use it as an
   escape sequence (it must match a newline in $data). */
$re2 = '/foo \n \W* bar/x';

var_dump(preg_match($re, $data));  // int(1)
var_dump(preg_match($re2, $data)); // int(0)

bruin03 · April 18, 2010, 5:16am

Hi dyer85, thanks for the great tips! I will definitely use them going forward. I’ll be the first to admit that my knowledge of regular expressions is limited.

Just a point of clarification, I am only using $txt as an example.
I am getting the value of $txt from another source (database column), and I have no control over its contents.

The contents are a value of a database column that shows:
Company: My Company
Contact Name: John Smith
