preg_replace problem :(

Hey guys,

Really need some help with preg_replace that is driving me insane.

I have a foreach loop such as this

<?php
foreach ($example as $row){

$id=$row['id'];
$string=$row['string'];
}

I would like to do the following in each loop

<?php
foreach ($example as $row){

$id=$row['id'];
$string=$row['string'];


//$bad_words=array('bad','rotten');

//REPLACE EACH WORD apart from "bad words" INSIDE THE STRING WITH
//<a href="$row['id']">WORD FROM THE STRING</a>

}

Help really appreciated:)

Cheers,

Stuart

Do you mean you want each word to be a link other than the bad words? Strange request but if that’s what you want, here’s an experimental idea:

<?php
$badWords = array('bad', 'rotten');
foreach($example as $row){
    $id = $row['id'];
    $string = $row['string'];

    $LinkOpen = '<a href="?id=' . $id . '">';
    $LinkClose = '</a>';

    $stringReplaced = trim(preg_replace('~([\\.\\,\\n\\s])(' . implode('|', $badWords) . ')([\\.\\,\\n\\s])~', $LinkClose . '$1$2$3' . $LinkOpen, $string));

    echo $LinkOpen, $stringReplaced, $LinkClose;
}

That basically closes the link before any bad word, and opens it up again afterwards.
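
One caveat on the pattern above: it only catches a bad word that has a separator on both sides, so a bad word at the very start or end of the string stays inside the link. Below is a rough sketch of the same close-and-reopen idea using \b word boundaries instead. The function name linkAroundBadWords(), the sample input, and the use of preg_quote() and urlencode() are assumptions for illustration, not part of the code above.

```php
<?php
// Sketch only: linkAroundBadWords() is a hypothetical helper name.
function linkAroundBadWords(string $text, string $id, array $badWords): string
{
    $open  = '<a href="?id=' . urlencode($id) . '">';
    $close = '</a>';

    // Quote each bad word so regex metacharacters in it are taken literally.
    $alternatives = implode('|', array_map(function ($w) {
        return preg_quote($w, '~');
    }, $badWords));

    // \b matches at word boundaries, including the start and end of the string.
    $pattern = '~\b(' . $alternatives . ')\b~';

    // Close the link before each bad word and reopen it afterwards.
    $body = preg_replace($pattern, $close . '$1' . $open, $text);

    return $open . $body . $close;
}

echo linkAroundBadWords('a bad day', '7', ['bad', 'rotten']);
// <a href="?id=7">a </a>bad<a href="?id=7"> day</a>
```

If a bad word is the first or last word, this leaves an empty <a></a> pair behind, which is harmless but could be stripped afterwards.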

Jake, that code is brilliant. Thanks for helping me out, really appreciate it. :slight_smile:

Wondering if you know how I can actually separate each word as a link?

Also, I made a mistake; it should have been

$LinkOpen = '<a href="?id=' . WORD FROM THE STRING . '">';
    $LinkClose = '</a>';

Sorry man, I suck at regex and this is killing me!

Cheers,

Stu

Ok. So preg_match_all would be a better idea here, to separate each word. Then check the word against the list, and output.

It's a little more intricate than that, because you need to take punctuation etc. into account. Here goes! :smiley:


$badWords = array('bad', 'rotten');
foreach($example as $row){
    $id = $row['id'];
    $string = $row['string'];
    preg_match_all('/([A-Za-z]+)([\\,\\.\\s\\?]*)/', $String, $Matches);
    foreach($Matches[1] as $Match => $Word){
        if(!in_array($Word, $badWords)){
            printf('<a href="%1$s">%1$s</a>', $Word);
        }else{
            echo $Word;
        }
        echo $Matches[2][$Match];
    }
}

This could get quite slow if the text is large, so I would recommend it only for short strings.

You're a legend, cheers Jake. Great stuff.

That’s just it, the OP’s definition of a word might well be different.


$badWords = array('bad', 'rotten');
foreach($example as $row){
    $id = $row['id'];
    $string = $row['string'];
    preg_match_all('/([A-Za-z]+)([\\,\\.\\s\\?]*)/', $String, $Matches);

I assume “$string” should be “$String” in your example here. Typo?

I think it would still be easier to use the preg_replace() approach. Also, for matching word characters, it’s generally best to use the predefined \w character class (letters, digits, and the underscore), as it is locale-sensitive and so may also include other characters like ö, ä, ß, â, é, É, ç, Ç, etc. Finally, commas, dots, and question marks do not need to be escaped inside character classes.
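
To make those two points concrete, here is a small sketch. The sample strings and variable names are made up for illustration; the first pattern is the one quoted above, written with single backslashes.

```php
<?php
// Illustration only; the sample text and variable names are made up.
$text = 'Is it bad? No, it is fine.';

// Inside [...] the characters , . and ? are already literal,
// so the escaped and unescaped classes behave identically.
preg_match_all('/([A-Za-z]+)([\,\.\s\?]*)/', $text, $escaped);
preg_match_all('/([A-Za-z]+)([,.\s?]*)/', $text, $unescaped);
var_dump($escaped === $unescaped); // bool(true)

// \w also matches digits and the underscore, unlike [A-Za-z].
preg_match_all('/\w+/', 'abc_1 de', $words);
print_r($words[0]); // the two matches are "abc_1" and "de"
```

Whether digits and underscores should count as part of a word depends on the data, so [A-Za-z] may still be the right choice for some inputs.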


    foreach($Matches[1] as $Match => $Word){
        if(!in_array($Word, $badWords)){
            printf('<a href="%1$s">%1$s</a>', $Word);
        }else{
            echo $Word;
        }
        echo $Matches[2][$Match];
    }
}

This code fails to output anything for what you don’t consider words.
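
A quick way to see this is to rebuild the string from the matches and compare it with the input. This is only a sketch: the sample text is made up, and the pattern is the one from the quoted code with the escapes dropped.

```php
<?php
// Illustration only; the sample text is made up.
$text = '87% on Rotten Tomatoes!';

preg_match_all('/([A-Za-z]+)([,.\s?]*)/', $text, $m);

// Rebuild the string from the captured words and trailing separators.
$rebuilt = '';
foreach ($m[1] as $i => $word) {
    $rebuilt .= $word . $m[2][$i];
}

echo $rebuilt; // on Rotten Tomatoes
```

The leading "87% " and the trailing "!" never appear in any match, so a loop that only echoes the captures silently drops them.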

@StuartC: Perhaps this is desired behavior?

This approach will leave “bad” words in the string, but unlinked (the original post seems to indicate that this is desired behavior).

<?php

$bad = array('bad', 'rotten');
$data = 'Superbad, at the time of writing, has 87% on '
	. 'Rotten Tomatoes. Not bad at all.';

$re = '~
	# Assert word boundaries
	# This word contains letters, numbers,
	# and dashes.
	\\b [\\w-]+ \\b
~ex';
$link = '<a href="?id=%s">%s</a>';
echo preg_replace(
	$re,

	// This version ignores case.
	// Also, we ensure data is escaped.
	'in_array(strtolower("$0"), $bad)
		? htmlspecialchars("$0")
		: sprintf($link, urlencode("$0"), htmlspecialchars("$0"));',

	$data
);
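
A note for anyone on a newer PHP: the e modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0, so that exact call will not run there. A rough equivalent using preg_replace_callback() might look like the sketch below. The function name linkWords() and the shortened sample input are assumptions for illustration; the pattern and the escaping follow the code above.

```php
<?php
// Sketch only: linkWords() is a hypothetical helper name, not from the thread.
function linkWords(string $text, array $bad): string
{
    $result = preg_replace_callback(
        // Same idea as above: runs of word characters and dashes between word boundaries.
        '~\b[\w-]+\b~',
        function (array $m) use ($bad): string {
            $word = $m[0];

            // Case-insensitive check against the bad-word list.
            if (in_array(strtolower($word), $bad, true)) {
                return htmlspecialchars($word);
            }

            return sprintf(
                '<a href="?id=%s">%s</a>',
                urlencode($word),
                htmlspecialchars($word)
            );
        },
        $text
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

echo linkWords('Not bad at all.', ['bad', 'rotten']);
// <a href="?id=Not">Not</a> bad <a href="?id=at">at</a> <a href="?id=all">all</a>.
```

The callback gets the match array directly, so there is no string to be evaluated as code and no need to wrap "$0" in quotes.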

:tup:

I wasn’t aware of the \w class, good to know!

Though whether or not the OP wants to include numbers or dashes is their choice. I also don’t see the need for htmlspecialchars and urlencode though - they have no effect on numbers, letters or dashes.

Thanks dyer85, really good stuff as well. Thanks to both of you guys for helping me out. Was really in a rut with this. Cheers :)

Indeed, what we consider a word will sometimes vary. You’re right about the escape functions not being necessary here, but the issue might come up with regex changes. Also, I just realized escaping the markup would probably be best done when outputting the string as a whole.

StuartC: good luck, and happy coding. :wink: