I’m trying to locate one or more occurrences of ‘http’ in a string and then eliminate the entire URL (later I’ll extend this to other sub-strings).
I have tried:
// Find and remove substring 'http...'
while (stripos($message, "http")) {
...
}
This finds ‘http’ provided it doesn’t occur right at the start of the string, in which case stripos returns 0 (zero).
I think while must be treating zero as false. Is there a way round this?
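A minimal sketch of the behaviour described above, using a made-up sample string: stripos returns the integer offset 0 for a match at the very start, a plain truthiness test treats that 0 like false, and a strict comparison against false tells the two apart.

```php
<?php
// Hypothetical sample message that begins with the needle.
$message = "http://example.com is my site";

// A match at the very start gives offset 0, not false.
$pos = stripos($message, "http");

var_dump($pos);            // the offset of the match
var_dump((bool) $pos);     // what a bare while ($pos) condition would test
var_dump($pos !== false);  // strict check: distinguishes offset 0 from "not found"
```

The strict comparison (!== false) is what lets a loop condition accept a match at offset 0.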
Thank you both. I had read the manual, and I had tried something similar to what you suggest, but it hadn’t worked (probably I didn’t have the syntax quite right).
Eventually I used stristr in the while loop as it returns a string (haystack from needle to end), but now I can look at that again.
Thanks. That is the method I’d tried earlier. I can see now that I didn’t have enough brackets.
Currently I’ve got:
// Find and remove bad substrings '$badstrs'
foreach ($badstrs as $badstr) {
    while (stripos($message, $badstr) !== false) {
        $pos = stripos($message, $badstr);
        // Find posn of the next space
        .....
    }
}
but I could now save a line.
This allows me to define the ‘bad’ strings in an array to include anything I (don’t) fancy.
Fair comment, but there’s a good reason.
Initially I set out to remove URLs, which I could define as starting with ‘http’ or ‘www’ (which become my ‘bad’ strings), but I wouldn’t know what or where the end was. Assuming they extend to the next space seemed easier to do than a regex to find the TLD suffix. There shouldn’t be a space in a URL, and if there is, the first part will still be removed, thereby sanitising it. It works equally well for complete words.
As an afterthought, it could also be applied to dealing with plurals as well as singulars, and (for most verbs) participles too, without having to define both. Thus ‘blog’ would include ‘blogs’, ‘blogging’, ‘blogged’ etc. But that wasn’t the original aim.
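For comparison, the same "from the bad string up to the next space" rule can be sketched as a single regular expression. This swaps in preg_replace and is not the stripos method used in this thread. The $badstrs values and the sample message are assumptions, and \S* stops at any whitespace (tabs and newlines too), not only at a space.

```php
<?php
// Assumed list of bad strings and a made-up message, for illustration only.
$badstrs = ['http', 'www', 'blog'];
$message = "See http://example.com or www.example.org, I was blogging today";

// Escape each bad string so it is matched literally inside the pattern.
$alternatives = array_map(function ($s) {
    return preg_quote($s, '/');
}, $badstrs);

// A bad string plus everything up to the next whitespace, case-insensitive.
$pattern = '/(?:' . implode('|', $alternatives) . ')\S*/i';

$cleaned = preg_replace($pattern, '', $message);

echo $cleaned, PHP_EOL;
```

Because ‘blog’ is only a prefix here, ‘blogging’ is removed as a whole word, in the same way the URLs are. Trailing punctuation attached to a match (the comma after the second URL) goes with it.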
Here’s the complete code segment:
foreach ($badstrs as $badstr) {
    while (($pos = stripos($message, $badstr)) !== false) {
        // Find posn of the next space
        $strchars = strlen($message);
        $endstr = strpos($message, " ", $pos);
        if ($endstr < 1) $endstr = $strchars;
        $len = $endstr - $pos;
        $message = substr_replace($message, '', $pos, $len);
        $errors[] = $badstr . " found";
        $_POST['message'] = $message;
    }
}
The line if ($endstr < 1) $endstr = $strchars; takes care of the URL being right at the end of the message and no space being found. As you can see, the errors are accumulated, and the message edited for re-presentation.
The usage is in an e-mail form, and is part of the ‘sanitisation’. I was asked to disallow URLs. If a URL is found, it’s excised, and the text re-presented so the web site visitor can amend what she’s said and re-submit (when, of course, it gets checked again).
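One way the segment could be packaged for reuse is sketched below. The function name strip_bad_strings is hypothetical, and the sketch departs from the posted code in two places: it tests strpos against false explicitly, and it starts looking for the space after the matched bad string, so that an empty bad string or one beginning with a space cannot leave the loop stuck.

```php
<?php
/**
 * Hypothetical helper: removes each bad string, plus whatever follows it up
 * to the next space, and reports what was found.
 */
function strip_bad_strings(string $message, array $badstrs): array
{
    $errors = [];
    foreach ($badstrs as $badstr) {
        if ($badstr === '') {
            continue; // an empty needle would never make progress
        }
        while (($pos = stripos($message, $badstr)) !== false) {
            // Look for the next space after the match; false means none is left.
            $endstr = strpos($message, ' ', $pos + strlen($badstr));
            if ($endstr === false) {
                $endstr = strlen($message);
            }
            $message = substr_replace($message, '', $pos, $endstr - $pos);
            $errors[] = $badstr . ' found';
        }
    }
    return [$message, $errors];
}

// Made-up input for illustration.
[$clean, $errors] = strip_bad_strings(
    'Visit http://example.com now or www.example.org',
    ['http', 'www']
);
```

The caller would then write the cleaned text back to the form (for example into $_POST['message']) and show the accumulated errors, as in the segment above.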