Trying to find a swear filter for Company site

Hello all,

I’m really new to this and got hired at a new job. Thing is, it’s an eCommerce site and I’m not familiar with this at all. One of the other things they use is PHP/MySQL, which I am also not familiar with. I told them I’m not too great at this, so they know, but they still ended up hiring me. I was trying to figure out how to even begin to find some sort of swear filter or code I could use to prevent customers from swearing in a comments section. It’s a section that a customer can fill out to rate t-shirts. Any direction of where to go on this would be greatly appreciated. HELP! I’d like to keep this job.

assuming you’ll be using php, this demo class should be close to what you are after.

The BadWordDetector class replaces any bad words in a string with ****.

In real life you could retrieve the bad words from a database table to populate the badwords array that is passed to the class.

feel free to modify the code to suit your own needs.

 
<?php
class BadWordDetector {
    // Properties
    protected $badwords = array();

    /***********************************************************************
    Class Constructor
    ***********************************************************************/
    public function __construct($badwords) {
        //convert $badwords values to lower case for case-insensitive comparisons
        foreach($badwords as $i => $value) {
            $this->badwords[$i] = strtolower($value);
        }
    }

    /***********************************************************************
    Class Accessor Methods
    ***********************************************************************/
    public function getBadWords() {
        return $this->badwords;
    }

    // function to reset the $badwords array to new values
    public function setBadWords($badwords) {
        //convert $badwords values to lower case for case-insensitive comparisons
        foreach($badwords as $i => $value) {
            $this->badwords[$i] = strtolower($value);
        }
    }

    /***********************************************************************
    Class Methods
    ***********************************************************************/
    public function cleanString($str) {
        $msgWords = explode(' ',$str);
        //split the elements with punct. characters and expand $msgWords array
        $msgWordsExpand = array();
        foreach($msgWords as $i => $currWord) {
            //preg_match replaces eregi, which no longer exists in PHP 7+
            if(!preg_match('/[[:punct:]]$/',$currWord)) {
                $msgWordsExpand[] = ' '.$currWord;
            } else { //there is a punctuation mark at the end of the word, so split it out
                $str1 = substr($currWord,0,strlen($currWord)-1);
                $str2 = substr($currWord,strlen($currWord)-1);
                array_push($msgWordsExpand,' '.$str1,$str2);
            }
        }
        //create a clean string from $msgWordsExpand
        $strClean = '';
        foreach($msgWordsExpand as $i => $currWord) {
            $strClean = in_array(trim(strtolower($currWord)),$this->badwords)? $strClean.' ****': $strClean.$msgWordsExpand[$i];
        }
        return $strClean;
    }
} //end of class

//------------------------------------------------------------
//testing code

$badwords = array('Amet','tinCidunt','congue');

$message = 'Lorem ipsum dolor sit ameT, consectetur adipiscing elit. Mauris tincidunt auctor ligula, sed conGue mauris gravida gravida. ';

$detector = new BadWordDetector($badwords);

echo $message.'<br /><br />'.$detector->cleanString($message);

?>

Thanks. Does this automatically detect the words or do I put them into the code myself? Also, I’m not really sure where to put it in the document. Thank you. Very much appreciated.


//testing code

$badwords = array('Amet','tinCidunt','congue');

$message = 'Lorem ipsum dolor sit ameT, consectetur adipiscing elit. Mauris tincidunt auctor ligula, sed conGue mauris gravida gravida. ';

$detector = new BadWordDetector($badwords);

echo $message.'<br /><br />'.$detector->cleanString($message);

$badwords is the array that contains the swear words you want to filter out. This array could be hard coded into your php script using an include file or populated from records of swear words stored in your database. That is your call.
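As a rough sketch of the database option: the table name bad_words and the column word are made up for this example, and an in-memory SQLite database (via PDO) stands in for the site's MySQL database so the snippet can run by itself.

```php
<?php
// Sketch only: `bad_words` / `word` are hypothetical names, and SQLite in
// memory stands in for the real MySQL connection.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Set up some sample rows so there is something to read back
$pdo->exec('CREATE TABLE bad_words (word TEXT NOT NULL)');
$pdo->exec("INSERT INTO bad_words (word) VALUES ('Amet'), ('tinCidunt'), ('congue')");

// PDO::FETCH_COLUMN returns a flat array holding the first selected column
$badwords = $pdo->query('SELECT word FROM bad_words')->fetchAll(PDO::FETCH_COLUMN);

// $badwords is now a plain array of strings, ready for:
//   $detector = new BadWordDetector($badwords);
print_r($badwords);
```

With a real MySQL database only the connection string (plus user name and password) passed to new PDO() should need to change.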

$message is essentially what your user enters in the comments box in your html form. Your php script will receive comments in either a $_POST or $_GET array.

when you create an instance of the class with the new keyword, the badwords array is passed to the class.

the cleanString method in the class receives $message and replaces any swear words in $message with the string ‘****’. The clean string is then returned to the calling statement in your php script.

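The cleaned string, after being sanitised with mysql_real_escape_string (or, on PHP 7 and later where the old mysql_* functions no longer exist, with mysqli_real_escape_string or a prepared statement), can then be inserted into a database if you like.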

If the comment is not going to be stored in a database then you don’t need that database sanitising step for the cleaned string.
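Here is a minimal sketch of the prepared-statement route, assuming PDO is available. The table product_comments is a made-up name, and SQLite in memory again stands in for MySQL. It also escapes the comment with htmlspecialchars when printing it back into a page, which is a separate concern from the database step.

```php
<?php
// Sketch only: `product_comments` is a hypothetical table, and SQLite in
// memory stands in for the real MySQL connection.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE product_comments (id INTEGER PRIMARY KEY, comment TEXT)');

// Pretend this came back from $detector->cleanString($message)
$cleanStr = "Nice shirt, **** fit; it's great";

// The placeholder lets the driver handle quoting, so the apostrophe in
// "it's" cannot break out of the SQL statement
$stmt = $pdo->prepare('INSERT INTO product_comments (comment) VALUES (:comment)');
$stmt->execute(array(':comment' => $cleanStr));

// Read the row back unchanged
$stored = $pdo->query('SELECT comment FROM product_comments')->fetchColumn();

// Escape only at the point where the text is printed into HTML
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
```

The comment is stored exactly as cleaned. Escaping for SQL and escaping for HTML are handled separately, each at the point where it matters.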

This is a slightly better version of the class.

The original had a logic error in setBadWords()

 
<?php
class BadWordDetector {
    // Properties
    protected $badwords = array();

    /***********************************************************************
    Class Constructor
    ***********************************************************************/
    public function __construct($badwords) {
        //setBadWords() converts the values to lower case for case-insensitive comparisons
        $this->setBadWords($badwords);
    }

    /***********************************************************************
    Class Accessor Methods
    ***********************************************************************/
    public function getBadWords() {
        return $this->badwords;
    }

    // function to reset the $badwords array to new values
    public function setBadWords($badwords) {
        //clear out any previously stored words first
        $this->badwords = array();
        //convert $badwords values to lower case for case-insensitive comparisons
        foreach($badwords as $i => $value) {
            $this->badwords[$i] = strtolower($value);
        }
    }

    /***********************************************************************
    Class Methods
    ***********************************************************************/
    public function cleanString($str) {
        $msgWords = explode(' ',$str);
        //split the elements with punct. characters and expand $msgWords array
        $msgWordsExpand = array();
        foreach($msgWords as $i => $currWord) {
            //preg_match replaces eregi, which no longer exists in PHP 7+
            if(!preg_match('/[[:punct:]]$/',$currWord)) {
                $msgWordsExpand[] = ' '.$currWord;
            } else { //there is a punctuation mark at the end of the word, so split it out
                $str1 = substr($currWord,0,strlen($currWord)-1);
                $str2 = substr($currWord,strlen($currWord)-1);
                array_push($msgWordsExpand,' '.$str1,$str2);
            }
        }
        //create a clean string from $msgWordsExpand
        $strClean = '';
        foreach($msgWordsExpand as $i => $currWord) {
            $strClean = in_array(trim(strtolower($currWord)),$this->badwords)? $strClean.' ****': $strClean.$msgWordsExpand[$i];
        }
        return $strClean;
    }
} //end of class

//------------------------------------------------------------
//testing code

$badwords = array('Amet','tinCidunt','congue');

$message = 'Lorem ipsum dolor sit ameT, consectetur adipiscing elit. Mauris tincidunt auctor ligula, sed conGue mauris gravida gravida. ';

$detector = new BadWordDetector($badwords);
echo '<br /><br />'.$message.'<br />'.$detector->cleanString($message);

//change the set of bad words
$badwords = array('Amet');

$detector->setBadWords($badwords);

echo '<br /><br />'.$message.'<br />'.$detector->cleanString($message);

 

There are a few profanity filter web services out there now. I found it much faster to implement, and to keep on top of my whitelists and blacklists, using something like WebPurify.

So, I’ve made a separate file with all the code and named it swearfilter.php. I have attached this php file to the comments section. My only question now is, do I replace the $message with something like ‘****’ rather than what is there already (the ‘Lorem ipsum etc…’)?

Also, would I have to replace the part at the bottom that says “//change the set of bad words” with the words in my array again, or is everything from “//testing code” onward only meant to go in my php file with the comments section? Thanks for all your help. I’m just starting to learn php, obviously. These forums offer a lot of help.

ok, I’m not sure exactly what you have done but let me describe what needs to be done.

The code I posted is a demo class and example of how the class can be used. So to implement it in a real situation:

  1. copy and paste everything from the top of my code down to //end of class into a file called BadWordDetector.php (or whatever you like)

  2. I assume your comments are entered by users in an html form using either a text box or a <textarea>. let’s say this input has name="txtComments" (PHP keys the submitted data by the field’s name attribute, not its id)

  3. when the user clicks the submit button on the form, all the form data, including the comments, will be sent to a php form processing script. let’s call it formProcessor.php

from here on all the following steps are done in formProcessor.php

  4. formProcessor.php will receive the comments in $_POST['txtComments'], assuming your form’s method="post"

  5. in formProcessor.php you first need to include BadWordDetector.php so the script can access the class

  6. assign $_POST['txtComments'] to a variable, say $message.

  7. now we can clean $message, but first we need to get the list of “bad words”, either from the database or hard coded in an array. let’s assume you have an array of bad words.

     $badwords = array('Amet','tinCidunt','congue');

  8. create a new instance of the BadWordDetector class

     $detector = new BadWordDetector($badwords);

     this creates a new instance of the class and passes it your array of bad words, which it will use to clean any string you subsequently pass to the class

  9. now pass the comments ($message) to the class and clean them

     $cleanStr = $detector->cleanString($message);

     this passes $message to the cleanString method in BadWordDetector, which then uses its stored badwords array to replace any bad words in $message with ****. The method then returns the cleaned string, which is assigned to $cleanStr.

  10. $cleanStr now contains your cleaned $message (comments)

  11. you can then echo out $message and $cleanStr to see the ‘before’ and ‘after’ comments like I did in the demo code I posted.

  12. now that you have $cleanStr you can do whatever you need to do with it. but if you’re going to store it in a database, make sure you sanitise it first with mysql_real_escape_string (or, on PHP 7 and later where the old mysql_* functions no longer exist, with mysqli_real_escape_string or a prepared statement)
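The formProcessor.php steps above could be sketched roughly like this. filterComment() is a hypothetical helper name, not something from the class, and the field name txtComments is the one assumed in the steps. The require_once line is left as a comment so the snippet can be read and run without the other file in place.

```php
<?php
// formProcessor.php: a sketch under the assumptions in the steps above.
// In the real script the class would be pulled in first:
//   require_once __DIR__ . '/BadWordDetector.php';

/**
 * Hypothetical helper: takes the submitted form data and a detector object
 * (anything with a cleanString() method) and returns the filtered comment.
 */
function filterComment(array $post, $detector) {
    // Form data is keyed by the field's name attribute; fall back to an
    // empty string if the field was not submitted at all
    $message = isset($post['txtComments']) ? trim($post['txtComments']) : '';

    // Hand the raw comment to the detector and return the cleaned version
    return $detector->cleanString($message);
}

// Typical use inside formProcessor.php:
//   $badwords = array('Amet', 'tinCidunt', 'congue');
//   $detector = new BadWordDetector($badwords);
//   $cleanStr = filterComment($_POST, $detector);
//   echo htmlspecialchars($cleanStr, ENT_QUOTES, 'UTF-8');
```

Passing $_POST in as a parameter, rather than reading it inside the function, is just a design choice that makes the helper easy to try out with a plain array.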

I hope the above makes sense. if you have any more questions, simply post back.

Yeah, if that is what I’m supposed to do then I was way off. I’ll try what you said to do now. I don’t quite understand how to sanitise it first with mysql_real_escape_string, but I’ll read up on that. I’m also using xcart.
What I did was create a new php file, call it swearwordfilter.php, and paste your code into it (up to //end of class). Then I added code to my other php file, import.php, where the comments section resides, like this (I changed the bad words for this example). I also typed
include("swearwordfilter.php"); at the top of import.php:

	# Import customer review
	foreach ($row['message'] as $k => $v) {
		$data = array(
			"productid"	=> $row['productid'],
			"message"	=> $v,
			"email"		=> $row['email'][$k]
		);

		$data = func_addslashes($data);
		func_array2insert("product_reviews", $data);
		$result["customer_reviews"]["added"]++;

		$badwords = array('sczxt','fxzk','sdftch','fcving','acjkle');
		$detector = new BadWordDetector($badwords);
		$data = $detector->cleanString($message);

	}

I’ll try going over your example now.
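For anyone reading along later: in the loop above the filter only runs after func_array2insert has already saved the row, it is handed $message (which is never set in that loop, since the review text is in $v), and its result overwrites $data. Below is a rough sketch of one possible reordering. prepareReviews() is a made-up helper name and not part of X-Cart. The X-Cart calls are shown only in comments, copied from the post.

```php
<?php
// Sketch only: prepareReviews() is a hypothetical helper. $detector is any
// object with a cleanString() method, e.g. the BadWordDetector instance
// built earlier in the thread.
function prepareReviews(array $row, $detector) {
    $reviews = array();
    foreach ($row['message'] as $k => $v) {
        $reviews[] = array(
            'productid' => $row['productid'],
            // filter the review text ($v) BEFORE the row is stored
            'message'   => $detector->cleanString($v),
            'email'     => $row['email'][$k],
        );
    }
    return $reviews;
}

// Inside import.php the calls from the post would then work on rows that
// are already filtered, for example:
//   $detector = new BadWordDetector($badwords);   // build once, outside the loop
//   foreach (prepareReviews($row, $detector) as $data) {
//       $data = func_addslashes($data);
//       func_array2insert("product_reviews", $data);
//       $result["customer_reviews"]["added"]++;
//   }
```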

Kalon, I can’t thank you enough for your help. I finally got it.

you’re welcome :)