Getting a list of what was replaced

You’re really beating me up on a bad day… :frowning:

And I’m not having much luck figuring that out.

Based on the link you provided, most people suggest imploding the array of patterns into one long pattern string.

But what if you have 100,000 words in your array?

And what if you have funky characters like “|”?
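Both concerns can be handled without one gigantic pattern. Here is a minimal sketch, assuming the words are plain literals rather than regexes. It escapes each word with preg_quote() so characters like “|” lose their special meaning, and it splits the list into chunks so no single pattern gets too large (PCRE limits the size of a compiled pattern). The function name buildBadWordPatterns() and the chunk size of 500 are illustrative assumptions, not something from the linked thread.

```php
<?php
// Hypothetical helper: turn a plain word list into a few alternation patterns.
// The default chunk size (500) is an arbitrary assumption; one pattern for
// 100,000 words may exceed PCRE's compiled-pattern size limit.
function buildBadWordPatterns(array $words, int $chunkSize = 500): array
{
    $patterns = [];
    foreach (array_chunk($words, $chunkSize) as $chunk) {
        // preg_quote() escapes regex metacharacters such as "|", and the
        // second argument also escapes the "/" delimiter.
        $escaped = array_map(function ($word) {
            return preg_quote($word, '/');
        }, $chunk);
        $patterns[] = '/' . implode('|', $escaped) . '/i';
    }
    return $patterns;
}

$patterns = buildBadWordPatterns(['darn', 'crap', 'a|b']);
// $patterns[0] === '/darn|crap|a\|b/i'  ("a|b" is now matched literally)
```

Each resulting pattern can then be passed to preg_match_all() or preg_replace() in a short loop. With 100,000 words and chunks of 500, that is about 200 regex calls instead of 100,000.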

Take a number! :smile:

You are asking about number 5 but skipping 3?

I am asking about a PHP problem…

At some later date, when I have time, I will try to pick up JavaScript. But considering how I struggle with PHP, it’s probably better to master that first.

Exactly my point

If you can’t understand JavaScript, you won’t understand PHP.

Let’s stay on topic, huh? Telling me that my problem with PHP is that I don’t understand JavaScript is nonsense.

I don’t know JavaScript - life goes on!

And sitting down and learning JavaScript for 3 months is not the way you solve PHP issues?! :unamused:

It is on-topic
Do you think members are here to write your code for you when you apparently don’t want to learn the basics?

But you are wrong. They are both “C”-style languages, and learning basic JavaScript will immensely help you learn PHP.

Not to mention that you are not really asking about PHP but about regex, which is a whole other thing in itself.

IMHO, if you are good at HTML and CSS, you are ahead of many. Leave the programming to devs; there is certainly no shame in being a designer.

Untested, so there may be an error or two (but it should be darn close) :wink:

$matches = array();
$total_matches = array();

foreach ($badWordsArray as $pattern) { // each $pattern must be a delimited regex, e.g. '/darn/i'
  preg_match_all($pattern, $text, $matches);
  $total_matches = array_merge($total_matches, $matches[0]); // this may need tweaking based on your bad word patterns
}

// $total_matches has all of the words that were matched.

Now, one thing to note: it will include each and every match! So if they used the same bad word 4 times, you will have it 4 times in your $total_matches. You can get down to the unique words with $unique_matches = array_unique($total_matches);
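One caveat with array_unique(): it compares strings exactly, so with a case-insensitive (/i) pattern, “darn” and “Darn” both survive. Here is a small sketch on hypothetical sample matches that lower-cases first and uses array_count_values() to also get a per-word count:

```php
<?php
// Hypothetical contents of $total_matches after the loop above,
// produced by a case-insensitive pattern.
$total_matches = ['darn', 'Darn', 'crap', 'darn'];

// array_unique() compares exactly, so "darn" and "Darn" are both kept.
$unique_matches = array_values(array_unique($total_matches));
// $unique_matches === ['darn', 'Darn', 'crap']

// Lower-casing first collapses case variants; array_count_values()
// then reports how often each word was used.
$counts = array_count_values(array_map('strtolower', $total_matches));
// $counts === ['darn' => 3, 'crap' => 1]
```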

@mikey_w, did my code example help?

@cpradio,

Nope, I have been getting my butt kicked all day long… :worried: (Even before Mittineague started pouncing on me!)

Here is a test script that I created based on what I had and your contributions…

<?php
    $text = "That darn pigeon just took a crap on my glasses!";
    $text .= "I should shoot that little mothertrucker in the arse!";

    $badWordsArray = array();
    $badWordsArray[0] = 'darn'; 
    $badWordsArray[1] = 'crap';
    $badWordsArray[2] = 'mothertrucker';
    $badWordsArray[3] = 'arse';

    $replacementArray = array();
    $replacementArray[0] = '****';
    $replacementArray[1] = '****';
    $replacementArray[2] = '*************';
    $replacementArray[3] = '****';


    $matches = array();
    $total_matches = array();

    foreach ($badWordsArray as $pattern){
        preg_match_all($pattern, $text, $matches, PREG_PATTERN_ORDER);
        
        $total_matches = array_merge($total_matches, $matches[0]);
    }
    
?>

When I run that I get the following errors…

Warning: preg_match_all(): Delimiter must not be alphanumeric or backslash in /Users/user1/Documents…

Notice: Undefined offset: 0 in /Users/user1/Documents…

Warning: array_merge(): Argument #2 is not an array in /Users/user1/Documents…

Ah, none of your patterns have delimiters… (not sure how your code works with preg_replace…)

    $text = "That darn pigeon just took a crap on my glasses!";
    $text .= "I should shoot that little mothertrucker in the arse! darn it!";

    $badWordsArray = array();
    $badWordsArray[0] = 'darn'; 
    $badWordsArray[1] = 'crap';
    $badWordsArray[2] = 'mothertrucker';
    $badWordsArray[3] = 'arse';

    $replacementArray = array();
    $replacementArray[0] = '****';
    $replacementArray[1] = '****';
    $replacementArray[2] = '*************';
    $replacementArray[3] = '****';


    $matches = array();
    $total_matches = array();

    foreach ($badWordsArray as $pattern){
        preg_match_all("/{$pattern}/i", $text, $matches, PREG_PATTERN_ORDER);
        
        $total_matches = array_merge($total_matches, $matches[0]);
    }

var_dump($total_matches);
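Since the goal is a list of what was replaced, another option is to do the masking and the collecting in one pass with preg_replace_callback(). The callback sees each match, can record it, and returns the mask. This technique is swapped in for illustration and is not the code above. The helper name maskAndCollect() is hypothetical, and preg_quote() is added so that words containing regex metacharacters are matched literally.

```php
<?php
// Hypothetical helper: mask each bad word and record the original text of
// every match in $replaced (one entry per occurrence).
function maskAndCollect(string $text, array $badWords, array &$replaced): string
{
    $replaced = [];
    foreach ($badWords as $word) {
        // preg_quote() keeps characters such as "|" or "/" literal.
        $pattern = '/' . preg_quote($word, '/') . '/i';
        $text = preg_replace_callback($pattern, function (array $m) use (&$replaced) {
            $replaced[] = $m[0];                   // remember what was matched
            return str_repeat('*', strlen($m[0])); // one "*" per byte
        }, $text);
    }
    return $text;
}

$replaced = [];
$clean = maskAndCollect('That darn pigeon, darn it! Crap.', ['darn', 'crap'], $replaced);
// $clean    === 'That **** pigeon, **** it! ****.'
// $replaced === ['darn', 'darn', 'Crap']
```

preg_replace() also accepts an optional $count argument, but it only reports how many replacements were made, not which words. Note that strlen() counts bytes, so a UTF-8 word with accented characters would get more asterisks than it has letters; mb_strlen() avoids that if the mbstring extension is available.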

There are a few problems with auto-masking: Charles Dickens, shitake mushrooms, Dick Cheney, the University of South Carolina Gamecocks, “cock the gun.” I could go on, of course, but I think that makes my point: the stupid things just become a thorn in the side of normal conversation, and the kids with potty mouths will use !33+ to bypass the filters. Also, you can gravely insult someone without cursing at all if you know how to express yourself.

Yeah, you should really use boundary detection in your patterns too, so that Dickens doesn’t become ****ens
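Here is a quick illustration of what boundary detection does, using the thread’s own example. \b matches only at the edge between a word character and a non-word character, so the word no longer matches inside “Dickens”, although it still matches “Dick Cheney”:

```php
<?php
$text = 'Charles Dickens';

// Without boundaries, "dick" is found inside "Dickens".
var_dump(preg_match('/dick/i', $text));            // int(1)

// With \b on both sides it must be a whole word, so "Dickens" is left alone...
var_dump(preg_match('/\bdick\b/i', $text));        // int(0)

// ...but the first name in "Dick Cheney" is still a whole-word match.
var_dump(preg_match('/\bdick\b/i', 'Dick Cheney')); // int(1)
```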

Boundary detection won’t help with former Vice President Cheney’s first name, though. Also, boundary detection lets in curse verbs in different tenses and plurals.
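For the tenses-and-plurals problem, one partial mitigation (an assumption, not something proposed in the thread) is to keep the boundaries but allow a short list of optional suffixes. The helper withSuffixes() and its suffix list are hypothetical; irregular forms, misspellings and leetspeak will still get through.

```php
<?php
// Hypothetical helper: whole-word match that also accepts a few common
// English suffixes. The suffix list (s, es, ed, ing) is an assumption.
function withSuffixes(string $word): string
{
    return '/\b' . preg_quote($word, '/') . '(?:s|es|ed|ing)?\b/i';
}

$pattern = withSuffixes('darn');
preg_match_all($pattern, 'Darn it, he darned the darning needle. Darnley?', $m);
// $m[0] === ['Darn', 'darned', 'darning']
// "Darnley" is not matched: \b still requires a word edge after the suffix.
```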

@Michael_Morris,

Well, since you brought it up, my original code had a more sophisticated approach, but I created a simplified example to try to get @cpradio’s suggestions working…

Last time I ran a forum, I had the system notify the moderators when a message had a hit, but it didn’t actually do anything to the message. Still, no matter how sophisticated the approach, things will get through, and things you don’t want masked will be. It’s unavoidable.

My goal is to catch obvious things like F-bombs.

In the approach I had working last night, I have a database table of words, each marked to denote whether it should be replaced when it appears as a “substring”.

So if I see the word “f*ck”, I replace it anywhere, any time. But “ass” has to be a standalone word, so that “glasses” is left alone.
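The substring flag described above could be turned into patterns roughly like this. It is a sketch under stated assumptions: an in-memory array stands in for the database table, “darn” stands in for the always-replace words, and patternFor() is a hypothetical helper name.

```php
<?php
// Hypothetical stand-in for the database table: word => "substring" flag.
// true  = replace anywhere, even inside other words
// false = replace only as a standalone word
$badWords = [
    'darn' => true,
    'ass'  => false,
];

// Build one pattern per word, adding \b word boundaries only when the
// word must stand alone.
function patternFor(string $word, bool $substring): string
{
    $quoted = preg_quote($word, '/');
    return $substring ? '/' . $quoted . '/i' : '/\b' . $quoted . '\b/i';
}

$text = 'Darnit, kiss my ass, not my glasses';
foreach ($badWords as $word => $substring) {
    $mask = str_repeat('*', strlen($word));
    $text = preg_replace(patternFor($word, $substring), $mask, $text);
}
// $text === '****it, kiss my ***, not my glasses'
```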

I think my approach will catch 80% of the issues.

Now I just need to get cpradio’s code working and understand it!

// 16 days ago
Catching Bad Words

// 3 days ago
Converting M-U-C-K to muck, but not

// latest
http://www.sitepoint.com/community/t/getting-a-list-of-what-was-replaced/115864

@mike_w
May I suggest that your efforts will never be completely resolved, because the internet is forever being updated and defeats even knowledgeable programmers.

According to your previous posts, you already only allow registered users to post and have also adopted SitePoint’s approach of allowing users to flag posts.

To reiterate, perfect Internet security will never be achieved, and it is best to adopt a simple solution such as only publishing approved posts.

An ongoing solution would be to create a test site at http://test-bad-words.Your-Clients-Site.com and ask SitePoint members to try to defeat your latest code.

My advice is to publish your site and try to encourage a literate community. There will be far more important problems to resolve once your site is live.

@John_Betong,

Someone is keeping a dossier on me!! :sunglasses:

As is usually the case, my struggles to get things done are a function of my coding weaknesses - just ask @Mittineague how lowly I am!!!

Devising a decent bad word strategy hasn’t taken me long and I had it working last night - until I decided to expand it and make things better! :blush:

I have to leave for tonight, but will hopefully get @cpradio’s suggestions working in the morning. (That, and I found another way to do things per @cpradio’s earlier suggestions, and would like to better understand those as well.)

It’s all a good learning experience, before I turn things over to the wolves!!!
