  1. #1
    SitePoint Wizard
    Join Date
    Nov 2003
    Location
    United Kingdom
    Posts
    2,120

    Help with my Stop words script, Please

    Hi,

    I have some spam submissions happening on my site and I want them to stop. To do this I have decided to check for words in a string and if they match then it will not submit that data. The thing is I am not quite sure how to put this together.

    I think I have the structure right, or nearly right, but using the functions to make it all work is a bit dazzling to me.

    PHP Code:
    $stopwords = "games,gambling,about";

    $text = "nothing about it most that most don't know which ones don and just discovered normally thinks they can make quick buck two well's not";

    $wordex = preg_match();

    if ($wordex) {
        echo "Spam";
    } else {
        echo "No Spam";
    }

    What I want in the above script is for it to loop through the $stopwords list while trying to match the words with the $text string. If there is a match with any words in the $stopwords list then it will echo "Spam" and if there is no match it will echo "No Spam".

    How can I modify my script to do that?

    Thanks!

  2. #2
    $books++ == true matsko's Avatar
    Join Date
    Sep 2004
    Location
    Toronto
    Posts
    795


    So let me get this straight: you want a script that takes a list of keywords (in a comma-separated string) and compares them against some input text to see whether the text contains any of them...

    Well you could do this...

    PHP Code:
    $words = "one,two,three,four,five,six";

    //Convert Keywords to array
    $words = explode(',', $words);

    //Spam Bool
    $spam = false;

    //Check Words
    foreach ($words as $w)
    {
      if (stripos($text, $w) != -1)
      {
        $spam = true;
        break;
      }
    }

    //Check Bool
    if ($spam == true)
      echo 'spam';

    Adding website URLs to the keyword list would be more effective than keywords alone...

    When it comes down to preventing spam on a website, a simple keyword system won't do the job. Text like G.A.M.E.S or game(s) won't be considered spam, since it's not in the keyword list. Also, adding more keywords will only slow down the website. So to prevent spam you need to take a look at what kind of messages your spam users are sending.
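
To make the G.A.M.E.S point concrete, here is a small sketch. The squash() helper is made up for illustration and is only one possible workaround, not something proposed in this thread. It strips everything except letters and digits before matching. The trade-off is that removing spaces can create matches across word boundaries, so treat it as a rough idea rather than a fix.

```php
<?php
// Hypothetical helper: keep only letters and digits, then lower-case the result.
function squash(string $text): string
{
    return strtolower((string) preg_replace('/[^a-z0-9]/i', '', $text));
}

$message = "Play G.A.M.E.S now";

// Plain keyword check: the dots hide the word, so this prints bool(false).
var_dump(stripos($message, "games") !== false);

// After squashing, the text is "playgamesnow", so this prints bool(true).
var_dump(stripos(squash($message), "games") !== false);
```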

    Most of the time, when a spammer sends a message, he or she sends the same message to multiple users, or alters only a few details (name, user, etc.). By using similar_text() (strcmp() only tells you whether two strings are equal or how they sort, not how alike they are) you can compare the earlier message with the one just submitted. Set a ballpark percentage (say 80%): if the two messages are more than 80% the same, treat the new one as spam. To do this, build a database system that flags the id of the last message sent by the user. When he/she sends another message, compare the old message (fetched from the database via the flagged id) with the new one. If the similarity is above 80% (or whatever percentage you specify), it is the same message or almost the same message, and you know it's SPAM.
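
The percentage comparison described above can be sketched with PHP's built-in similar_text(), which reports similarity as a percentage through its third argument. The function name looksLikeRepeat() and the sample messages are invented for illustration, and the database lookup of the previous message is left out.

```php
<?php
// Hypothetical helper: true when $new is more than $threshold percent similar to $old.
function looksLikeRepeat(string $old, string $new, float $threshold = 80.0): bool
{
    $percent = 0.0;
    similar_text($old, $new, $percent); // third argument receives the percentage
    return $percent > $threshold;
}

// Made-up messages: same template, only the name changed.
$old = "Hi Bob, check out my great new site for cheap watches!";
$new = "Hi Tim, check out my great new site for cheap watches!";

var_dump(looksLikeRepeat($old, $new)); // bool(true): only three characters differ
```

similar_text() gets slow on long strings, so for large messages it may be worth comparing only the first few hundred characters.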
    I can't believe I ate the whole thing

  3. #3
    SitePoint Wizard
    Join Date
    Nov 2003
    Location
    United Kingdom
    Posts
    2,120
    Yep, that is what I want. I have tried your script and even though the words don't match it still says "spam".

    PHP Code:
    $text = "nothing about it most that most don't know which ones don and just discovered normally thinks they can make quick buck two well's not";

    $words = "games,dinner,whats to be or not to be";

    //Convert Keywords to array
    $words = explode(',', $words);

    //Spam Bool
    $spam = false;

    //Check Words
    foreach ($words as $w)
    {
      if (stripos($text, $w) != -1)
      {
        $spam = true;
        break;
      }
    }

    //Check Bool
    if ($spam == true) {
      echo 'spam';
    }

    Hope you can fix it.

    Thanks!

  4. #4
    $books++ == true matsko's Avatar
    Join Date
    Sep 2004
    Location
    Toronto
    Posts
    795
    Yeah the stripos returns false if nothing is found. So replace...

    PHP Code:
    if(stripos($text,$w)!=-1)
    with

    PHP Code:
    if(is_numeric($text,$w)) 
    HOWEVER, read my first post about what I said after the code...

  5. #5
    SitePoint Wizard
    Join Date
    Nov 2003
    Location
    United Kingdom
    Posts
    2,120
    It is now saying that I have the wrong parameter count for is_numeric(), which I think is correct as I think you can only use 1.

    Also, I want to use this script to stop adult-related articles from getting onto my site, which is why I mainly want to use keywords. If I also have to use URLs, then I will, since the stop list should accept URLs as well and match them against any URLs in an article's description.

  6. #6
    $books++ == true matsko's Avatar
    Join Date
    Sep 2004
    Location
    Toronto
    Posts
    795
    Oh sorry my mistake...

    PHP Code:

    if(is_numeric(stripos($text,$w))) 
    Yeah, I was merely referring to the text spam filter system. However, keywords like G.A.M.E.S. and so on will still not be considered spam...

  7. #7
    SitePoint Guru
    Join Date
    Jul 2005
    Location
    Orlando
    Posts
    634
    Quote: Originally posted by matsko
    Oh sorry my mistake...

    PHP Code:

    if(is_numeric(stripos($text,$w))) 
    Yeah, I was merely referring to the text spam filter system. However, keywords like G.A.M.E.S. and so on will still not be considered spam...

    This solution will work, but it's not the optimal solution.

    stripos() returns boolean false when it doesn't find the word, and an integer position (which can be 0) when it does.

    So you should check for:

    PHP Code:
    stripos( ... ) !== false 
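
Putting the thread together, a minimal sketch of the whole check might look like the following. The function name containsStopWord() is made up for illustration. The strict comparison against false is the part that matters: stripos() returns 0 when the match is at the very start of the string, and 0 is loosely equal to false, while false is never equal to -1, which is why the earlier `!= -1` test always reported spam.

```php
<?php
// Hypothetical helper: true if any comma-separated stop word occurs in $text.
function containsStopWord(string $text, string $stopwords): bool
{
    foreach (explode(',', $stopwords) as $word) {
        $word = trim($word);
        if ($word === '') {
            continue; // skip empty entries such as a trailing comma
        }
        // stripos() returns an int position (possibly 0) or boolean false,
        // so compare strictly against false.
        if (stripos($text, $word) !== false) {
            return true;
        }
    }
    return false;
}

$text = "nothing about it most that most don't know which ones don and just discovered normally thinks they can make quick buck two well's not";

// First list contains "about", which is in the text: prints "Spam".
echo containsStopWord($text, "games,gambling,about") ? "Spam" : "No Spam", "\n";

// Second list has no match: prints "No Spam".
echo containsStopWord($text, "games,dinner,whats to be or not to be") ? "Spam" : "No Spam", "\n";
```

Note that this is a substring match, so a stop word such as "about" would also flag "roundabout".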

  8. #8
    SitePoint Wizard
    Join Date
    Nov 2003
    Location
    United Kingdom
    Posts
    2,120
    Thanks, that works just fine.

  9. #9
    Fully Sweet Car noddy's Avatar
    Join Date
    Aug 2002
    Location
    Perth, Western Australia
    Posts
    759
    Another way that may have worked, and possibly been easier, would be to keep the banned words in an array and use the in_array() function to check whether each submitted word is in it. However, if you were searching through a lot of text this might not be very fast, as it would check each submitted word against the list of banned words. This forum has a feature that does this task.
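
A rough sketch of the in_array() idea, assuming the text is split into words first. The name hasBannedWord() and the splitting pattern are illustrative choices, not something from this thread.

```php
<?php
// Hypothetical helper: true if any whole word of $text is in the banned list.
function hasBannedWord(string $text, array $banned): bool
{
    // Split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (in_array($word, $banned, true)) {
            return true;
        }
    }
    return false;
}

$banned = ['games', 'gambling', 'about'];

var_dump(hasBannedWord("Nothing about it", $banned));   // bool(true)
var_dump(hasBannedWord("A roundabout route", $banned)); // bool(false): whole words only
```

Because this compares whole words, it avoids the "roundabout" problem of substring matching, but it cannot match multi-word phrases such as "whats to be or not to be".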

