  1. #1
    SitePoint Member
    Join Date
    Sep 2006
    Posts
    17
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    String comparison with percent match {RESOLVED}

    Hi,
    I'm trying to figure out how to compare a text string from a form input to a text string in the database. If the match is greater than 75%, I need it to throw an error. The 75% threshold is a variable that I can change later down the line.

    Say I have 10,000 rows and I can narrow this down to 2,000; that would still be a lot of rows to compare. The text string could be as short as 20 words or as long as 250.

    I have tried using LIKE '%$string%' in the query, but that only works if the stored text contains the input string exactly.

    So, here is an idea
    PHP Code:
    $i = 0;
    $j = 0;
    $k = 0;
    $percent = 75;
    $cmpstr = explode(" ", $form['input']); // put the input string into an array
    $sql = "SELECT rowstr FROM table"; // additional WHERE to narrow down, but not important to create this example
    $res = mysql_query($sql) or die($sql . mysql_error());
    $rows = mysql_num_rows($res);

    while ($row = mysql_fetch_assoc($res)) {
        $rowstr[$k] = explode(" ", $row['rowstr']);
        ++$k;
    }

    for ($i = 0; $i < $rows; ++$i) {
        // ............
        // I'm stuck here.
        // Here is where the 2 strings need to be compared
        // How to go about the comparison?
        // And how to get matches?
        // ............

        // confused here too, to select the match
        if (match_is_found) // if there is a match found $j gets a count
            ++$j;
    }

    if ($j >= $percent) // this would not be exactly true, so another piece of code should be added to create actual percentage out of $j
        // throw error
    else
        // continue
    I don't know if I'm on the right track here, or if this is possible at all. Could this approach get quite lengthy and take up a lot of resources and time?
    I hope I explained this well enough and that one or more PHP gurus can help me out here.
    Last edited by Mobie; Apr 27, 2008 at 18:58. Reason: Resolved

  2. #2
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    I'm no guru, but this sounds more like a MySQL question, and I'm even less of a MySQL guru.

    Enough chest beating. Have you looked into MySQL's FULLTEXT index searching? It is very powerful. Google for a fulltext tutorial; there are loads.

    It rejects as a failed search anything which brings back more than 50% of the total possible results. You might be able to change that, but I can't see why anyone would want to page through 50,000 results.
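
For readers who want to see what a FULLTEXT lookup might look like from PHP, here is a minimal sketch. It is not the query the original poster ended up using. It assumes a hypothetical table named `documents` with a FULLTEXT index on the `rowstr` column, and the helper name `buildFulltextQuery` and the placeholders `:q1` and `:q2` are made up for illustration. Note that the score MySQL returns is a relevance value, not a percentage, so a cut-off would have to be tuned against real data.

```php
<?php
// Hypothetical helper: builds a natural-language FULLTEXT query.
// Table and column names are assumptions; the thread only shows `rowstr`.
function buildFulltextQuery(string $table, string $column): string
{
    // Identifiers cannot be bound as parameters, so allow only simple names.
    foreach ([$table, $column] as $name) {
        if (!preg_match('~^[A-Za-z_][A-Za-z0-9_]*$~', $name)) {
            throw new InvalidArgumentException("Unsafe identifier: $name");
        }
    }

    // MATCH ... AGAINST yields a non-negative relevance score, not a percent.
    return "SELECT `$column`, "
         . "MATCH(`$column`) AGAINST (:q1 IN NATURAL LANGUAGE MODE) AS score "
         . "FROM `$table` "
         . "WHERE MATCH(`$column`) AGAINST (:q2 IN NATURAL LANGUAGE MODE) "
         . "ORDER BY score DESC";
}

$sql = buildFulltextQuery('documents', 'rowstr');
echo $sql, "\n";

// Usage with PDO (needs a live connection, so it is left commented out):
// $stmt = $pdo->prepare($sql);
// $stmt->execute([':q1' => $input, ':q2' => $input]);
```

The form input is bound twice under two placeholder names because reusing a single named placeholder is not reliable across PDO configurations.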

  3. #3
    SitePoint Evangelist AlienDev's Avatar
    Join Date
    Feb 2007
    Location
    UK
    Posts
    591
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Not 100% sure what you want, but here's something I just did quickly.

    PHP Code:
    <?php

    // String to search
    $searchIn = 'this is an example';
    // Data to be searched for
    $searchFor = 'this';

    // Remove punctuation etc from strings, count number of words
        // String to search (trim so stray spaces do not count as words)
        $searchIn = preg_replace('~[^a-zA-Z0-9]~', ' ', $searchIn);
        $searchIn = trim(preg_replace('~\s+~', ' ', $searchIn));
        $searchInCount = count(explode(' ', $searchIn));

        // Data to be searched for
        $searchFor = preg_replace('~[^a-zA-Z0-9]~', ' ', $searchFor);
        $searchFor = trim(preg_replace('~\s+~', ' ', $searchFor));
        $searchFor = explode(' ', $searchFor);
        $searchForCount = count($searchFor);

    // Regex to try match; \b keeps "is" from matching inside "this"
    $regex = '~\b(';

    // Add each word to regex
    for ($i = 0; $i < $searchForCount; $i++)
    {
        $regex .= $searchFor[$i] . ($i != ($searchForCount - 1) ? '|' : '');
    }

    // Finish regex, case insensitive
    $regex .= ')\b~i';

    // Count number of words found: replace each hit with '-', then count the dashes
    $wordsFound = preg_replace($regex, '-', $searchIn);
    $wordsFound = preg_replace('~[^-]~', '', $wordsFound);
    $wordsFoundCount = strlen($wordsFound);

    // Share of the words in $searchIn that matched, as a percentage
    $percent = ($wordsFoundCount) / ($searchInCount) * 100;

    echo '<b>' . $percent . '%</b>';
    ?>
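
The same word-counting idea can also be written without a regex alternation, as a rough sketch of the comparison step the first post was stuck on. The function names `splitWords` and `wordMatchPercent` are invented here, and treating "percent match" as the share of distinct input words that also appear in the stored text is an assumption, not something the thread specifies.

```php
<?php
// Split text into lowercase alphanumeric words, dropping punctuation.
function splitWords(string $text): array
{
    $clean = strtolower(preg_replace('~[^a-zA-Z0-9]+~', ' ', $text));
    return preg_split('~\s+~', $clean, -1, PREG_SPLIT_NO_EMPTY);
}

// Percentage of distinct words in $input that also occur in $stored.
function wordMatchPercent(string $input, string $stored): float
{
    $inputWords = array_unique(splitWords($input));
    if (count($inputWords) === 0) {
        return 0.0; // nothing to compare
    }
    $shared = array_intersect($inputWords, array_unique(splitWords($stored)));
    return count($shared) / count($inputWords) * 100;
}

$threshold = 75; // the adjustable limit from the first post
$percent = wordMatchPercent('The quick brown fox', 'A quick brown dog. The end.');

// Three of the four input words are shared, so $percent is 75.
// The first post asks for an error only above the threshold.
$tooSimilar = $percent > $threshold;
echo $percent, ' ', ($tooSimilar ? 'too similar' : 'ok'), "\n"; // prints "75 ok"
```

Running this over each of the 2,000 candidate rows is a simple loop, though word order and repeated words are ignored by this measure.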

  4. #4
    SitePoint Member
    Join Date
    Sep 2006
    Posts
    17
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Resolved!

    Thank you both for your suggestions. I researched the option of FULLTEXT matching and went with that. So far it is working great!

    For testing purposes I'll go ahead and give AlienDev's suggestion a try in another file.

    Thanks again guys!!

  5. #5
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    another thing to look at is levenshtein distance. there's a great UDF for mysql that lets you calculate the levenshtein distance between two strings in a query. this is the same function that many spell checkers are based on.

  6. #6
    Sesame Street Iimitk's Avatar
    Join Date
    Feb 2006
    Posts
    662
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    As longneck has stated above, the Levenshtein distance is great for comparing two strings. You can make good use of the levenshtein() function in PHP as well.
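
A short sketch of how PHP's built-in `levenshtein()` could be turned into a percentage follows. The helper name `levenshteinPercent` is invented for this example, and dividing by the length of the longer string is just one common way to normalise the distance. Two caveats: before PHP 8.0 `levenshtein()` returns -1 when either string is longer than 255 characters, which the 250-word strings in this thread would exceed, and the function works on bytes, so multibyte text may score differently than expected.

```php
<?php
// Convert an edit distance into a 0-100 similarity score.
// Normalising by the longer string's length is an assumption made here.
function levenshteinPercent(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 100.0; // two empty strings are identical
    }
    return (1 - levenshtein($a, $b) / $longest) * 100;
}

// "kitten" -> "sitting" needs 3 edits over a longest length of 7.
printf("%.1f\n", levenshteinPercent('kitten', 'sitting')); // prints 57.1
```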

