  1. #1
    SitePoint Member
    Join Date
    Apr 2010
    Posts
    3

    How to compare a variable against MySQL results and return a percentage?

    I run an article directory. I would like to check an article that is pending review against the published articles in our database. My plan is to put the body of the pending article into a variable (that part is easy and I know how to do it), then compare that variable against the other content in our database and output a percentage showing how close it is to any article we have already published.

    So basically, let's say:


    $body="This is the text in the article that is pending.";

    How could I compare this variable and run a query to check the database for similarity, and have it return the highest percentage among the articles that closely match it?

    For example: if the above variable is an exact duplicate of an article in my database, it should tell me there was a 100% match. Or, if it was close to another article, it might return a 35% match.


    Any direction or guidance with this would be greatly appreciated.

    I found this on this website and it seems like they were trying to do something similar.
    http://www.sitepoint.com/forums/show...-percent-match

  2. #2
    Cups
    SitePoint Wizard
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Given:

    $body1="This is the text in the article that is pending.";

    $body2="This is the pending text in the article, that is.";

    I think you have to be clear between wanting:

    a) a 100% match, because all the words are the same, albeit in a different order
    and
    b) a straight "diff" operation which works out, char for char, how many need to be altered for body1 to match body2 -- probably only a 30% match -- which is what levenshtein() seems to do (it returns an edit count rather than a percentage).
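    To make option b) concrete, here is a minimal sketch of a character-level comparison expressed as a percentage. It uses PHP's built-in similar_text() and levenshtein(); the helper names charSimilarityPercent() and levenshteinPercent() are made up for illustration, and scaling the edit distance by the longer string's length is just one reasonable convention, not the only one. Bear in mind that before PHP 8.0 levenshtein() only accepted strings up to 255 characters, and both functions get slow on long text, so this may only be practical on short excerpts of an article.

```php
<?php
// Character-level similarity via similar_text(), which writes a
// percentage into its third argument.
// (Hypothetical helper name, not from the thread.)
function charSimilarityPercent(string $a, string $b): float
{
    if ($a === '' && $b === '') {
        return 100.0; // assumption: two empty strings count as identical
    }
    similar_text($a, $b, $percent);
    return $percent;
}

// levenshtein() returns the number of single-character edits, so it is
// scaled here against the longer string to get a percentage.
// (Hypothetical helper name; the scaling is an assumption.)
function levenshteinPercent(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 100.0;
    }
    return (1 - levenshtein($a, $b) / $longest) * 100;
}

$body1 = "This is the text in the article that is pending.";
$body2 = "This is the pending text in the article, that is.";

printf("similar_text: %.1f%%\n", charSimilarityPercent($body1, $body2));
printf("levenshtein:  %.1f%%\n", levenshteinPercent($body1, $body2));
```

    Both scores fall when words are reordered, which is exactly the difference between a) and b).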

    If it is a) you want, then it appears to involve splitting the strings up into words.

    Something must already exist to do this; it sounds as if it would be useful in many scenarios.
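    As a rough sketch of option a), the strings can be split into words and compared as word counts, ignoring order. Everything named here (wordCounts(), wordOverlapPercent(), highestMatch()) is a hypothetical helper, and the $published array stands in for rows you would fetch from your own database; the scoring formula (shared words times two, divided by the total word count) is one assumption among several possible ones. It also assumes plain English text, since it lowercases with strtolower() and splits on non-word characters.

```php
<?php
// Count how often each word occurs, ignoring case and punctuation.
function wordCounts(string $text): array
{
    $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($words);
}

// Percentage of words the two texts share, regardless of word order.
function wordOverlapPercent(string $a, string $b): float
{
    $countsA = wordCounts($a);
    $countsB = wordCounts($b);
    $total = array_sum($countsA) + array_sum($countsB);
    if ($total === 0) {
        return 100.0; // assumption: two texts with no words count as identical
    }
    $shared = 0;
    foreach ($countsA as $word => $count) {
        // A word only counts as shared as often as it appears in both texts.
        $shared += min($count, $countsB[$word] ?? 0);
    }
    return 2 * $shared / $total * 100;
}

// Return the id and score of the closest published article.
// $published maps article id => body text (hypothetical shape).
function highestMatch(string $pending, array $published): array
{
    $best = ['id' => null, 'percent' => 0.0];
    foreach ($published as $id => $body) {
        $percent = wordOverlapPercent($pending, $body);
        if ($percent > $best['percent']) {
            $best = ['id' => $id, 'percent' => $percent];
        }
    }
    return $best;
}

$body1 = "This is the text in the article that is pending.";
$body2 = "This is the pending text in the article, that is.";

// Same words in a different order, so this prints 100.0%.
printf("%.1f%%\n", wordOverlapPercent($body1, $body2));
```

    Comparing the pending article against every published body in PHP will not scale to a large directory, so in practice you would probably narrow the candidates in the query first and only score that shortlist.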

    (I'm no greater expert on this than I was in the post you referred to)

