Similar_Text() and levenshtein() functions

Hi,
I am creating a websites to find similar texts … I used both Similar_Text() and levenshtein() functions.
I understand how those functions work … and I almost finished the website.
but my problem is how to compare between those functions mathematically. because I am supposed to attach a report which has definition and comparison for those functions.

In this post PHP Function of the Week - similar text
@rpkamp mentioned this " Beside that similar_text is a lot slower than levenhstein (O(max(n,m)^3 vs O(nm) – where n is the length of the first string and m is the length of the second string). "
I really want resources to explain the part of (O(max(n,m)^3 vs O(n
m) more.

please advice

Thank you

The complexity of similar_text is given in the PHP manual: http://php.net/manual/en/function.similar-text.php and is defined as O(N^3), where N is the length of the longer of the input string, so O(max(n, m)^3), where n is the length of string 1 and m is the length of string 2. I don’t know where this complexity comes from, but you can probably find in the book the manual refers to, Programming Classics: Implementing the World’s Best Algorithms by Oliver (ISBN 0-131-00413-1).

Levensteihn uses a matrix of the letters of both strings and worst case all cells have to be visited, so it’s a O(n*m) algorithm.

Note that even though the Big O of similar_text is a lot larger than that of levenstein, in practice this doesn’t really matter for small inputs. If you compare 2 strings of length 10 for example you really wouldn’t notice the difference between the two. But the larger the input gets the more the difference in speed becomes apparent.

Also note that this Big O notation says nothing how the algorithms work, but only how fast. If you want a nice introduction to the Big O notation read the first answer to this question on SO here: http://stackoverflow.com/questions/487258/plain-english-explanation-of-big-o

Finally it should be mentioned that there are more algorithms than just these 2 to compare similarity between strings. One that springs to mind is trigram matching. So you may want to look in to different algorithms as well.

2 Likes

ScallioXTX , many thanks. it was really helpful : ))