Similar_Text() and levenshtein() functions

Roaya_Shehab · October 11, 2014, 6:39am

Hi,
I am creating a websites to find similar texts … I used both Similar_Text() and levenshtein() functions.
I understand how those functions work … and I almost finished the website.
but my problem is how to compare between those functions mathematically. because I am supposed to attach a report which has definition and comparison for those functions.

In this post PHP Function of the Week - similar text
@rpkamp mentioned this " Beside that similar_text is a lot slower than levenhstein (O(max(n,m)^3 vs O(nm) – where n is the length of the first string and m is the length of the second string). "
I really want resources to explain the part of (O(max(n,m)^3 vs O(nm) more.

please advice

Thank you

rpkamp · October 11, 2014, 11:02am

The complexity of similar_text is given in the PHP manual: http://php.net/manual/en/function.similar-text.php and is defined as O(N^3), where N is the length of the longer of the input string, so O(max(n, m)^3), where n is the length of string 1 and m is the length of string 2. I don’t know where this complexity comes from, but you can probably find in the book the manual refers to, Programming Classics: Implementing the World’s Best Algorithms by Oliver (ISBN 0-131-00413-1).

Levensteihn uses a matrix of the letters of both strings and worst case all cells have to be visited, so it’s a O(n*m) algorithm.

Note that even though the Big O of similar_text is a lot larger than that of levenstein, in practice this doesn’t really matter for small inputs. If you compare 2 strings of length 10 for example you really wouldn’t notice the difference between the two. But the larger the input gets the more the difference in speed becomes apparent.

Also note that this Big O notation says nothing how the algorithms work, but only how fast. If you want a nice introduction to the Big O notation read the first answer to this question on SO here: http://stackoverflow.com/questions/487258/plain-english-explanation-of-big-o

Finally it should be mentioned that there are more algorithms than just these 2 to compare similarity between strings. One that springs to mind is trigram matching. So you may want to look in to different algorithms as well.

Roaya_Shehab · October 11, 2014, 3:40pm

ScallioXTX , many thanks. it was really helpful : ))

Topic		Replies	Views
PHP Function of the Week - similar text PHP	4	2919	October 8, 2014
Identify extremely similar rows Databases	11	4436	March 15, 2011
Optimizing a search query Databases	2	419	June 20, 2011
Comparing Crawled Page Text PHP	16	1150	April 29, 2011
Apostrophes producing strange results with similar_text PHP	5	816	June 14, 2015

Similar_Text() and levenshtein() functions

Related topics