Basically the difference is that levenshtein returns the sum of the number of characters that need to be 1/ added, 2/ replaced or 3/ removed to get from one string to another. So for example the difference between ScallioXTX and Scylio is 5. 3 characters removed (XTX), 1 replaced (y instead of a), and 1 removed (1 of the Ls) = 5.
similar_text however returns how many of the characters that are in the first string are also in the second string, according to the manual, but I can't quite reproduce this.
For example
PHP Code:
<?php
$alpha = 'abcdefghijklmnopqrstuvwxyz';
$st = similar_text(strrev($alpha), $alpha, $perc);
var_dump($st); // 1
var_dump($perc); // 3.8461538461538
php seems to think those strings have 1 character in common, while in fact they have no characters in common whatsoever...
Also, similar_text($a, $b) != similar_text($b, $a)
PHP Code:
<?php
$str1 = 'PHP IS GREAT';
$str2 = 'WITH MYSQLAS';
$st = similar_text($str1, $str2, $perc);
var_dump($st); // 4
var_dump($perc); // 33.333333333333
$st = similar_text($str2, $str1, $perc);
var_dump($st); // 2
var_dump($perc); // 16.666666666667
Which seems weird to me. The number of characters they have in common should not be different when you swap the parameters IMO, especially when the strings have the same length and the difference can not be attributed to that.
Beside that similar_text is a lot slower than levenhstein (O(max(n,m)^3 vs O(n*m) -- where n is the length of the first string and m is the length of the second string).
Long story short, unless you have a very specific usecase where you are sure that similar_text does exactly what you need, it's probably advisable to go with levenhstein instead, which is a faster and more predictable.
Bookmarks