Please help!
I wrote a series of scripts to find all combinations of "double letter to single letter" misspellings. The script is designed to return a concatenated list of these typos, with a "project name" and the original keyword appended to each misspelling. For example, if the word "mississippi" is entered into the script, the following is the expected return:
mississipi test_project_name mississippi
missisippi test_project_name mississippi
missisipi test_project_name mississippi
misissippi test_project_name mississippi
misissipi test_project_name mississippi
misisippi test_project_name mississippi
misisipi test_project_name mississippi
The script works if one word is entered at a time. However, it goes haywire if more than one word is entered at a time. Here is the original function with strategic var_dump() calls and test data:
Code:
<?php
header('Content-Type: text/plain');

function double_to_single_letter($word, $delimitor = "\r")
{
    echo "<!--start-->********************************************************\r\r";
    $new_word_array = array(); // empty array to hold new words or word parts
    $kw_list = '';             // concatenated keyword list. sets up for later use
    $combinations = '';        // sets up for use in combinations function
    $pattern = "/([a-z])\\1/Uis"; // regex matches double letters
    $replacement = "$1|$1";       // puts "|" between double letters for explode
    // places "|" between double letters for explode
    $new_word = preg_replace($pattern, $replacement, $word, -1, $match_count);
    echo "new_word>>>>>\r";
    var_dump($new_word);
    echo "\r\r";

    // use match count to check if word contains double letters
    if ($match_count > 0) {
        $new_word_exp = explode('|', $new_word); // explodes new word into array
        $new_word_array[][] = $new_word_exp[0];  // places first section into array
        $count = count($new_word_exp);           // gets size of this array

        // this loop puts word sections into an array
        for ($i = 1; $i < $count; $i++) {
            $k = 0;
            $next_k = $k + 1; // next letter
            $new_word_array[$i][$k] = $new_word_exp[$i]; // puts word section into array
            $next_letter = substr($new_word_exp[$i], 1); // word section sans first letter
            // checks to see if more letters in word section. returns NULL if no letters
            if ($next_letter != FALSE) {
                $new_word_array[$i][$next_k] = $next_letter;
            } else {
                $new_word_array[$i][$next_k] = NULL;
            }
        }
        echo "new_word_array before unset\r";
        var_dump($new_word_array);
        echo "\r\r";

        // recursively gets all double letter to single letter misspelling combos. returns an array
        $combos_all = get_combinations($new_word_array);
        echo "combos_all\r";
        var_dump($combos_all);
        echo "\r\r";

        unset($new_word_array);
        echo "new_word_array after unset\r";
        var_dump($new_word_array);
        echo "\r\r";

        // loops through array to append delimitor
        foreach ($combos_all as $value) {
            $kw_list .= $value . $delimitor;
        }
    } else {
        $kw_list = NULL;
    }
    return $kw_list;
}

include($_SERVER['DOCUMENT_ROOT'] . 'get_combinations.php');

// test data: one keyword per line
$word = "bee\nhoney bee\nmississippi state\naadvark cartoon";
// uncomment for one word
//$word = 'mississippi';

function line_split($input)
{
    $word = preg_replace("/(\r|\n)/", "\r", $input);
    $word = trim(preg_replace("/(\r){2,}/", "\r", $word));
    return $word;
}

$word = line_split($word); // makes all line breaks the same, i.e. "\r"
echo "word>>>>\r";
var_dump($word);
echo "\r\r";

$word_exp = explode("\r", $word);
echo "word_exp>>>>\r";
var_dump($word_exp);
echo "\r\r";

foreach ($word_exp as $value) {
    $delim = "\t" . "test_project_name" . "\t" . $value . "\r";
    echo "value>>>>\r";
    var_dump($value);
    echo "\r\r";
    $klist .= double_to_single_letter($value, $delim);
}
echo $klist;
?>
This is the content of the get_combinations function:
Code:
<?php
function get_combinations($source_array, $string = '', $i = 0)
{
    global $combinations;
    if ($i >= count($source_array)) {
        $combinations[] = $string;
    } else {
        foreach ($source_array[$i] as $combos) {
            get_combinations($source_array, $string . $combos, $i + 1);
        }
    }
    return $combinations;
}
?>
My expectation is the following:
Code:
bee test_project_name bee
be test_project_name bee
honey bee test_project_name honey bee
honey be test_project_name honey bee
mississippi state test_project_name mississippi state
mississipi state test_project_name mississippi state
missisippi state test_project_name mississippi state
missisipi state test_project_name mississippi state
misissippi state test_project_name mississippi state
misissipi state test_project_name mississippi state
misisippi state test_project_name mississippi state
misisipi state test_project_name mississippi state
aadvark cartoon test_project_name aadvark cartoon
aadvark carton test_project_name aadvark cartoon
advark cartoon test_project_name aadvark cartoon
advark carton test_project_name aadvark cartoon
But instead the following is returned from the double_to_single_letter function:
Code:
bee test_project_name bee
be test_project_name bee
bee test_project_name honey bee
be test_project_name honey bee
honey bee test_project_name honey bee
honey be test_project_name honey bee
bee test_project_name mississippi state
be test_project_name mississippi state
honey bee test_project_name mississippi state
honey be test_project_name mississippi state
mississippi state test_project_name mississippi state
mississipi state test_project_name mississippi state
missisippi state test_project_name mississippi state
missisipi state test_project_name mississippi state
misissippi state test_project_name mississippi state
misissipi state test_project_name mississippi state
misisippi state test_project_name mississippi state
misisipi state test_project_name mississippi state
bee test_project_name aadvark cartoon
be test_project_name aadvark cartoon
honey bee test_project_name aadvark cartoon
honey be test_project_name aadvark cartoon
mississippi state test_project_name aadvark cartoon
mississipi state test_project_name aadvark cartoon
missisippi state test_project_name aadvark cartoon
missisipi state test_project_name aadvark cartoon
misissippi state test_project_name aadvark cartoon
misissipi state test_project_name aadvark cartoon
misisippi state test_project_name aadvark cartoon
misisipi state test_project_name aadvark cartoon
aadvark cartoon test_project_name aadvark cartoon
aadvark carton test_project_name aadvark cartoon
advark cartoon test_project_name aadvark cartoon
advark carton test_project_name aadvark cartoon
Values are repeated multiple times and the original keyword is appended to the wrong misspelling.
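To narrow the repetition down, here is a reduced sketch that exercises only the recursion. It assumes the repeats may come from the `global $combinations` array in get_combinations rather than from double_to_single_letter. The function name get_combinations_global and the hard-coded word parts are hypothetical test values, not part of the original script.

```php
<?php
// Hypothetical reduced test: same recursion as get_combinations, nothing else.
function get_combinations_global($source_array, $string = '', $i = 0)
{
    global $combinations; // lives in global scope, so it is never reset between calls
    if ($i >= count($source_array)) {
        $combinations[] = $string; // one finished combination
    } else {
        foreach ($source_array[$i] as $combos) {
            get_combinations_global($source_array, $string . $combos, $i + 1);
        }
    }
    return $combinations;
}

// Word parts as double_to_single_letter would build them for "bee" and "honey bee".
$first  = get_combinations_global(array(array('be'), array('e', null)));
$second = get_combinations_global(array(array('honey be'), array('e', null)));

var_dump(count($first));  // int(2): bee, be
var_dump(count($second)); // int(4): bee, be, honey bee, honey be
```

If the second call reports four entries, the results of the first word are still in the array when the second word is processed, which would match the repeated values in the output above.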
The script was designed so that one original word containing a double letter is looped through the double_to_single_letter function at a time.
The line in the double_to_single_letter function:
Code:
echo "new_word_array before unset\r"; var_dump($new_word_array); echo "\r\r";
verifies that the array $new_word_array contains the word parts of one word. Therefore, only that should be sent to the get_combinations function to get misspelling permutations.
Yet, as far as I can tell, the cumulative value of $new_word_array, consisting of the parts added for the current word along with its previous indexes, gets fed into the get_combinations function.
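One way to check whether the accumulation happens inside the recursion instead of in $new_word_array is to compare against a variant that keeps no state between calls. This is only a sketch under that assumption: get_combinations_local is a hypothetical name, and it collects results in a local array and returns them instead of using `global`.

```php
<?php
// Hypothetical variant: results are built locally and returned,
// so nothing survives from one top-level call to the next.
function get_combinations_local(array $source_array, $string = '', $i = 0)
{
    if ($i >= count($source_array)) {
        return array($string); // one finished combination
    }
    $results = array();
    foreach ($source_array[$i] as $part) {
        // merge the combinations found below this branch
        $results = array_merge(
            $results,
            get_combinations_local($source_array, $string . $part, $i + 1)
        );
    }
    return $results;
}

// Same word parts that double_to_single_letter builds for "bee" and "honey bee".
$bee   = get_combinations_local(array(array('be'), array('e', null)));
$honey = get_combinations_local(array(array('honey be'), array('e', null)));

print_r($bee);
print_r($honey);
```

If this variant returns only the combinations for the current word on each call, the extra values are coming from the shared array in get_combinations and not from the word parts passed in.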
What am I doing wrong?




