  1. #1
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    62

    Why is my script not working?

    Please help!

    I wrote a series of scripts to find all combinations of "double letter to single letter" misspellings. The script is designed to return a concatenated list of these misspellings, with a "project name" and the original keyword appended to each one. For example, if the word "Mississippi" is entered into the script, this is the expected return:

    mississipi test_project_name mississippi
    missisippi test_project_name mississippi
    missisipi test_project_name mississippi
    misissippi test_project_name mississippi
    misissipi test_project_name mississippi
    misisippi test_project_name mississippi
    misisipi test_project_name mississippi
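    For reference, here is a minimal sketch of that single-word behaviour, assuming the goal is every variant in which each double letter is either kept or collapsed to one letter. collapse_doubles() is a hypothetical helper, not the poster's double_to_single_letter(): it walks the word character by character instead of using the "|" placeholder and explode(). It also differs from the original in two ways: the first entry is the unchanged word, and a word with no double letters returns just itself rather than NULL.

```php
<?php
// Hypothetical helper (not the original double_to_single_letter()):
// returns every variant of $word in which each double letter is either
// kept or collapsed to a single letter. Pairs are matched left to right
// without overlap, like the original /([a-z])\1/ pattern.
function collapse_doubles(string $word): array
{
    $variants = [''];                 // partial variants, built left to right
    $len = strlen($word);
    for ($i = 0; $i < $len; $i++) {
        $ch = $word[$i];
        $isDouble = ctype_alpha($ch)
            && $i + 1 < $len
            && strcasecmp($ch, $word[$i + 1]) === 0;
        if ($isDouble) {
            // Branch: keep both letters, or keep only one.
            $next = [];
            foreach ($variants as $v) {
                $next[] = $v . $ch . $word[$i + 1];
                $next[] = $v . $ch;
            }
            $variants = $next;
            $i++;                     // the second letter is already handled
        } else {
            // Ordinary character: append it to every partial variant.
            foreach ($variants as $k => $v) {
                $variants[$k] = $v . $ch;
            }
        }
    }
    return $variants;
}

// Usage: one tab-separated line per variant, as in the expected output.
$keyword = 'mississippi';
foreach (collapse_doubles($keyword) as $variant) {
    echo $variant, "\t", 'test_project_name', "\t", $keyword, "\n";
}
```

    Because the result is a fresh local array on every call, nothing from one keyword can carry over into the next.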

    The script works if one word is entered at a time. However, it goes haywire if more than one word is entered at once. Here is the original function with strategic var_dump() calls and test data:

    Code:
    <?php
    header('Content-Type: text/plain');
    
    function double_to_single_letter($word, $delimitor = "\r")
    	{
    		echo "<!--start-->********************************************************\r\r";
    		$new_word_array = array(); //empty array to hold new words or word parts
    
    		$kw_list = ''; //concatenated keyword lists. sets up for later use
    		
    		$combinations = ''; //sets up for use in combinations function
    
    		$pattern = "/([a-z])\\1/Uis"; //regex matches double letters
    
    		$replacement = "$1|$1"; //inserts "|" between double letters for explode
    
    		$new_word = preg_replace($pattern, $replacement, $word, -1, $match_count); //places "|" between double letters for explode
    		
    		echo "new_word>>>>>\r"; var_dump($new_word ); echo "\r\r";
    
    		//use match count to check if word contains double letters
    		if($match_count > 0)
    			{
    				$new_word_exp = explode('|', $new_word); //explodes new word into array
    		
    				$new_word_array[][] = $new_word_exp[0]; //places first section into array
    				
    				$count = count($new_word_exp); //gets size of this array
    				
    				//this loop puts word sections into an array
    				for($i = 1; $i < $count; $i++)
    					{
    						$k = 0;
    						$next_k = $k + 1; //next letter
    
    						$new_word_array[$i][$k] = $new_word_exp[$i]; //puts word section into array
    						
    						$next_letter = substr($new_word_exp[$i],1); //word section without its first letter
    						
    						//checks to see if more letters in word section. returns NULL if no letters
    						if($next_letter != FALSE)
    							{
    								$new_word_array[$i][$next_k] = $next_letter; 
    							}else
    							{
    								$new_word_array[$i][$next_k] = NULL; 					
    							}
    						
    					}
    				
    				echo "new_word_array before unset\r"; var_dump($new_word_array); echo "\r\r";
    				
    				//recursively gets all double-letter-to-single-letter misspelling combos. returns an array
    				$combos_all = get_combinations($new_word_array);
    
    				echo "combos_all\r"; var_dump($combos_all); echo "\r\r";
    				unset($new_word_array);
    				echo "new_word_array after unset\r"; var_dump($new_word_array); echo "\r\r";
    
    				//loops through array to append delimitor
    				foreach($combos_all as $value)
    					{
    						$kw_list .= $value. $delimitor;
    					}
    				
    				
    			}else
    			{
    				$kw_list = NULL;
    			}
    
    
    		return $kw_list;
    	}
    
    include($_SERVER['DOCUMENT_ROOT'].'get_combinations.php');
    
    
    $word = "bee
    honey bee
    mississippi state
    aadvark cartoon";
    
    //uncomment for one word
    //$word = 'mississippi';
    
    function line_split($input)
    	{
    		$word = preg_replace("/(\r|\n)/", "\r", $input);
    		$word = trim(preg_replace("/(\r){2,}/", "\r", $word));
    		return $word;
    
    	}
    
    $word = line_split($word); // makes all line breaks the same, i.e. "\r"
    echo "word>>>>\r"; var_dump($word); echo "\r\r";
    
    
    $word_exp = explode("\r", $word);
    echo "word_exp>>>>\r"; var_dump($word_exp); echo "\r\r";
    
    foreach($word_exp as $value)
    	{
    		$delim = "\t". "test_project_name". "\t". $value. "\r";
    		
    		echo "value>>>>\r"; var_dump($value); echo "\r\r";
    
    		$klist .= double_to_single_letter($value, $delim);
    	}
    
    
    echo $klist;
    
    ?>
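    As an aside on the input handling above: the line_split() plus explode("\r", ...) pair can, as far as I can tell, be folded into a single preg_split() call, because the PCRE escape \R matches "\r\n", "\r" or "\n". This is only a sketch; split_lines() is a hypothetical name, and unlike the original it also trims each line.

```php
<?php
// Hypothetical replacement for line_split() + explode("\r", ...):
// \R matches \r\n, \r or \n; PREG_SPLIT_NO_EMPTY drops blank lines.
function split_lines(string $input): array
{
    $lines = preg_split('/\R+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);
    return array_map('trim', $lines); // also strips stray indentation per line
}

$words = split_lines("bee\r\nhoney bee\n\nmississippi state\raadvark cartoon\n");
print_r($words);
```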
    Here are the contents of get_combinations.php, which defines the get_combinations function:

    Code:
    <?php
    function get_combinations($source_array, $string = '', $i = 0)
    	{
    		global $combinations;
    
    		if ($i >= count($source_array))
    			{
    				$combinations[] = $string;
    			}else
    			{
    				foreach ($source_array[$i] as $combos)
    					{
    						get_combinations($source_array, $string.$combos, $i + 1);
    					}
    			}
    
    		return $combinations;
    	}
    ?>
    I expect the following output:

    Code:
    bee	test_project_name	bee
    be	test_project_name	bee
    honey bee	test_project_name	honey bee
    honey be	test_project_name	honey bee
    mississippi state	test_project_name	mississippi state
    mississipi state	test_project_name	mississippi state
    missisippi state	test_project_name	mississippi state
    missisipi state	test_project_name	mississippi state
    misissippi state	test_project_name	mississippi state
    misissipi state	test_project_name	mississippi state
    misisippi state	test_project_name	mississippi state
    misisipi state	test_project_name	mississippi state
    aadvark cartoon	test_project_name	aadvark cartoon
    aadvark carton	test_project_name	aadvark cartoon
    advark cartoon	test_project_name	aadvark cartoon
    advark carton	test_project_name	aadvark cartoon
    But instead the following is returned from the double_to_single_letter function.

    Code:
    bee	test_project_name	bee
    be	test_project_name	bee
    bee	test_project_name	honey bee
    be	test_project_name	honey bee
    honey bee	test_project_name	honey bee
    honey be	test_project_name	honey bee
    bee	test_project_name	mississippi state
    be	test_project_name	mississippi state
    honey bee	test_project_name	mississippi state
    honey be	test_project_name	mississippi state
    mississippi state	test_project_name	mississippi state
    mississipi state	test_project_name	mississippi state
    missisippi state	test_project_name	mississippi state
    missisipi state	test_project_name	mississippi state
    misissippi state	test_project_name	mississippi state
    misissipi state	test_project_name	mississippi state
    misisippi state	test_project_name	mississippi state
    misisipi state	test_project_name	mississippi state
    bee	test_project_name	aadvark cartoon
    be	test_project_name	aadvark cartoon
    honey bee	test_project_name	aadvark cartoon
    honey be	test_project_name	aadvark cartoon
    mississippi state	test_project_name	aadvark cartoon
    mississipi state	test_project_name	aadvark cartoon
    missisippi state	test_project_name	aadvark cartoon
    missisipi state	test_project_name	aadvark cartoon
    misissippi state	test_project_name	aadvark cartoon
    misissipi state	test_project_name	aadvark cartoon
    misisippi state	test_project_name	aadvark cartoon
    misisipi state	test_project_name	aadvark cartoon
    aadvark cartoon	test_project_name	aadvark cartoon
    aadvark carton	test_project_name	aadvark cartoon
    advark cartoon	test_project_name	aadvark cartoon
    advark carton	test_project_name	aadvark cartoon

    Values are repeated multiple times and the original keyword is appended to the wrong misspelling.
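    Counting lines makes the repetition concrete. Each double letter can be either kept or collapsed, so a phrase with k double letters should contribute 2^k lines: 2 + 2 + 8 + 4 = 16 for the test data, which matches the expected list. The actual output has 34 lines, and per keyword the counts are 2, 4, 12 and 16, so each keyword's block also contains every earlier keyword's variants. A small sketch that computes the expected counts; expected_line_count() is a hypothetical helper.

```php
<?php
// Hypothetical check: a phrase with k double letters should yield 2^k
// lines (each double is either kept or collapsed); phrases with none
// yield 0 lines, because double_to_single_letter() returns NULL for them.
function expected_line_count(string $phrase): int
{
    $pairs = preg_match_all('/([a-z])\1/i', $phrase); // non-overlapping doubles
    return $pairs > 0 ? 2 ** $pairs : 0;
}

$phrases = ['bee', 'honey bee', 'mississippi state', 'aadvark cartoon'];
$total = 0;
foreach ($phrases as $phrase) {
    $n = expected_line_count($phrase);
    $total += $n;
    echo $phrase, ': ', $n, "\n";
}
echo 'total: ', $total, "\n"; // total: 16
```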

    The script was designed so that the original keywords are passed through the double_to_single_letter function one at a time.

    This line in the double_to_single_letter function:
    Code:
    echo "new_word_array before unset\r"; var_dump($new_word_array); echo "\r\r";
    verifies that $new_word_array contains the parts of only one word. Therefore, only those parts should be sent to the get_combinations function to generate the misspelling permutations.

    Yet, as far as I can tell, the cumulative contents of $new_word_array (the parts added for the current word plus the entries from previous words) get fed into the get_combinations function.

    What am I doing wrong?
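    A likely explanation, offered as a reading of the posted code rather than a tested diagnosis: get_combinations() declares global $combinations and appends to it, but nothing ever resets that global. The $combinations = '' line in double_to_single_letter() creates a local variable that get_combinations() never sees. So each call returns every earlier keyword's combinations plus the current one, and the foreach then appends the current keyword's delimiter to all of them. Below is a sketch of a get_combinations() that keeps the same arguments but drops the global, building and returning a local array on each call.

```php
<?php
// Sketch of a get_combinations() without the global: each call builds
// and returns its own array, so results from one word cannot leak into
// the next. Same arguments as the original version.
function get_combinations(array $source_array, string $string = '', int $i = 0): array
{
    if ($i >= count($source_array)) {
        return [$string];                  // one complete combination
    }
    $combinations = [];                    // local: starts empty every call
    foreach ($source_array[$i] as $piece) {
        // A NULL piece (the "drop the letter" option) concatenates as ''.
        $combinations = array_merge(
            $combinations,
            get_combinations($source_array, $string . $piece, $i + 1)
        );
    }
    return $combinations;
}

// Word parts in the shape double_to_single_letter() builds them.
$bee   = [['be'], ['e', null]];
$honey = [['honey be'], ['e', null]];

echo implode(', ', get_combinations($bee)), "\n";    // bee, be
echo implode(', ', get_combinations($honey)), "\n";  // honey bee, honey be
```

    A smaller change that keeps the original get_combinations() should also work: in double_to_single_letter(), replace $combinations = ''; with global $combinations; $combinations = array(); so the shared array is emptied before each keyword is processed.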

  2. #2
    Hibernator YuriKolovsky
    Join Date
    Nov 2007
    Location
    Malaga, Spain
    Posts
    1,072
    Tip to get more answers:
    I'm not going to look at all that code; it takes too much time. I suggest you simplify your questions to get many more answers: remove everything you think can be removed from the text.

    My answer:
    I read that you have a function that works and corrects a single word, so you can reuse it. I would simply explode(' ', $text) the text at the spaces and then run a foreach loop over the pieces, treating each one as a single word. There might be better solutions, since this is off the top of my head, but it should work as expected.

    Like this:
    Code:
    $text = 'all the darn text';
    $pieces = explode(' ', $text);
    foreach($pieces as $key => $word) {
      $correctword = double_to_single_letter($word);
    }
    Or you can join the results back together like this:
    Code:
    $text = 'all the darn text';
    $pieces = explode(' ', $text);
    $correcttext = ''; // initialise the accumulator (var is only valid inside a class)
    foreach($pieces as $key => $word) {
      $correcttext .= double_to_single_letter($word).' ';
    }

