  1. #15
    cpradio (Hosting Team Leader)
    Last but not least, I came up with one more check that may help with your e-mail test. I can't promise it won't affect something else, although it didn't affect any of the tests I wrote.

    Edit:

    Please see post #43 for the most up-to-date version of this code.


    PHP Code:
    <?php
    class CSV
    {
        private $filePath;
        private $fileContents;
        //const ACCEPTABLE_DELIMITERS = '~[#,;:|\t]~'; // acceptable delimiters
        const EXCLUDED_CHARS = '~[a-zA-Z0-9.\r\n\f ]~'; // delimiters can't be characters, numbers or spaces

        public function __construct($file)
        {
            $this->filePath = $file;
            $this->fileContents = file($file);
        }

        public function getDelimiter()
        {
            $delimitersByLine = null;
            foreach ($this->fileContents as $lineNumber => $line)
            {
                $quoted = false;
                $delimiters = array();

                for ($i = 0; $i < strlen($line) - 1; $i++)
                {
                    $char = substr($line, $i, 1);
                    if ($char === '"')
                    {
                        $quoted = !$quoted;
                    }
                    //else if (!$quoted && preg_match(self::ACCEPTABLE_DELIMITERS, $char))
                    else if (!$quoted && !preg_match(self::EXCLUDED_CHARS, $char))
                    {
                        if (array_key_exists($char, $delimiters))
                        {
                            $delimiters[$char]++;
                        }
                        else
                        {
                            $delimiters[$char] = 1;
                        }
                    }
                }

                if ($delimitersByLine === null)
                {
                    $delimitersByLine = $delimiters;
                }
                else if (count($delimitersByLine) > 0 && count($delimiters) > 0)
                {
                    $newDelimitersByLine = $delimiters;
                    foreach ($delimitersByLine as $key => $value)
                    {
                        if ((array_key_exists($key, $delimiters) && $delimiters[$key] === $value)
                            || !array_key_exists($key, $delimiters))
                        {
                            $newDelimitersByLine[$key] = $value;
                        }
                    }
                    $delimitersByLine = $newDelimitersByLine;

                    if (sizeof($delimitersByLine) < 2)
                        break;
                }
            }

            arsort($delimitersByLine);
            $firstDelimiter = key($delimitersByLine);

            if (sizeof($delimitersByLine) > 1)
            {
                next($delimitersByLine);
                $nextDelimiter = key($delimitersByLine);
                if ($delimitersByLine[$firstDelimiter] === $delimitersByLine[$nextDelimiter])
                {
                    // multiple delimiters with the same frequency found
                    // throw an error
                    throw new UnexpectedValueException();
                }

                return $firstDelimiter;
            }
            else
                return $firstDelimiter;
        }
    }
    This is the part I changed
    PHP Code:
                if ($delimitersByLine === null)
                {
                    $delimitersByLine = $delimiters;
                }
                else if (count($delimitersByLine) > 0 && count($delimiters) > 0)
    My new test file (which now writes out the ord() value of the delimiter too)
    PHP Code:
    <?php
        include('csv.php');

        $files = array('comma.txt', 'colon.txt', 'pipe.txt', 'pound.txt', 'semicolon.txt', 'tab.txt', 'email.txt', 'mixture.txt');
        foreach ($files as $file)
        {
            $csv = new CSV('files/' . $file);
            $delimiter = $csv->getDelimiter();
            echo 'Delimiter for ' . $file . ' is ' . $delimiter . ' (' . ord($delimiter) . ')<br />';
        }
    Initializing $delimitersByLine to null, and only assigning it while it is still null (so it is set from the first line only), resolves an issue with zero delimiters being found on the first line. Now, when the first line contains zero delimiters, an empty array is stored in $delimitersByLine and line 2 does not overwrite it, even if line 2 contains delimiters.
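
    To make the null check concrete, here is a minimal sketch that can be run on its own. countCandidates() is a hypothetical helper, not part of the class; it mirrors the inner loop of getDelimiter(), except that it trims the line ending with rtrim() instead of skipping the last character. The two sample lines are made up for illustration.

```php
<?php
// Hypothetical helper (not in the original class): counts unquoted
// candidate delimiter characters on a single line.
function countCandidates(string $line): array
{
    $line = rtrim($line, "\r\n"); // assumption: trim the line ending up front
    $counts = [];
    $quoted = false;
    for ($i = 0; $i < strlen($line); $i++) {
        $char = $line[$i];
        if ($char === '"') {
            $quoted = !$quoted; // toggle on every double quote
        } elseif (!$quoted && !preg_match('~[a-zA-Z0-9.\r\n\f ]~', $char)) {
            $counts[$char] = ($counts[$char] ?? 0) + 1;
        }
    }
    return $counts;
}

// Made-up input: the first line has no candidate delimiters at all.
$lines = ["Hello there\n", "a,b,c\n"];

$tally = null;
foreach ($lines as $line) {
    $counts = countCandidates($line);
    if ($tally === null) {
        $tally = $counts; // set from the first line only, even when empty
    } elseif (count($tally) > 0 && count($counts) > 0) {
        // merge step omitted in this sketch
    }
}

var_export($tally); // $tally is still an empty array here
```

    Because $tally is an empty array rather than null after the first line, the second line's commas never replace it.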

    The else if then verifies that the tally carried over from the previous lines and the tally for the current line each contain at least one delimiter. If so, it combines the two into a new array for tallying and moves on to the next line.
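
    The combining step can be sketched as a standalone function. mergeTallies() is a hypothetical name used only for this illustration; the logic follows the foreach inside the else if branch. Note that, as the class is written, a character whose count differs between the two tallies simply keeps the current line's count.

```php
<?php
// Hypothetical standalone version of the merge step in getDelimiter().
// $running is the tally carried over from earlier lines,
// $current is the tally for the line being analyzed.
function mergeTallies(array $running, array $current): array
{
    $merged = $current; // start from the current line's counts
    foreach ($running as $char => $count) {
        $seenNow = array_key_exists($char, $current);
        if (!$seenNow || $current[$char] === $count) {
            $merged[$char] = $count; // carry over unseen or agreeing candidates
        }
        // otherwise the current line's (different) count is left in place
    }
    return $merged;
}

// Made-up tallies: ',' agrees on both sides, '@' and ';' each appear on one side.
var_export(mergeTallies([',' => 3, '@' => 1], [',' => 3, ';' => 1]));
// result: [',' => 3, ';' => 1, '@' => 1]
```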

    For the e-mail file, it now produces the following output (the value in parentheses is the ord() value of the delimiter found).
    Code:
    Delimiter for comma.txt is , (44)
    Delimiter for colon.txt is : (58)
    Delimiter for pipe.txt is | (124)
    Delimiter for pound.txt is # (35)
    Delimiter for semicolon.txt is ; (59)
    Delimiter for tab.txt is (9)
    Delimiter for email.txt is (0)
    
    Fatal error: Uncaught exception 'UnexpectedValueException' in M:\SVN\sitepoint\trunk\Sitepoint\cancer10\csv.php:77 Stack trace: #0 M:\SVN\sitepoint\trunk\Sitepoint\cancer10\test.php(8): CSV->getDelimiter() #1 {main} thrown in M:\SVN\sitepoint\trunk\Sitepoint\cancer10\csv.php on line 77
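
    The output above ends in an uncaught exception, and email.txt prints an empty delimiter with ord() 0. One way a test script might report both cases without dying is sketched below. describeDelimiter() and its messages are hypothetical; $detect stands in for any callable that wraps the detection call, for example a closure that builds a CSV object and returns getDelimiter().

```php
<?php
// Hypothetical wrapper: $detect is any callable that returns a delimiter
// (or null / '' when none was found) and may throw on a tie.
function describeDelimiter(callable $detect, string $file): string
{
    try {
        $delimiter = $detect($file);
    } catch (UnexpectedValueException $e) {
        // several candidates shared the same frequency
        return 'Delimiter for ' . $file . ' is ambiguous';
    }

    if ($delimiter === null || $delimiter === '') {
        return 'No delimiter found for ' . $file;
    }

    return 'Delimiter for ' . $file . ' is ' . $delimiter . ' (' . ord($delimiter) . ')';
}

// Stand-in detectors, used only to exercise the wrapper.
$comma = function ($file) { return ','; };
$tie   = function ($file) { throw new UnexpectedValueException(); };

echo describeDelimiter($comma, 'comma.txt'), "\n";
echo describeDelimiter($tie, 'mixture.txt'), "\n";
```

    With the class from this post, the callable could be function ($file) { return (new CSV('files/' . $file))->getDelimiter(); }.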
    Last edited by cpradio; Jun 27, 2013 at 02:59. Reason: Added edit/warning