  1. #26
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,154
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Last but not least, I came up with one more check that may help in the e-mail test you have, but I don't promise it won't affect something else (although, it didn't affect any of the tests I wrote up).

    Edit:

    Please see post #43 for the most up-to-date version of this code.


    PHP Code:
    <?php
    class CSV
    {
        private $filePath;
        private $fileContents;
        //const ACCEPTABLE_DELIMITERS = '~[#,;:|\t]~'; // acceptable delimiters
        const EXCLUDED_CHARS = '~[a-zA-Z0-9.\r\n\f ]~'; // delimiters can't be characters, numbers or spaces

        public function __construct($file)
        {
            $this->filePath = $file;
            $this->fileContents = file($file);
        }

        public function getDelimiter()
        {
            $delimitersByLine = null;
            foreach ($this->fileContents as $lineNumber => $line)
            {
                $quoted = false;
                $delimiters = array();

                for ($i = 0; $i < strlen($line) - 1; $i++)
                {
                    $char = substr($line, $i, 1);
                    if ($char === '"')
                    {
                        $quoted = !$quoted;
                    }
                    //else if (!$quoted && preg_match(self::ACCEPTABLE_DELIMITERS, $char))
                    else if (!$quoted && !preg_match(self::EXCLUDED_CHARS, $char))
                    {
                        if (array_key_exists($char, $delimiters))
                        {
                            $delimiters[$char]++;
                        }
                        else
                        {
                            $delimiters[$char] = 1;
                        }
                    }
                }

                if ($delimitersByLine === null)
                {
                    $delimitersByLine = $delimiters;
                }
                else if (count($delimitersByLine) > 0 && count($delimiters) > 0)
                {
                    $newDelimitersByLine = $delimiters;
                    foreach ($delimitersByLine as $key => $value)
                    {
                        if ((array_key_exists($key, $delimiters) && $delimiters[$key] === $value)
                            || !array_key_exists($key, $delimiters))
                        {
                            $newDelimitersByLine[$key] = $value;
                        }
                    }
                    $delimitersByLine = $newDelimitersByLine;

                    if (sizeof($delimitersByLine) < 2)
                        break;
                }
            }

            arsort($delimitersByLine);
            $firstDelimiter = key($delimitersByLine);

            if (sizeof($delimitersByLine) > 1)
            {
                next($delimitersByLine);
                $nextDelimiter = key($delimitersByLine);
                if ($delimitersByLine[$firstDelimiter] === $delimitersByLine[$nextDelimiter])
                {
                    // multiple delimiters with the same frequency found
                    // throw an error
                    throw new UnexpectedValueException();
                }

                return $firstDelimiter;
            }
            else
                return $firstDelimiter;
        }
    }
    This is the part I changed
    PHP Code:
                if ($delimitersByLine === null)
                {
                    $delimitersByLine = $delimiters;
                }
                else if (count($delimitersByLine) > 0 && count($delimiters) > 0)
    My new test file (which now writes out the ord() value of the delimiter too)
    PHP Code:
    <?php
        include('csv.php');

        $files = array('comma.txt', 'colon.txt', 'pipe.txt', 'pound.txt', 'semicolon.txt', 'tab.txt', 'email.txt', 'mixture.txt');
        foreach ($files as $file)
        {
            $csv = new CSV('files/' . $file);
            $delimiter = $csv->getDelimiter();
            echo 'Delimiter for ' . $file . ' is ' . $delimiter . ' (' . ord($delimiter) . ')<br />';
        }
    Setting $delimitersByLine to null at the beginning, and only assigning it while it is still null (so it only gets set from the first line), resolves an issue with zero delimiters being found on the first line. Now, when it finds zero delimiters on the first line, it stores an empty array in $delimitersByLine and does not overwrite it with line 2 (even if that line contains delimiters).

    The else if then verifies that the line prior to the one being analyzed contained at least one delimiter and the current line contains at least one delimiter. If that is true, then it looks at the delimiters and combines them accordingly into a new array for tallying and continues to move forward.
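To make the merge rule concrete, here is a minimal sketch of that step pulled out into a standalone function. The name mergeDelimiterCounts() is hypothetical (it is not part of the class above); the logic is meant to mirror the foreach inside the else if branch.

```php
<?php
// Hypothetical helper: mirrors the merge inside the "else if" branch above.
// $previous holds the counts carried over from earlier lines,
// $current holds the counts found on the line being analyzed.
function mergeDelimiterCounts(array $previous, array $current): array
{
    // Start from the current line's counts
    $merged = $current;
    foreach ($previous as $char => $count) {
        // Carry a previous entry over when the current line agrees on the
        // count, or when the current line does not contain that character
        if (!array_key_exists($char, $current) || $current[$char] === $count) {
            $merged[$char] = $count;
        }
    }
    return $merged;
}

$previous = array(',' => 2, ':' => 1);
$current  = array(',' => 2, '@' => 1);
print_r(mergeDelimiterCounts($previous, $current));
```

Note that a character whose count differs between the two lines is not dropped by this rule; it simply keeps the current line's count.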

    Now for the e-mail, it produces the following output (the parentheses show the ord() value of the delimiter found).
    Code:
    Delimiter for comma.txt is , (44)
    Delimiter for colon.txt is : (58)
    Delimiter for pipe.txt is | (124)
    Delimiter for pound.txt is # (35)
    Delimiter for semicolon.txt is ; (59)
    Delimiter for tab.txt is (9)
    Delimiter for email.txt is (0)
    
    Fatal error: Uncaught exception 'UnexpectedValueException' in M:\SVN\sitepoint\trunk\Sitepoint\cancer10\csv.php:77 Stack trace: #0 M:\SVN\sitepoint\trunk\Sitepoint\cancer10\test.php(8): CSV->getDelimiter() #1 {main} thrown in M:\SVN\sitepoint\trunk\Sitepoint\cancer10\csv.php on line 77

  2. #27
    SitePoint Guru phantom007's Avatar
    Join Date
    May 2008
    Posts
    742
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Hi cpradio

    I have been testing your script with some random CSV files and came across one that generated an exception. Since I am unable to understand why it occurred, I would like your help in finding out why.

    I have PMed you the CSV file URL, please check your inbox.

    FYI, I am using the following code of yours:


    Code:
        function getDelimiter($file)
        {
            $delimitersByLine = null;
            $excluded = '~[a-zA-Z0-9.\r\n\f ]~';
    
            foreach ($file as $lineNumber => $line)
            {
                $quoted = false;
                $delimiters = array();
                
                
                for ($i = 0; $i < strlen($line) - 1; $i++)
                {
                    $char = substr($line, $i, 1);
                    if ($char === '"')
                    {
                        $quoted = !$quoted;
                    }
                    
                    else if (!$quoted && !preg_match($excluded, $char))
                    {
                        if (array_key_exists($char, $delimiters))
                        {
                            $delimiters[$char]++;
                        }
                        else
                        {
                            $delimiters[$char] = 1;
                        }
                    }
                }
    
                if ($delimitersByLine === null)
                {
                    $delimitersByLine = $delimiters;
                }
                else if (count($delimitersByLine) > 0 && count($delimiters) > 0)
                {
                    $newDelimitersByLine = $delimiters;
                    foreach ($delimitersByLine as $key => $value)
                    {
                        if ((array_key_exists($key, $delimiters) && $delimiters[$key] === $value)
                            || !array_key_exists($key, $delimiters))
                        {
                            $newDelimitersByLine[$key] = $value;
                        }
                    }
                    $delimitersByLine = $newDelimitersByLine;
    
                    if (sizeof($delimitersByLine) < 2)
                        break;
                }
            }
    
            arsort($delimitersByLine);
            $firstDelimiter = key($delimitersByLine);
    
            if (sizeof($delimitersByLine) > 1)
            {
                next($delimitersByLine);
                $nextDelimiter = key($delimitersByLine);
                if ($delimitersByLine[$firstDelimiter] === $delimitersByLine[$nextDelimiter])
                {
                    // multiple delimiters with the same frequency found
                    // throw an error
                    throw new UnexpectedValueException();
                }
    
                return $firstDelimiter;
            }
            else
                return $firstDelimiter;
        }
    
    $fileArray = file('test.csv');
    $delimiter = getDelimiter($fileArray);
    echo $delimiter;

    Error:
    Fatal error: Uncaught exception 'UnexpectedValueException' in /var/www/php2csv/index.php:140 Stack trace: #0 /var/www/php2csv/index.php(184): getDelimiter(Array) #1 {main} thrown in /var/www/php2csv/index.php on line 140

  3. #28
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Put var_dump($delimitersByLine); right before
    PHP Code:
                    throw new UnexpectedValueException(); 
    Then give me the output generated from that statement.

  4. #29
    SitePoint Guru phantom007's Avatar
    Hi

    This is the output.

    array(25) { ["-"]=> int(3) [";"]=> int(3) [">"]=> int(2) ["<"]=> int(2) ["^"]=> int(1) [" "]=> int(1) ["Â"]=> int(1) ["©"]=> int(1) ["¢"]=> int(1) ["¯"]=> int(1) ["¡"]=> int(1) ["Å"]=> int(1) ["ƒ"]=> int(1) ["§"]=> int(1) ["/"]=> int(1) ["="]=> int(1) ["_"]=> int(1) [":"]=> int(1) ["@"]=> int(1) ["'"]=> int(1) ["%"]=> int(1) ["+"]=> int(1) [")"]=> int(1) ["("]=> int(1) ["Ã"]=> int(1) } -

  5. #30
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Okay, it sees two possible delimiters, "-" and ";", both with a count of 3, so you can exclude "-" and it will work.

    PHP Code:
    $excluded = '~[a-zA-Z0-9.\r\n\f\- ]~';
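The var_dump in post #29 shows why the exception fired: once the counts are sorted, the top two entries share the same value. Here is a minimal sketch of that tie check, using a hypothetical hasTie() helper and a shortened copy of the dumped counts (it is an illustration, not the code from the class).

```php
<?php
// Hypothetical helper: true when the two most frequent candidates share a
// count, which is the condition that makes getDelimiter() throw.
function hasTie(array $counts): bool
{
    arsort($counts);               // highest count first, keys preserved
    $top = array_values($counts);  // just the counts, re-indexed from 0
    return count($top) > 1 && $top[0] === $top[1];
}

// Shortened copy of the counts dumped in post #29
$counts = array('-' => 3, ';' => 3, '>' => 2, '<' => 2);
var_dump(hasTie($counts));         // "-" and ";" share the top count

// With "-" added to the excluded characters it is never counted
unset($counts['-']);
var_dump(hasTie($counts));         // ";" now stands alone at the top
```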

  6. #31
    SitePoint Guru phantom007's Avatar
    Hi

    If I use
    Code:
    const ACCEPTABLE_DELIMITERS = '~[#,;:|]~';
    What do I have to add for it to accept TABs?

    Thanks

  7. #32
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    \t

    PHP Code:
    const ACCEPTABLE_DELIMITERS = '~[#,;:|\t]~';
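As a quick check that \t inside the single-quoted pattern really matches a tab (PHP leaves the backslash alone in single quotes, and the regex engine then reads \t as a tab), here is a small sketch. The isAcceptable() name is made up for the example.

```php
<?php
// Hypothetical helper wrapping the pattern from the post above.
function isAcceptable(string $char): bool
{
    // Single quotes: PHP passes \t through untouched and PCRE treats it as a tab
    return preg_match('~[#,;:|\t]~', $char) === 1;
}

var_dump(isAcceptable("\t")); // a real tab character
var_dump(isAcceptable(';'));  // one of the listed delimiters
var_dump(isAcceptable('a'));  // an ordinary letter
var_dump(isAcceptable('t'));  // a plain letter t must not match either
```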

  8. #33
    SitePoint Guru phantom007's Avatar
    Great, it's working, except for the case where you have empty lines in the first few lines of the file; then it displays a blank result for the delimiter.


    Any solutions?


    please check attachment for example.


    Many thanks

  9. #34
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    I call that bad data. I guess you could just add a !empty($line) check in the foreach.

    Edit:

    Please see post #43 for the most up-to-date version of this code.


    PHP Code:
        function getDelimiter($file)
        {
            $delimitersByLine = null;
            $excluded = '~[a-zA-Z0-9.\r\n\f ]~';

            foreach ($file as $lineNumber => $line)
            {
                if (!empty($line))
                {
                    $quoted = false;
                    $delimiters = array();

                    for ($i = 0; $i < strlen($line) - 1; $i++)
                    {
                        $char = substr($line, $i, 1);
                        if ($char === '"')
                        {
                            $quoted = !$quoted;
                        }
                        else if (!$quoted && !preg_match($excluded, $char))
                        {
                            if (array_key_exists($char, $delimiters))
                            {
                                $delimiters[$char]++;
                            }
                            else
                            {
                                $delimiters[$char] = 1;
                            }
                        }
                    }

                    if ($delimitersByLine === null)
                    {
                        $delimitersByLine = $delimiters;
                    }
                    else if (count($delimitersByLine) > 0 && count($delimiters) > 0)
                    {
                        $newDelimitersByLine = $delimiters;
                        foreach ($delimitersByLine as $key => $value)
                        {
                            if ((array_key_exists($key, $delimiters) && $delimiters[$key] === $value)
                                || !array_key_exists($key, $delimiters))
                            {
                                $newDelimitersByLine[$key] = $value;
                            }
                        }
                        $delimitersByLine = $newDelimitersByLine;

                        if (sizeof($delimitersByLine) < 2)
                            break;
                    }
                }
            }

            arsort($delimitersByLine);
            $firstDelimiter = key($delimitersByLine);

            if (sizeof($delimitersByLine) > 1)
            {
                next($delimitersByLine);
                $nextDelimiter = key($delimitersByLine);
                if ($delimitersByLine[$firstDelimiter] === $delimitersByLine[$nextDelimiter])
                {
                    // multiple delimiters with the same frequency found
                    // throw an error
                    throw new UnexpectedValueException();
                }

                return $firstDelimiter;
            }
            else
                return $firstDelimiter;
        }

    $fileArray = file('test.csv');
    $delimiter = getDelimiter($fileArray);
    echo $delimiter;
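One caveat worth checking against your own files: file() keeps the line ending on every element unless FILE_IGNORE_NEW_LINES is passed, so a blank line arrives as "\n", which empty() does not treat as empty. Here is a minimal sketch of a stricter test, using a hypothetical isBlankLine() helper that is not part of the code above.

```php
<?php
// Hypothetical helper: treats a line as blank when nothing but the
// line ending kept by file() is left on it.
function isBlankLine(string $line): bool
{
    // Strip only CR/LF so that a line made of tab delimiters is not discarded
    return rtrim($line, "\r\n") === '';
}

var_dump(empty("\n"));          // false: the newline makes the string non-empty
var_dump(isBlankLine("\n"));    // true
var_dump(isBlankLine("a;b\n")); // false
```

Alternatively, file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) drops blank lines while reading, but then the strlen($line) - 1 bound in the loop would skip the last real character of every line, so that bound would need adjusting.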

  10. #35
    SitePoint Guru phantom007's Avatar
    yup worked like a charm.

    Another set of data that is not returning a delimiter:

    C:\Users\Fabien\Desktop\Combactive\ACTIONS (Réunions et Courriers d'information pour adhérents)\EMAGNY\2012-2013\Médias + Communication\Francophone.txt 21/05/2013 22:35:14
    Progitek [3 e-mails]
    mat.tierschutz@bluewin.co;
    cat.chat.enfant@gmail.com;
    bay@aspas-nature.in;
    dd.wahf@gmail.be;
    aa@gmail.com;
    bb3a@gmail.com;
    cc4a@gmail.fr;
    The delimiter in this case should be ; without any second thought

    Thanks

  11. #36
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Well, that is probably because I accidentally switched it back to using $excluded characters instead of a range of acceptable delimiters. If you put back in the acceptable delimiters you want to track, then it may work. It is catching the @ and the ; as being possible delimiters with the latest update I posted.
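To illustrate the difference on one of those e-mail lines, here is a rough sketch that counts candidate characters under both patterns. countCandidates() is a hypothetical helper, and the quoted-field handling from the real code is left out to keep it short.

```php
<?php
// Hypothetical helper: counts candidate delimiter characters on one line.
// $isWhitelist = true  -> keep characters that match $pattern
// $isWhitelist = false -> keep characters that do NOT match $pattern
function countCandidates(string $line, string $pattern, bool $isWhitelist): array
{
    $counts = array();
    $line = rtrim($line, "\r\n");
    if ($line === '') {
        return $counts;
    }
    foreach (str_split($line) as $char) {
        $matches = preg_match($pattern, $char) === 1;
        if ($matches === $isWhitelist) {
            $counts[$char] = ($counts[$char] ?? 0) + 1;
        }
    }
    return $counts;
}

$line = "aa@gmail.com;\n";
// Excluded characters: both "@" and ";" survive as candidates
print_r(countCandidates($line, '~[a-zA-Z0-9.\r\n\f ]~', false));
// Acceptable delimiters: only ";" is counted
print_r(countCandidates($line, '~[#,;|\t]~', true));
```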

  12. #37
    SitePoint Guru phantom007's Avatar
    I am already using the acceptable chars instead of the excluded chars; please check the code below.


    PHP Code:
    <?php

    function getDelimiter($file)
        {
            $delimitersByLine = null;
            $delimiterArray = array();
            // Exclude the following decimal chars.
            $excluded = array(10,11,12,13);
            $included = '~[#,;|\t]~';

            foreach ($file as $lineNumber => $line)
            {
                if (in_array(ord($line), $excluded)) continue;

                $quoted = false;
                $delimiters = array();

                for ($i = 0; $i < strlen($line) - 1; $i++)
                {
                    $char = substr($line, $i, 1);

                    if ($char === '"')
                    {
                        $quoted = !$quoted;
                    }
                    else if (!$quoted && preg_match($included, $char))
                    {
                        if (array_key_exists($char, $delimiters))
                        {
                            $delimiters[$char]++;
                        }
                        else
                        {
                            $delimiters[$char] = 1;
                        }
                    }
                }

                if ($delimitersByLine === null)
                {
                    $delimitersByLine = $delimiters;
                }
                else if (count($delimitersByLine) > 0 && count($delimiters) > 0)
                {
                    $newDelimitersByLine = $delimiters;
                    foreach ($delimitersByLine as $key => $value)
                    {
                        if ((array_key_exists($key, $delimiters) && $delimiters[$key] === $value)
                            || !array_key_exists($key, $delimiters))
                        {
                            $newDelimitersByLine[$key] = $value;
                        }
                    }
                    $delimitersByLine = $newDelimitersByLine;

                    if (sizeof($delimitersByLine) < 2)
                        break;
                }
            }

            arsort($delimitersByLine);
            $firstDelimiter = key($delimitersByLine);

            if (sizeof($delimitersByLine) > 1)
            {
                next($delimitersByLine);
                $nextDelimiter = key($delimitersByLine);
                if ($delimitersByLine[$firstDelimiter] === $delimitersByLine[$nextDelimiter])
                {
                    // multiple delimiters with the same frequency found
                    // throw an error
                    var_dump($delimitersByLine);
                    //throw new UnexpectedValueException();
                }

                $delimiter = $firstDelimiter;
            }
            else
                $delimiter = $firstDelimiter;

            //ed('['.ord($delimiter).']');
            $delimiterArray = array(
                'DELIMITER' => $delimiter,
                'DELIMITER_DESC' => $delimiter
            );

            // Check delimiters
            if (!$delimiter) $delimiterArray['DELIMITER_DESC'] = 'NO_DELIMITER_FOUND';
            if (ord($delimiter) == 9) $delimiterArray['DELIMITER_DESC'] = 'TAB';

            return $delimiterArray;
        }

    ?>

  13. #38
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    No idea what you may have done to your code that breaks it, but when I put it back into the original code, it correctly identifies ; as the delimiter.

    Edit:

    Please see post #43 for the most up-to-date version of this code.


    csv.php
    PHP Code:
    <?php
    class CSV
    {
        private $filePath;
        private $fileContents;
        const ACCEPTABLE_DELIMITERS = '~[#,;|\t]~'; // acceptable delimiters
        //const EXCLUDED_CHARS = '~[a-zA-Z0-9.\r\n\f ]~'; // delimiters can't be characters, numbers or spaces

        public function __construct($file)
        {
            $this->filePath = $file;
            $this->fileContents = file($file);
        }

        public function getDelimiter()
        {
            $delimitersByLine = null;
            foreach ($this->fileContents as $lineNumber => $line)
            {
                if (!empty($line))
                {
                    $quoted = false;
                    $delimiters = array();

                    for ($i = 0; $i < strlen($line) - 1; $i++)
                    {
                        $char = substr($line, $i, 1);
                        if ($char === '"')
                        {
                            $quoted = !$quoted;
                        }
                        else if (!$quoted && preg_match(self::ACCEPTABLE_DELIMITERS, $char))
                        //else if (!$quoted && !preg_match(self::EXCLUDED_CHARS, $char))
                        {
                            if (array_key_exists($char, $delimiters))
                            {
                                $delimiters[$char]++;
                            }
                            else
                            {
                                $delimiters[$char] = 1;
                            }
                        }
                    }

                    if ($delimitersByLine === null)
                    {
                        $delimitersByLine = $delimiters;
                    }
                    else if (count($delimitersByLine) > 0 && count($delimiters) > 0)
                    {
                        $newDelimitersByLine = $delimiters;
                        foreach ($delimitersByLine as $key => $value)
                        {
                            if ((array_key_exists($key, $delimiters) && $delimiters[$key] === $value)
                                || !array_key_exists($key, $delimiters))
                            {
                                $newDelimitersByLine[$key] = $value;
                            }
                        }
                        $delimitersByLine = $newDelimitersByLine;

                        if (sizeof($delimitersByLine) < 2)
                            break;
                    }
                }
            }

            arsort($delimitersByLine);
            $firstDelimiter = key($delimitersByLine);

            if (sizeof($delimitersByLine) > 1)
            {
                next($delimitersByLine);
                $nextDelimiter = key($delimitersByLine);
                if ($delimitersByLine[$firstDelimiter] === $delimitersByLine[$nextDelimiter])
                {
                    // multiple delimiters with the same frequency found
                    // throw an error
                    throw new UnexpectedValueException();
                }

                return $firstDelimiter;
            }
            else
                return $firstDelimiter;
        }
    }
    test.php
    PHP Code:
    <?php
        include('csv.php');

        $files = array('data.txt', 'comma.txt', 'colon.txt', 'pipe.txt', 'pound.txt', 'semicolon.txt', 'tab.txt', 'email.txt', 'mixture.txt');
        foreach ($files as $file)
        {
            $csv = new CSV('files/' . $file);
            $delimiter = $csv->getDelimiter();
            echo 'Delimiter for ' . $file . ' is ' . $delimiter . ' (' . ord($delimiter) . ')<br />';
        }
    data.txt
    Code:
    mat.tierschutz@bluewin.co;
    cat.chat.enfant@gmail.com;
    bay@aspas-nature.in;
    dd.wahf@gmail.be;
    aa@gmail.com;
    bb3a@gmail.com;
    cc4a@gmail.fr;
    output
    Code:
    Delimiter for data.txt is ; (59)
    Delimiter for comma.txt is , (44)
    Delimiter for colon.txt is (0)
    Delimiter for pipe.txt is | (124)
    Delimiter for pound.txt is # (35)
    Delimiter for semicolon.txt is ; (59)
    Delimiter for tab.txt is (9)
    Delimiter for email.txt is (0)
    
    Fatal error: Uncaught exception 'UnexpectedValueException' in M:\SVN\sitepoint\trunk\Sitepoint\cancer10\csv.php:80 Stack trace: #0 M:\SVN\sitepoint\trunk\Sitepoint\cancer10\test.php(8): CSV->getDelimiter() #1 {main} thrown in M:\SVN\sitepoint\trunk\Sitepoint\cancer10\csv.php on line 80

  14. #39
    SitePoint Guru phantom007's Avatar
    Yes you are right, that code works.

    I was thinking of a way to avoid the exception. Say, when an exception occurs, can we check which delimiter has the higher count? For example, consider the following data. In this case an exception will occur for , and ;

    So as per my logic above, we consider ; the delimiter, since the count of ; is higher than that of ,

    And if the count of , is equal to that of ; then we pick whichever is found first. Do you think this is a neat idea?



    abc,111
    def; 111
    ijk; 222


    Thanks

  15. #40
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Quote: Originally Posted by cancer10
    Yes you are right, that code works.

    I was thinking of a way to avoid the exception. Say, when an exception occurs, can we check which delimiter has the higher count? For example, consider the following data. In this case an exception will occur for , and ;

    So as per my logic above, we consider ; the delimiter, since the count of ; is higher than that of ,

    And if the count of , is equal to that of ; then we pick whichever is found first. Do you think this is a neat idea?
    That doesn't really help you parse the data. fgetcsv only accepts one delimiter. You will end up with either an error or bad data, neither of which is helpful to your application.

  16. #41
    SitePoint Guru phantom007's Avatar
    Not sure if you got my point, we will return only 1 delimiter based on the following cases:

    Case 1:
    if count of ; greater than , we return ; as delimiter

    Case 2:
    If count of ; equals , we return , since it was the one found first.
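The two cases above could be sketched roughly as follows. This is only an illustration of the proposal, not code from cpradio's class: pickDelimiter() and its default candidate list are assumptions, it counts occurrences across the whole file, and it ignores quoted fields.

```php
<?php
// Hypothetical helper: returns the candidate with the highest total count;
// on a tie, the candidate that appears first in the data wins.
function pickDelimiter(array $lines, array $candidates = array(',', ';', '#', ':', '|', "\t")): ?string
{
    $text = implode('', $lines);
    $best = null;
    $bestCount = 0;
    $bestPos = PHP_INT_MAX;

    foreach ($candidates as $candidate) {
        $count = substr_count($text, $candidate);
        if ($count === 0) {
            continue; // never seen, cannot be the delimiter
        }
        $pos = strpos($text, $candidate); // position of the first occurrence
        // Case 1: strictly higher count wins
        // Case 2: equal count, earlier first occurrence wins
        if ($count > $bestCount || ($count === $bestCount && $pos < $bestPos)) {
            $best = $candidate;
            $bestCount = $count;
            $bestPos = $pos;
        }
    }
    return $best; // null when no candidate was found at all
}

var_dump(pickDelimiter(array("abc,111\n", "def; 111\n", "ijk; 222\n"))); // Case 1
var_dump(pickDelimiter(array("abc,111\n", "def; 111\n")));               // Case 2
```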


    Does this make sense to you?


    Thanks

  17. #42
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Yes, I get your point, but you have a file that has 2 delimiters. If you only take one, and use it, you will still end up with an error parsing that CSV or bad data. Because you can't account for the second delimiter that exists.

    So in your example, the first set of values you might receive using ; because it happens more frequently is
    Code:
    abc,111\ndef; 111
    The second set of data would be
    Code:
    ijk; 222
    All that is based on the assumption PHP can handle it that way and doesn't return FALSE designating an error due to the comma delimiter.
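For anyone who wants to try this, here is a small sketch that parses the three sample rows with ; as the only delimiter. It uses str_getcsv() on each line rather than fgetcsv() on a file handle, purely so the example is self-contained; the behavior on a real file may differ in the details.

```php
<?php
// The three sample rows from post #39, one string per line
$lines = array('abc,111', 'def; 111', 'ijk; 222');

$rows = array();
foreach ($lines as $line) {
    // Delimiter, enclosure and escape character are passed explicitly
    $rows[] = str_getcsv($line, ';', '"', '\\');
}
print_r($rows);
```

The comma row comes back as a single field, so the data is still not split correctly even though no error is raised.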

  18. #43
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Just in case you still insist on not throwing an error when multiple delimiters are found, here is a working version:

    csv.php
    PHP Code:
    <?php
    class CSV
    {
        private 
    $filePath;
        private 
    $fileContents;

        const 
    ACCEPTABLE_DELIMITERS '~[#,;:|\t]~'// acceptable delimiters
        //const EXCLUDED_CHARS = '~[a-zA-Z0-9.\r\n\f ]~'; // delimiters can't be characters, numbers or spaces

        // Constructor accepting a file path
        
    public function __construct($file)
        {
            
    $this->filePath $file;
            
    // Read the file contents and store it into a private variable
            
    $this->fileContents file($file);
        }

        public function 
    getDelimiter()
        {
            
    $delimitersData null;
            
    // Loop through each line in the file, identify the index as the line number and the content of the line as $line
            
    foreach ($this->fileContents as $lineNumber => $line)
            {
                
    // Don't parse an empty line, it could lead to weird results
                
    if (!empty($line))
                {
                    
    $quoted false;
                    
    $delimitersForGivenLine = array();

                    
    // Loop through each character in the line
                    
    for ($i 0$i strlen($line) - 1$i++)
                    {
                        
    // Read the character we are currently evaluating
                        
    $char substr($line$i1);
                        
    // If the character is a ", set $quoted to its opposite value 
                        // (it starts out as false, so using !$quoted sets it to true, when it encounters another ", it will set it back to false and so on)
                        
    if ($char === '"')
                        {
                            
    $quoted = !$quoted;
                        }
                        
    // Check if the character we are evaluation is an Acceptable Delimiter (or is not an Excluded Character)
                        
    else if (!$quoted && preg_match(self::ACCEPTABLE_DELIMITERS$char))
                        
    //else if (!$quoted && !preg_match(self::EXCLUDED_CHARS, $char))
                        
    {
                            
    // Check if the character/delimiter was already found on this line and update its' properties accordingly
                            
    if (array_key_exists($char$delimitersForGivenLine))
                            {
                                
    // Update the count for this delimiter since we just found another occurrence
                                
    $delimitersForGivenLine[$char]['count']++;
                                
    // Add the content of the line to this delimiter, so we know which delimiter to use on it later (this actually is useless -- I think)
                                
    $delimitersForGivenLine[$char]['lines'][$lineNumber] = $line;
                            }
                            else
                            {
                                
    // This character/delimiter has not been found previously on this line, so create it
                                
    $delimitersForGivenLine[$char]['count'] = 1;
                                
    // Assign this delimiter the current line, so we know how to read that line later on
                                
    $delimitersForGivenLine[$char]['lines'][$lineNumber] = $line;
                            }
                        }
                    }

                    
                    // On the first line of the file, this variable will be null, so we need to set it. It is used for comparing the delimiters of the previous line to the current line
                    if ($delimitersData === null || empty($delimitersData))
                    {
                        $delimitersData = $delimitersForGivenLine;
                    }
                    // Verify both the previous line's data and the current line's data have delimiters (otherwise the comparison isn't useful)
                    else if (count($delimitersData) > 0 && count($delimitersForGivenLine) > 0)
                    {
                        // Store the current line's data into a new variable
                        $newDelimitersByLine = $delimitersForGivenLine;
                        // Loop through the previous line's delimiters (key is the delimiter character, and value is an array consisting of count and lines)
                        foreach ($delimitersData as $key => $value)
                        {
                            // Verify the previous line's delimiter(s) exist in the current line's evaluation and if they do, verify the counts are the same
                            // OR check that the previous line's delimiter(s) do not exist in the current line's evaluation
                            // The point here is to see if we need to merge arrays
                            // So why not use array_merge()? Good question: it renumbers integer keys, and the line-number keys are important to our system
                            if ((array_key_exists($key, $delimitersForGivenLine) && $delimitersForGivenLine[$key]['count'] === $value['count'])
                                || !array_key_exists($key, $delimitersForGivenLine))
                            {
                                // This line is for when !array_key_exists($key, $delimitersForGivenLine) evaluates true; it writes the count into the
                                // new variable for the given delimiter (key)
                                $newDelimitersByLine[$key]['count'] = $value['count'];

                                // If the delimiter also exists in the current line, loop through the previous line numbers, keeping their index and values,
                                // and copy them into the new variable alongside the current line.
                                if (array_key_exists($key, $delimitersForGivenLine))
                                {
                                    foreach ($value['lines'] as $lineNumber => $line)
                                        $newDelimitersByLine[$key]['lines'][$lineNumber] = $line;
                                }
                                else
                                {
                                    // Since the delimiter doesn't exist in the current line, just write the lines directly over (we don't need to worry about keeping existing data)
                                    $newDelimitersByLine[$key]['lines'] = $value['lines'];
                                }
                            }
                        }
                        // Store the merged array so it can be used again for the next line (so it keeps a running count)
                        $delimitersData = $newDelimitersByLine;
                    }
                }
            }

            
            // Sort the array of delimiter data using a custom sort routine and maintaining the key indexes
            // This is to put the most frequent delimiter and its data at the top of the array
            uasort($delimitersData, "CSV::sortDelimiters");

            // Remove delimiters that don't have the exact count as the primary delimiter
            $initialCount = null;
            $finalDelimiterData = array();

            // Loop through each delimiter found in the file ($key is the delimiter character, and $data is the count/lines info)
            foreach ($delimitersData as $key => $data)
            {
                // Since the array is already sorted, we want to read the first delimiter and store it
                // All other delimiters will ONLY be stored if their count matches the first delimiter
                // (so you can't have a delimiter of ";" that indicates it has 8 counts per line and have a delimiter of ","
                // that indicates it has 2 counts per line; the "," simply can't be an accurate delimiter in this case)
                if ($initialCount === null)
                {
                    $initialCount = $data['count'];
                    $finalDelimiterData[$key] = $data;
                }
                else
                {
                    // Only store the delimiter if the count matches the most frequently found delimiter
                    if ($initialCount === $data['count'])
                        $finalDelimiterData[$key] = $data;
                }
            }

            // Return the delimiter information, so it can be looped through and parsed using str_getcsv
            return $finalDelimiterData;
        }

            
        // Custom Sort for the Delimiters
        public static function sortDelimiters($a, $b)
        {
            // If the delimiter data for item $a in the array matches item $b, return 0
            if ($a['count'] === $b['count'] && sizeof($a['lines']) === sizeof($b['lines']))
            {
                return 0;
            }

            // If $a has more lines associated with it than $b, return -1 so $a stays ahead of $b;
            // otherwise return 1 so $b moves up ahead of $a
            return sizeof($a['lines']) > sizeof($b['lines']) ? -1 : 1;
        }
    }
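
    A small aside on the array_merge() comment inside getDelimiter(): the 'lines' arrays are keyed by line number, and array_merge() renumbers integer keys starting from 0. The sketch below uses made-up sample arrays (not real output from the class) to show the difference between array_merge() and the key-preserving + operator.

```php
<?php
// Hypothetical 'lines' arrays keyed by line number, shaped like the ones the class builds
$previous = array(0 => "a;b\n");
$current  = array(1 => "c;d\n");
$later    = array(5 => "e;f\n");

// array_merge() renumbers integer keys from 0, so line 5 ends up at index 2
$merged = array_merge($previous, $current, $later);
var_dump(array_keys($merged));

// The + operator keeps the original keys; on a duplicate key the left-hand value wins
$united = $previous + $current + $later;
var_dump(array_keys($united));
```

    With the line numbers lost, there would be no way to map a delimiter back to the lines it belongs to, which is why the class copies the entries by key.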
    test.php
    PHP Code:
    <?php
        
        include('csv.php');

        //$files = array('data.txt', 'comma.txt', 'colon.txt', 'pipe.txt', 'pound.txt', 'semicolon.txt', 'tab.txt', 'email.txt', 'mixture.txt');
        $files = array('data.txt', 'mixture.txt');
        foreach ($files as $file)
        {
            $csv = new CSV('files/' . $file);
            $delimiterData = $csv->getDelimiter();
            // The first key is the most likely delimiter, since the data comes back sorted
            $delimiter = key($delimiterData);
            echo 'Delimiter for ' . $file . ' is ' . $delimiter . ' (' . ord($delimiter) . ')<br />';
            echo '<pre>';
            var_dump($delimiterData);
            echo '</pre><br />';
        }
    data.txt
    Code:
    abc,111
    def; 111
    ijk; 222
    output
    Code:
    Delimiter for data.txt is ; (59)
    array(2) {
      [";"]=>
      array(2) {
        ["count"]=>
        int(1)
        ["lines"]=>
        array(2) {
          [2]=>
          string(8) "ijk; 222"
          [1]=>
          string(10) "def; 111
    "
        }
      }
      [","]=>
      array(2) {
        ["count"]=>
        int(1)
        ["lines"]=>
        array(1) {
          [0]=>
          string(9) "abc,111
    "
        }
      }
    }
    mixture.txt
    Code:
    this|is|"a test"|to|123|see|how|it|works
    this; is; "a test"; to; 123; see; how; it; works
    123.|can?|you&|see|what|I'm|doing?|eight*|nine
    output
    Code:
    Delimiter for mixture.txt is | (124)
    array(2) {
      ["|"]=>
      array(2) {
        ["count"]=>
        int(8)
        ["lines"]=>
        array(2) {
          [2]=>
          string(46) "123.|can?|you&|see|what|I'm|doing?|eight*|nine"
          [0]=>
          string(42) "this|is|"a test"|to|123|see|how|it|works
    "
        }
      }
      [";"]=>
      array(2) {
        ["count"]=>
        int(8)
        ["lines"]=>
        array(1) {
          [1]=>
          string(50) "this; is; "a test"; to; 123; see; how; it; works
    "
        }
      }
    }
    Last edited by cpradio; Jun 27, 2013 at 02:57. Reason: Updated to return all Delimiter data, instead of first delimiter

  19. #44
    cpradio
    I've updated my prior post to return ALL delimiter data (as it may be helpful to your project). In short, it lets you know which lines are associated with each delimiter, so you could use str_getcsv to parse each line with its determined delimiter.
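
    As a rough sketch of that idea (not tested against the full class), the loop below walks an array shaped like the one getDelimiter() returns and hands each line to str_getcsv with the delimiter it was filed under. The $delimiterData sample is hard-coded here to mirror the data.txt output from post #43, and the trimming of each field is my own assumption about what you'd want.

```php
<?php
// Hypothetical sample shaped like the array getDelimiter() returns:
// delimiter character => array('count' => int, 'lines' => array(lineNumber => rawLine))
$delimiterData = array(
    ';' => array('count' => 1, 'lines' => array(2 => "ijk; 222", 1 => "def; 111\n")),
    ',' => array('count' => 1, 'lines' => array(0 => "abc,111\n")),
);

$rows = array();
foreach ($delimiterData as $delimiter => $data)
{
    foreach ($data['lines'] as $lineNumber => $line)
    {
        // Strip the trailing newline, then split on the delimiter detected for this line
        $fields = str_getcsv(rtrim($line, "\r\n"), $delimiter, '"', '\\');
        // Trim stray spaces around each field (an assumption, drop this if spaces matter)
        $rows[$lineNumber] = array_map('trim', $fields);
    }
}

// Put the rows back into the original file order
ksort($rows);
print_r($rows);
```

    Keying $rows by line number is what makes the ksort() at the end possible, since the lines come back grouped by delimiter rather than in file order.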

  20. #45
    SitePoint Guru phantom007's Avatar
    Join Date
    May 2008
    Posts
    742
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Thanks for all those.

    Can you also please comment your code so that it is easier for me to understand and to modify in the future?


    Many Thanks

  21. #46
    cpradio
    Okay, I added a bunch of comments to my prior PHP code in Post #43.

    I've also been playing with making it more OOP (if that is of any interest to you). I'm 95% there, but I really want to make an additional change to support associative keys in it that I haven't quite figured out. If that is of any interest, I'll post it as a zip file, as it contains several more files (same logic, just split up by responsibility).

  22. #47
    phantom007
    Many thanks for your comments.

    I was testing your code in post #43 with the following data, but it's returning a blank result. Can you please tell me what's wrong?



    Code:
    C:\Users\Fabien\Desktop\Combactive\ACTIONS (Réunions et Courriers d'information pour adhérents)\EMAGNY\2012-2013\Médias + Communication\Francophone.txt 21/05/2013 22:35:14
    Progitek [244 e-mails]
    aaa.aaa@gmail.ch;
    aaa.chat.enfant@gmail.fr;
    aaa@aspas-gmail.org;
    aaa@gmail.com;
    aaa@asms-swiss.ch;
    aaa@gmail.org;
    aaa@gmail.be;
    aaa.ellidge@gmail.fr;
    aaa@gmail.fr;
    aaa@gmail.be;
    aaa.wahf@gmail.be;


    Thanks

  23. #48
    cpradio
    I copied and pasted the code straight from Post #43 and received the following output for your data:
    Code:
    Delimiter for data.txt is ; (59)
    array(1) {
      [";"]=>
      array(2) {
        ["count"]=>
        int(1)
        ["lines"]=>
        array(10) {
          [9]=>
          string(15) "aaa@gmail.be;
    "
          [8]=>
          string(15) "aaa@gmail.fr;
    "
          [7]=>
          string(23) "aaa.ellidge@gmail.fr;
    "
          [6]=>
          string(15) "aaa@gmail.be;
    "
          [5]=>
          string(16) "aaa@gmail.org;
    "
          [4]=>
          string(20) "aaa@asms-swiss.ch;
    "
          [3]=>
          string(16) "aaa@gmail.com;
    "
          [2]=>
          string(22) "aaa@aspas-gmail.org;
    "
          [1]=>
          string(27) "aaa.chat.enfant@gmail.fr;
    "
          [0]=>
          string(19) "aaa.aaa@gmail.ch;
    "
        }
      }
    }

  24. #49
    phantom007
    When I use the same code separately in another file, I get the following output. Please find the zip file attached.



    Code:
    Delimiter for fake-data-initially.txt is : (58)
    
    array(1) {
      [":"]=>
      array(2) {
        ["count"]=>
        int(3)
        ["lines"]=>
        array(1) {
          [0]=>
          string(175) "C:\Users\Fabien\Desktop\Combactive\ACTIONS (Réunions et Courriers d'information pour adhérents)\EMAGNY\2012-2013\Médias + Communication\Francophone.txt 21/05/2013 22:35:14
    "
        }
      }
    }
    https://www.dropbox.com/s/3y6vwpaodw...-initially.zip

  25. #50
    cpradio
    Your fake-data-initially.txt file is a bit funky...


    The first line contains a file path, followed by what looks to be a metadata line. Neither of those helps when trying to identify a delimiter; the only lines that do are lines 3-13. Not sure how you would tell the system to ignore those two lines....
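
    One rough option, assuming you know in advance how many junk lines sit at the top of the file: slice them off before the detection runs. The skipLeadingLines() helper below is hypothetical (it is not part of the class in post #43), and the sample lines are shortened stand-ins for your file. You would have to pass its result in wherever the class currently stores the output of file().

```php
<?php
// Hypothetical helper, not part of the CSV class from post #43:
// drop the first $skip lines but keep the original line numbers as keys.
function skipLeadingLines(array $lines, $skip)
{
    // The fourth argument (preserve_keys = true) keeps the line numbers intact
    return array_slice($lines, $skip, null, true);
}

// Shortened stand-in for the file: a path line, a metadata line, then the real data
$lines = array(
    "C:\\path\\to\\export.txt 21/05/2013 22:35:14\n",
    "Progitek [244 e-mails]\n",
    "aaa.aaa@gmail.ch;\n",
    "aaa@gmail.com;\n",
);

$dataLines = skipLeadingLines($lines, 2);
print_r(array_keys($dataLines));
```

    That only helps when the number of header lines is fixed; if it varies from file to file, you would need some other rule for spotting them.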

