SitePoint Sponsor

User Tag List

Results 1 to 13 of 13
  1. #1
    SitePoint Zealot
    Join Date
    Sep 2005
    Posts
    126
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    How can I do this with fgetcsv ?

    Hello,
    I need to read csv file and analyze it to see if it contains error, and if errors occur then display error(s), and if not, then insert the values to the db.

    The sample csv looks like

    PHP Code:
    id,first_name,last_name,email
    "12345","Jane","test@test.com" // incorrect row
    "99999","John","Smith","test@test.com" // correct row
    "232323",Sarah,"Smith","test@test.com"  // incorrect row 
    I just put the comment beside each row for you guys understanding.
    As you see, each row should contain 4 columns, each separated with comma(,), enclosed with ""
    First row is not correct because it only has 3 columns, and in this case, it should display error like this

    PHP Code:
    Line 2   "12345","Jane","test@test.com"     Only 3 fields were found 

    and last row is not correct because Sarah is not enclosed with "" , and in this case, it should display error like this
    PHP Code:
    Line 4   "232323",Sarah,"Smith","test@test.com"    Firstname is not enclosed

    What I tried to do is

    PHP Code:
    $handle fopen($importFile"r");
    while ((
    $line fgetcsv($handle1000",",'"')) !== FALSE
    {
        
    $num count($line);
        if (
    $num != 4
        { 
           do 
    something;
        }
    :
    :

    Q1: How can I display the error causing line as it is (like
    PHP Code:
    "12345","Jane","test@test.com" 
    ) from csv file in the middle as I read it with fgetcsv?
    I tried

    PHP Code:
       for ($c=0$c $num$c++) 
        {
            echo 
    $line[$c] . " ";
        } 
    in the do something; place, but it prints
    PHP Code:
    12345 Jane test@test.com 
    - no comma, no ""

    Q2: my fgetcsv - fgetcsv($handle, 1000, ",",'"') - does not catch the error for line 4, even though Sarah is not enclosed with "" , it considers the row not error and proceed as normal.
    Why is it? I need to force the each value in csv to have enclosure ""

    Thanks for your help in advance.

  2. #2
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,036
    Mentioned
    187 Post(s)
    Tagged
    2 Thread(s)
    I suppose you could use something other than fgetcsv()

    But I'm wondering if it might work if you passed the fourth argument as NULL? eg.
    PHP Code:
    while (($line fgetcsv($handle1000",",NULL)) !== FALSE
    It's late for me so I don't have time to try it right now, but if it works, it should over-ride the default value for the enclosure, and if you're lucky it will pass everythiing between the , delimiter to the returned array.

  3. #3
    SitePoint Zealot
    Join Date
    Sep 2005
    Posts
    126
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Umm
    PHP Code:
    while (($line fgetcsv($handle1000",",NULL)) !== FALSE
    does not work, it says

    PHP Code:
    fgetcsv(): enclosure must be a character 

  4. #4
    SitePoint Zealot
    Join Date
    Sep 2005
    Posts
    126
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I tried to use fgets() and str_getcsv() so I can display the row as it is with fgets() and still do csv with str_getcsv() like

    PHP Code:
    $handle fopen($importFile"r");

    while(! 
    feof($handle))
    {
        
    $row fgets($handle);

        
    print_r(str_getcsv($row,",",'"'));    

    but str_getcsv just dies,no output. What did I do wrong?

    I'll appreciate any help

  5. #5
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,036
    Mentioned
    187 Post(s)
    Tagged
    2 Thread(s)
    I'm busy right now, but as soon as I get a chance I'll do some testing to see if I can come up with something.

  6. #6
    messing with my mind fristi's Avatar
    Join Date
    Feb 2009
    Posts
    292
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi,

    Your orignal problem with not detecting the missing " is because a csv doesn't need them to be correct. They are there to make sure that you can use the delimiter (,) in the values itself ect...

    If you want to have this functionality you need to write your own function.
    Something like this:

    PHP Code:
    function csvParser($handle$delimiter ','$enclosure '"'$escape '\\')
    {
        
    $headers str_getcsv(fgets($handle), $delimiter$enclosure$escape);
        
    $need    count($headers);
        
    $array   = array();
        
        while(!
    feof($handle)) {
        
            
    $row fgets($handle);
            if(
    preg_match("/(?(?<!$enclosure),|,(?!$enclosure))/"$row)) 
                die(
    "enclosing error: $row");
      
            
    $row str_getcsv($row$delimiter$enclosure$escape);
            if(
    count($row) != $need)
                die(
    "count error: count($row) != $need");
            
            foreach (
    $headers as $pos => $key) {
                
    $named[$key] = $row[$pos];
            }
            
            
    $array[] = $named;
        }
        
        return 
    $array;
    }

    $importFile 'test.csv';
    $handle fopen($importFile"r");

    print_r(csvParser($handle)); 
    this will die on the failures you want.
    you can change the function to returning false or what ever. This is just an example that will give you an array filled with assoc arrays for easy access.

    Just something to show you the right way.

    it is only tested on PHP 5.3
    To PHP or to Perl, that is the question!
    (Bucket - simpletest) User

  7. #7
    SitePoint Zealot
    Join Date
    Sep 2005
    Posts
    126
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks, yes fgetcsv is only available on php 5.3 is the problem. Our server is 5.2 and there is no plan for upgrading to 5.3 Probably I need to write something my own.. sigh

  8. #8
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,036
    Mentioned
    187 Post(s)
    Tagged
    2 Thread(s)
    I do think something like fristi posted where testing the validity of the CSV file's structure is better. That is, I think testing the file and using the file would be better done separately. But if you're limited I'm sure you can hack something together even if it's not robust, and it should work for your needs if not others when things change.

    I'll take your word for it that passing NULL doesn't work. But hopefully you can use another character instead of the default enclosures.

    I don't know what kind of structural errors you need to worry about, but I threw together this

    lmsook.csv
    Code:
    "valid row","12345","Jane","Doe","jd@test.com"
    "missing enclosures","23232",Sarah,"Smith","ss@test.com"
    "missing field","99999","John","js@test.com"
    "empty field","98765","Dave",,"ds@test.com"
    "missing delimiter","56745","Chip""Whitley","cw@test.com"
    "wrong enclosures",'34985',"Santa","Claus","sc@test.com"
    "wrong delimiter","39581","Luke","Skywalker";"ls@test.com"
    and this using "^"
    lmsook.php
    PHP Code:
    <?php
    error_reporting
    (E_ALL);
    ini_set('display_errors'true);

    $handle fopen('lmsook.csv'"r");
    while ((
    $line fgetcsv($handle1000",""^")) !== FALSE
    {
        
    $num count($line);

        for (
    $c=0$c $num$c++) 
        {
            echo 
    $c ': ';
            
    var_dump($line[$c]);
            echo 
    '<br />';
        }
        echo 
    '<hr />';

    ?>
    You could use count() to catch most of the errors, and for those that pass you could trim() the "s and then use the values.

  9. #9
    messing with my mind fristi's Avatar
    Join Date
    Feb 2009
    Posts
    292
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I was sitting in the car and I realized that my function is not working in some cases...
    if the comma is in the beginning or at the end of the string between the enclosures it will mark that that as an error, which is not good... the cases are not happening alot, but as a programmer you can not take that risk.
    In the end I think it needs more complex processing to detect the places and numbers of $enclosers and $delimiters.

    I haven't had time to come up with something, if I do, I'll let you know.

    Just a heads up that the function I posted has bugs.
    To PHP or to Perl, that is the question!
    (Bucket - simpletest) User

  10. #10
    messing with my mind fristi's Avatar
    Join Date
    Feb 2009
    Posts
    292
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    My last attempt at this

    PHP Code:
    class CsvReader
    {
        private 
    $_handle;
        private 
    $_delimiter;
        private 
    $_enclosure;
        private 
    $_escape;
        private 
    $_pattern;

        
    /*
         * @method Constructor
         * @param  resource-handle  $handle  a file handle opend with fopen
         * @param  string  $delimiter  optional delimiter of the csv fields
         * @param  string  $enclosure  optional enclosure of the csv fields
         * @param  string  $escape  optional escape char for the $enclosure
         */
        
    public function __construct($handle$delimiter ','$enclosure '"'$escape '\\')
        {
            
    $this->_handle    $handle;
            
    $this->_delimiter $delimiter;
            
    $this->_enclosure $enclosure;
            
    $this->_escape    $escape;
            
    $this->_pattern   '/(?<!' preg_quote($escape) .')'
                                
    preg_quote($enclosure) . 
                                
    preg_quote($delimiter) . 
                                
    preg_quote($enclosure) . './';
        }
        
        
    /*
         * @method getHeader
         * @param  bool  $rewind  after getting header rewind the handle or not
         * @return array returns an array containing the column names
         */    
        
    public function getHeader($rewind false)
        {
            
    rewind($this->_handle);
            
    $first rtrim(fgets($this->_handle));

            
    $headers = (strpos($this->_enclosure$first) === false) ?
                       
    explode($this->_delimiter$first) :
                       
    preg_split($this->_pattern,$first) ;
            
            if (
    $rewind !== falserewind($this->_handle);
            return 
    $headers;
        }
        
        
    /*
         * @method checkSyntax
         * @return NULL|array returns NULL on success and an array containing the error rows on failure
         */ 
        
    public function checkSyntax()
        {
            
    $headers $this->getHeader();
            
    $need    count($headers);
            
    $err     = array();

            
    $nr 1;
            while(!
    feof($this->_handle)) {
            
                
    $row    rtrim(fgets($this->_handle));
                
    $pieces preg_split(
                    
    $this->_pattern
                    
    $row
                    -
    1
                    
    PREG_SPLIT_NO_EMPTY
                
    );
                
                if (
    count($pieces) != $need)
                    
    $err[$nr] = $row;
                
    $nr++;
            }
            return (
    count($err) === 0) ?
                    
    NULL $err;
        }


        
    /*
         * @method checkSyntax
         * @param bool $check requires a syntaxcheck before reading?
         * @return array returns an array in thr form of $arr[$rowNr][$fieldname]
         */ 
        
    function csvToArray($check true)
        {
            
    $errs = ($check == true) ?
                    
    $this->checkSyntax() :
                    
    NULL;
            
            if (!
    is_null($errs)) {
                
    $errStr '';
                foreach (
    $errs as $nr => $err) {
                    
    $errStr .= "$nr$err<br/>\n"
                }
                throw new 
    Exception("CSV file error:<br/>\n$errStr");
            }
         
            
    $headers $this->getHeader();       
            
    $amount  count($headers);

            while (
    $data fgetcsv($this->_handle0$this->_delimiter$this->_enclosure)) {
                
    $named = array();
                foreach (
    $headers as $pos => $key) {
                    
    $named[$key] = $data[$pos];
                }
                
    $arr[] = $named;
            }
            
            return 
    $arr;
        }

    example:

    PHP Code:
    $importFile 'test.csv';
    $handle fopen($importFile"r");
    $reader = new CsvReader($handle);
    try {
        
    $fetched $reader->csvToArray();
    } catch(
    Exception $e) {
      echo 
    'Message: ' $e->getMessage();
      exit;
    }
    print_r($fetched); 
    To PHP or to Perl, that is the question!
    (Bucket - simpletest) User

  11. #11
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,036
    Mentioned
    187 Post(s)
    Tagged
    2 Thread(s)
    I like that class a lot. You have good coding style.

    I found only one problem. CSV files typically use newlines (\r, \n, or \r\n) as line terminators. The class finds the last empty newline as an error because it doesn't equal $need as determined from the count($headers).

    I don't know how you would choose to address this but something like
    PHP Code:
    public function checkSyntax()
    .....

                if ( (
    count($pieces) != $need) && ($row != NULL) )
                    
    $err[$nr] = $row;
                
    $nr++; 
    seems to work OK.

    And the only thing that could be improved (for lazy people like me) is for the script to tell me what the error is so I don't need to figure it out for myself

  12. #12
    messing with my mind fristi's Avatar
    Join Date
    Feb 2009
    Posts
    292
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Mittineague View Post
    I like that class a lot. You have good coding style.
    Thanks Most of that style I picked up while coding Perl. I have come a long way to get this, and still looking for improvements. At this point I started to read Perl Best Practices to do so. With Perl you can do things in so many ways that if you don't learn yourself a good style, the code just becomes unreadable...

    Quote Originally Posted by Mittineague View Post
    I found only one problem. CSV files typically use newlines (\r, \n, or \r\n) as line terminators. The class finds the last empty newline as an error because it doesn't equal $need as determined from the count($headers).

    I don't know how you would choose to address this but something like
    PHP Code:
    public function checkSyntax()
    .....

                if ( (
    count($pieces) != $need) && ($row != NULL) )
                    
    $err[$nr] = $row;
                
    $nr++; 
    seems to work OK.
    Great! I didn't think of that, in my example CSV I didn't have a new line at the end of the last line so I didn't encounter that problem. You're fix seems ok to me, it's just that I'm surprised that NULL seems to work because $row is the result of an rtrim function which should always return a string, so in case of the last line it should be an ampty string. Maybe it works because it doesn't use the strict evaluation (!==)

    I wanted to edit my class in my previous post to add your fix, but it seems I can not anymore.... meh the OP just has to do it himself ;-)

    Quote Originally Posted by Mittineague View Post
    And the only thing that could be improved (for lazy people like me) is for the script to tell me what the error is so I don't need to figure it out for myself
    The problem in there lies that if you want more info, then the checking needs to be done on a much more complex level, walking through the string and evaluating the order of chars or using some very complex regexes...
    I thought it was overkill for something simple like this.

    Maybe when I'm bored I will give it a shot. Or you are always welcome to try
    didn't put a copywrite on it so it's free to be modified!

    anyway thanks for your response, I appriciate it alot! it doesn't seem like I wasted my evening then
    To PHP or to Perl, that is the question!
    (Bucket - simpletest) User

  13. #13
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,036
    Mentioned
    187 Post(s)
    Tagged
    2 Thread(s)
    :d'oh: I didn't see the rtrim. I looked in the docs and it said "returns a NULL array" so I tried a strict equal, which failed, var_dump()ed and saw the empty string but instead of tracking it to the source tried the loose comparison which worked.
    Off Topic:

    I confess I need some sleep pretty badly, I've had a rough couple of days/nights, but I hate to admit it, and am good at fooling myself until such blunders pop up.

    Anyway, as that's the case than a not empty would be better
    PHP Code:
    .....
                if ( (
    count($pieces) != $need) && !empty($row) )
                    
    $err[$nr] = $row;
                
    $nr++; 
    And yes, returning the error type would add considerably to the complexity. Probably not worth it unless this is going to be used on an awful lot of files and very often. Even then it's not so bad if someone needs to look at the line to determine the error. If someone is working with that many files the correct line format would be so ingrained as to almost ensure spontaneous recognition of errors.
    Off Topic:

    Yes, AFAIK there's a 10 minute to edit window. This is to prevent someone from making a good post, getting by the moderators, and then coming back later adding in SPAM with less likelihood of getting caught. Frustrating for honest members and just another case of SPAMmers ruining it for the rest of us.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •