  1. #1
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)

    Reading csv. How to convert array from numeric keys?

    PHP Code:
    $keys = array('name', 'address');

    $handle = fopen("data.csv", "r");

    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {

        // ?

    }

    What is the most efficient way of turning a sample data.csv file such as this:
    Code:
    name, address
    "C Stunt", "Letsby Avenue"
    "Ishood Coco", "Arfway house"
    into an associative array such as
    Code:
    $arr[0]['name'] = "C Stunt" ;
    $arr[0]['address'] = "Letsby Avenue" ;
    $arr[1]['name'] = "Ishood Coco" ;
    $arr[1]['address'] = "Arfway house" ;
    Naturally the real CSV file has many more columns than this. I just wondered if there was an easy way of doing it, rather than reassigning the values by hand?

  2. #2
    SitePoint Zealot Jaanboy's Avatar
    Join Date
    Sep 2007
    Location
    UK
    Posts
    119
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    PHP Code:
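    $handle   = fopen("data.csv", "r");
    $count    = 0;
    $headers  = array();
    $arr      = array();

    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE)
    {
        // The first row holds the column names. Test $headers rather than
        // $count: $count is not incremented before the continue, so testing
        // it would treat every row as the header row.
        if (empty($headers))
        {
            $headers = $data;

            continue;
        }

        for ($i = 0; $i < count($headers); ++$i)
        {
            $arr[$count][$headers[$i]] = $data[$i];
        }

        ++$count;
    }

    fclose($handle);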

  3. #3
    @php.net Salathe's Avatar
    Join Date
    Dec 2004
    Location
    Edinburgh
    Posts
    1,396
    Mentioned
    61 Post(s)
    Tagged
    0 Thread(s)
    PHP Code:
    $row = array_combine($headers, $data);
    P.S. Unless you really know and trust your CSV file, it's probably worth checking that the length of $data is the same as the number of headers before combining.
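A minimal sketch of that length check, combined with the fgetcsv() loop from earlier in the thread. The function name csv_to_assoc() is hypothetical, and silently skipping mismatched rows is an assumption; reporting them may suit some data better.

```php
<?php
// Hypothetical helper, not from the thread: reads a CSV file into a list of
// associative arrays keyed by the header row.
function csv_to_assoc($path)
{
    $rows   = array();
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $rows;
    }

    // Enclosure and escape are passed explicitly; these are fgetcsv()'s
    // traditional defaults.
    $headers = fgetcsv($handle, 1000, ',', '"', '\\');

    while ($headers !== false
        && ($data = fgetcsv($handle, 1000, ',', '"', '\\')) !== false) {
        // Assumption: rows with the wrong number of fields (including
        // blank lines) are skipped rather than reported.
        if (count($data) !== count($headers)) {
            continue;
        }
        $rows[] = array_combine($headers, $data);
    }

    fclose($handle);
    return $rows;
}
```

Called as csv_to_assoc('data.csv'), this would return rows that can be read by header name, e.g. $rows[0]['name'].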
    Salathe
    Software Developer and PHP Manual Author.

  4. #4
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    I'd extend or implement the SPL Iterator interface, assign the keys on the first iteration and return the set key from Iterator::key() on subsequent requests.

    You can then use this with the standard SplFileObject (which reads CSVs) and read the data as you go.

    Very memory efficient.
    @AnthonySterling: I'm a PHP developer, a consultant for oopnorth.com and the organiser of @phpne, a PHP User Group covering the North-East of England.

  5. #5
    AnthonySterling
    Right, I'm on a mission to SPL'ize this, for reference, I need to recreate the following.

    PHP Code:
    $file = new SplFileObject('data.csv');
    $file->setFlags(SplFileObject::READ_CSV);

    $keys = null;

    foreach($file as $line => $entry){
      if(null === $keys){
        $keys = array_values($entry);
        continue;
      }
      foreach($entry as $key => $value){
        printf('(%d) %s => %s<br />', $line, $keys[$key], $value);
      }
    }

    /*
      (1) name => C Stunt
      (1) address => Letsby Avenue
      (2) name => Ishood Coco
      (2) address => Arfway house
    */ 

  6. #6
    AnthonySterling
    So far, I have.
    PHP Code:
    $csv = new SplFileObject('data.csv');
    $csv->setFlags(SplFileObject::READ_CSV);
    foreach(new AssociativeSplCsvFileObjectIterator($csv) as $line => $entry){
      foreach($entry as $key => $value){
        printf('(%d) %s => %s<br />', $line, $key, $value);
      }
    }

    /*
      (0) name => C Stunt
      (0) address => Letsby Avenue
      (1) name => Ishood Coco
      (1) address => Arfway house
    */ 
    PHP Code:
    class AssociativeSplCsvFileObjectIterator implements Iterator
    {
      protected
        $csv      = null,
        $keys     = array(),
        $position = 0;

      public function __construct(SplFileObject $csv){
        $this->csv = $csv;
        $this->setKeys();
      }

      public function rewind(){
        $this->setKeys();
        $this->position = 0;
      }

      public function current(){
        return array_combine($this->keys, $this->csv->current());
      }

      public function key(){
        return $this->position;
      }

      public function next(){
        $this->csv->next();
        $this->position++;
      }

      public function valid(){
        return $this->csv->valid();
      }

      // Reads the header row into $keys and moves on to the first data row.
      protected function setKeys(){
        $this->csv->rewind();
        $this->keys = array_values($this->csv->current());
        $this->csv->next();
      }
    }

    I can't say I'm happy with the Iterator yet though.

  7. #7
    Cups
    Whoa, 3 very different answers there, thanks people.

    @Anthony I was hoping there was an SPL -based solution after seeing your pastebin-housed SPL experiment a few weeks ago.

    The initial target block of code you set works fine and seems to handle empty values as expected.

    As I have a series of operations to do, that target code will allow me to quickly extract the initial values from a large csv file, to create the smaller one.

    I might as well explain the whole thing.

    1 I wget a largish csv file nightly ( ~4k rows) and cache it.

    2 I extract only those rows of interest to me, around 100 rows.

    3 Some of those text fields will then be checked for "transformations" ie turning email into <a></a> links etc, squirt it into a template.

    4 The result is then duly cached for 24 hrs.

    So you see, between steps 2 and 3 it'll make much more sense for me, or anyone else using the data to be able to reference the array-elements by name.
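One possible shape for the "transformation" in step 3, as a sketch only. The function name linkify_email() is hypothetical, and treating a field as an email only when the whole value validates is an assumption about the data.

```php
<?php
// Hypothetical helper: turns a field that is a valid email address into a
// mailto link; any other value is returned HTML-escaped and unlinked.
function linkify_email($value)
{
    $value = trim($value);
    $safe  = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

    if (filter_var($value, FILTER_VALIDATE_EMAIL) === false) {
        return $safe;
    }

    return '<a href="mailto:' . $safe . '">' . $safe . '</a>';
}
```

With rows keyed by column name, this could be applied to a single named field before the row is passed to the template.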

  8. #8
    AnthonySterling
    Regarding the filtering, see FilterIterator, use it to wrap AssociativeSplCsvFileObjectIterator and only return the rows meeting the criteria you set within FilterIterator::accept().

    Other than that, AssociativeSplCsvFileObjectIterator still needs some refactoring but the premise is there.
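A rough sketch of that wrapping, assuming the inner iterator yields associative rows the way AssociativeSplCsvFileObjectIterator does. The class name ColumnEqualsFilter and its constructor arguments are made up for illustration, and the bool return type assumes PHP 7 or later.

```php
<?php
// Hypothetical filter: passes through only those rows whose named column
// is identical to the expected value.
class ColumnEqualsFilter extends FilterIterator
{
    protected $column;
    protected $expected;

    public function __construct(Iterator $rows, $column, $expected)
    {
        parent::__construct($rows);
        $this->column   = $column;
        $this->expected = $expected;
    }

    public function accept(): bool
    {
        $row = $this->getInnerIterator()->current();

        // Rows that are not arrays, or lack the column, are rejected.
        return is_array($row)
            && isset($row[$this->column])
            && $row[$this->column] === $this->expected;
    }
}
```

Usage would look something like new ColumnEqualsFilter(new AssociativeSplCsvFileObjectIterator($csv), 'type', 'ABC'), where 'type' stands in for whichever column holds the value of interest.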

  9. #9
    Cups
    When I first looked in the fgetcsv manual I found this old project

    http://code.google.com/p/parsecsv-for-php/

    And did not really read down the page any further, but revisiting it I just noticed that there are some refs to the SPL, but they just use fgetcsv rather than an SplFileObject.

    It would have been interesting if there were some test cases for all this stuff.

    That parsecsv class seems to want to emulate really simple sql statements, and may have been born from frustrations with formatting and encoding - and just seemed overkill to me.

    I thought I'd mention it in case anyone else was searching on this subject.

  10. #10
    Salathe
    For what it's worth, a minimalist version of Anthony's AssociativeSplCsvFileObjectIterator might be of use. It uses a FilterIterator (to only accept rows with count(headers) fields) which extends IteratorIterator (which does the iteration of our SplFileObject without us having to redefine the basic iteration methods).

    (The class name is just poking fun, probably best not to keep it for yourself.)
    PHP Code:
    class MinimalistFilterIteratorVersionOfAnthonysAssociativeSplCsvFileObjectIterator extends FilterIterator
    {
        protected $_headers, $_count;

        public function __construct($path)
        {
            // Build CSV file iterator
            $csv = new SplFileObject($path, 'r');
            $csv->setFlags(SplFileObject::READ_CSV);
            // Remember column names and count
            $this->_headers = $csv->current();
            $this->_count   = count($this->_headers);
            // LimitIterator allows us to always skip the headers
            parent::__construct(new LimitIterator($csv, 1));
        }

        // Returns a nice assoc. array
        public function current()
        {
            return array_combine($this->_headers, parent::current());
        }

        // Skip this line if it does not have $_count number of fields
        public function accept()
        {
            return $this->_count === count(parent::current());
        }
    }

    $csv = new MinimalistFilterIteratorVersionOfAnthonysAssociativeSplCsvFileObjectIterator('data.csv');
    foreach ($csv as $row) {
        print_r($row);
    }


  11. #11
    AnthonySterling
    Thanks Salathe!

    I was thinking of doing something similar, but didn't like the idea of creating the CSV and flags internally. However, the combination of the LimitIterator and FilterIterator is something I hadn't considered.

    I like that a lot.

    Thanks again.

    Anthony.

    ps. If you weren't so awesome, I think you'd be my current nemesis.

  12. #12
    Salathe
    Quote Originally Posted by AnthonySterling View Post
    I ... didn't like the idea of creating the CSV and flags internally.
    There's nothing stopping the class from being rewritten to accept an SplFileObject in the constructor, but then you'd have to keep track of whether we're looking at a CSV file and behave accordingly since pretty much everything else in there is tailored towards this particular, individual use case. That's a detail for Cups to ponder, to see which he prefers.

    Quote Originally Posted by AnthonySterling View Post
    ps. If you weren't so awesome, I think you'd be my current nemesis.
    I can be less awesome if you'd prefer.

  13. #13
    Cups
    In order to filter the original csv file, I used Salathe's idea and am mucking about in the accept() method.

    This works as expected, but looks wrong:
    PHP Code:
        public function accept()
        {
            $item = $this->getInnerIterator();
            $i = $item->current();

            if ($this->_count === count(parent::current()) && $i[14] === "ABC")
                return true;
        }
    I'll probably abstract away the hard coded match between item 14 and ABC, but am I accessing the 15th array item in the best way, or is there something obvious I missed?

  14. #14
    AnthonySterling
    I still think additional filtering should be separate from the object that composes the associative entry.

    This particular method (or rather any method) should 'do one thing and one thing only' to quote Uncle Bob. In this case, it's determining whether or not the entry is acceptable for the associative array.

    It's a code smell, definitely.

    How would you change that filter? You have to alter the object internals, and well, you know how that goes.

    Create a filter for each filter, which sounds kinda obvious when you say it like that, from an OO POV.

    Don't make me crack open the SRP wiki link.
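One way to read "a filter for each filter" without writing a class per criterion is to stack SPL's CallbackFilterIterator (PHP 5.4 and later), one wrapper per check. This is a sketch under that assumption; column_equals() and apply_filters() are hypothetical names, and rows are assumed to be associative arrays.

```php
<?php
// Hypothetical helper: returns a callback that accepts rows whose $column
// is identical to $expected.
function column_equals($column, $expected)
{
    return function ($row) use ($column, $expected) {
        return isset($row[$column]) && $row[$column] === $expected;
    };
}

// Hypothetical helper: wraps $rows in one CallbackFilterIterator per
// callback, so a row must pass every filter to come out the other end.
function apply_filters(Iterator $rows, array $callbacks)
{
    foreach ($callbacks as $callback) {
        $rows = new CallbackFilterIterator($rows, $callback);
    }
    return $rows;
}
```

Changing the criteria then means changing the list of callbacks passed in, rather than editing accept() inside the class that builds the associative rows.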

  15. #15
    Cups
    Sometimes there is a clash between SRP, JFDI and the clock and sometimes the wrong one wins, and I know it and I feel bad but at least I know why and I don't care - so shoot me.

    I have never needed to do this to a csv file before; in the unlikely event that I ever do again, I will go back and refactor.

    (just seeing that makes me more inclined to go back NOW... the force ... is strong ... I feel weak ...I can resist...)

    I mean I just know, know, know in my bones that I should be unit testing this too, but it has to get out the door and onto a website like, yesterday.

    ps Thanks to all for your input on this one by the way, really interesting list of solutions.

  16. #16
    AnthonySterling
    I admire your 'JFDI fu' Mr.Geraghty, good luck and godspeed.

    Off Topic:

    Given the location of the file you mentioned off-board, would YQL be an option to filter the results?

  17. #17
    Cups
    ... and when it bites you on the *rse don't come running to me...

    Y'see - that's why I know I will never, ever be a ninja like you - just as I know I will never play for England nor become a famous Hollywood actor.

    Thank the lord for ninjas, soccer players and actors though.

