However, this still doesn't help, so my plan is to split the CSV file into manageable sizes: split the 75MB file into 5MB files, leaving me with 15 files, then loop over each of those files individually.
The only problem is that I'm not sure how to do this. Could someone please advise?
<?php
$csv = new SplFileObject('data.csv');
$csv->setFlags(SplFileObject::READ_CSV);

// Process a fixed-size batch of lines on each run
foreach (new LimitIterator($csv, 0, 500) as $line) {
    // save $line here
}
?>
You would just need to increase the starting point (0 in this case) on each run. The example processes lines 0-499; on the next run change the 0 to 500 to process lines 500-999, and so on.
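For what it's worth, here is a minimal sketch of how that offset could be fed in per run rather than edited by hand. The CLI argument and the 500-line batch size are just illustrative assumptions, not anything from the original code:

<?php
// Run as e.g. "php batch.php 0", then "php batch.php 500", etc.
$offset    = isset($argv[1]) ? (int) $argv[1] : 0;
$batchSize = 500;

$csv = new SplFileObject('data.csv');
$csv->setFlags(SplFileObject::READ_CSV);

// LimitIterator skips $offset lines, then yields at most $batchSize lines
foreach (new LimitIterator($csv, $offset, $batchSize) as $line) {
    // save/process $line here
}

echo "Next offset: " . ($offset + $batchSize) . "\n";
?>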
What does that achieve? As far as I can see, the ->eof() call is unnecessary. If the OP wanted to iterate the complete file in one go, they'd only need the foreach loop, no?
The only reason for counting the number of rows was just to see how many times I would need to run through the loop. siteguru's solution seems to work fine, though, unless you can spot something wrong or come up with a more efficient way of doing it?
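(For the row count itself, one low-memory way, sketched here against the same data.csv and the 500-line batch from earlier, would be something like the following. Note it may count one extra blank line if the file ends with a trailing newline, which usually doesn't matter for working out the number of runs:)

$csv  = new SplFileObject('data.csv');
$rows = 0;
foreach ($csv as $line) {
    $rows++;                      // one line at a time, never the whole file in memory
}
$runs = (int) ceil($rows / 500);  // number of 500-line batches needed
echo "$rows rows, $runs runs\n";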
I’d like to offer some critique of the snippets offered so far, and my own similar idea.
First, they both use the SplFileObject::READ_CSV flag, which is unnecessary here since the aim (if I've understood correctly) is simply to split the big file into multiple smaller files. That said, the SplFileObject approach might well be the way to solve the whole issue: because it only reads one line at a time, its memory usage is tiny compared to reading the whole file at once with the file() function.
Also, both snippets limit the amount of processed data by a number of lines rather than by the 5MB target mentioned.
So, my idea is to read the big CSV file line by line and write that data into smaller files (named data-<number>.csv) as needed.
$csv = new SplFileObject("data.csv");

$bound    = 5 * 1024 * 1024; // 5MB in bytes
$bytes    = 0;
$chunk_id = 1;

foreach ($csv as $line) {
    // If we have not opened a chunk file yet, or writing this line
    // would push the current chunk over the bound, start a new chunk file
    if (!isset($chunk) || $bytes + strlen($line) > $bound) {
        $bytes = 0;
        $chunk = new SplFileObject(sprintf("data-%03d.csv", $chunk_id++), "w");
        echo "Created {$chunk->getFilename()}\n";
    }

    // Write the line to the current chunk file and tally up the bytes written
    $bytes += $chunk->fwrite($line);
}
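Once the chunks exist, looping over each of them individually (the original goal) could look roughly like this. The glob pattern matches the data-XXX.csv names produced above, and the inner comment stands in for whatever the real processing (e.g. the database insert) actually is:

foreach (glob('data-*.csv') as $file) {
    $chunk = new SplFileObject($file);
    $chunk->setFlags(SplFileObject::READ_CSV);

    foreach ($chunk as $row) {
        if ($row === array(null)) {
            continue; // skip the blank "row" a trailing newline can produce
        }
        // process one parsed CSV row ($row is an array of fields)
    }
}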
@Anthony: It depends on what is happening during the processing stage ("insert it into a db"), if there is one, and on other limitations (time limit, etc.) which only the OP can tell us about. "PHP runs out of resources" is pretty vague! For all we know, the real solution might not even be PHP-related.
Jamie, would you mind stepping back for a minute and answering a few questions?
If we could suggest some solutions which don't involve writing those smaller files, but may be quicker and/or less memory-hungry than the one you're currently using, would you be willing to try them and see if they're too slow, use too much memory, or hit any other bottlenecks?
If we suggest code, could you try that code and report back (see previous), rather than taking bits and pieces and posting your changes (for review?)?
How much work is involved when processing the CSV data (into the database)?
If we keep asking questions, rather than being helpful and providing code snippets, would you mind?
Would you ask questions if the code isn't clear to you, or just muddle through and, if it works, then awesome?