Concatenation of similar files

The simple (haha) question: Is there a better way to do this?


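$dir = "somedirectory";
$files = array_diff(scandir($dir), array(".", "..")); // drop . and .. wherever they happen to sort
foreach($files AS $file) {
    $parts = explode(".", $file); // array_shift() needs a variable, not a function result
    $root = $parts[0];            // "moo.Apr20.log" -> "moo"
    file_put_contents($root.".log", file_get_contents($dir."/".$file), FILE_APPEND); //This seems... wasteful, somehow, but I can't think of a better way.
    unlink($dir."/".$file);       // scandir() returns bare names, so prefix the directory
}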

(The file names are timestamped, so they’ll be something like moo.Apr20.log.)

Barring executing a system command to merge the files, I personally don’t think there is. I mean, you’re going to have to read the file data to write it to the target file, although you could stream the data as you go rather than reading each file in its entirety.
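A minimal sketch of the question’s loop with streaming instead of file_get_contents(), assuming the same naming rule (everything before the first dot becomes the log name). mergeByPrefix() is a hypothetical name; stream_copy_to_stream() is a standard PHP function that copies between two open streams without building the whole file in one string. Like the original, it writes each "<root>.log" into the current directory, so run it from somewhere other than $dir.

```php
<?php
// Sketch: the question's loop, streaming each file instead of slurping it.
// mergeByPrefix() is a hypothetical helper name. Targets land in the current
// directory, as in the original, so don't run this from inside $dir.
function mergeByPrefix($dir)
{
    foreach (array_diff(scandir($dir), array('.', '..')) as $name) {
        $source = $dir . DIRECTORY_SEPARATOR . $name;
        if (!is_file($source)) {
            continue; // skip subdirectories
        }
        $parts = explode('.', $name);            // "moo.Apr20.log" -> "moo"
        $in = fopen($source, 'rb');
        if ($in === false) {
            continue;
        }
        $out = fopen($parts[0] . '.log', 'ab');  // append mode, created if missing
        if ($out === false) {
            fclose($in);
            continue;
        }
        $ok = stream_copy_to_stream($in, $out) !== false;
        fclose($in);
        fclose($out);
        if ($ok) {
            unlink($source); // only delete once the copy succeeded
        }
    }
}
```

scandir() sorts names alphabetically, so files sharing a prefix are appended in name order (which is not necessarily chronological for month names).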

I’d probably start with something like…


<?php
class FileFilter extends FilterIterator
{
  public function accept(){
    return $this->current()->isFile();
  }
}

$directory = new DirectoryIterator('C:\\PHP');

foreach(new FileFilter($directory) as $file){
  echo $file->getFilename(), PHP_EOL ;
}

/*
  glib-2.dll
  gmodule-2.dll
  go-pear.bat
  icudt36.dll
  icudt38.dll
  icuin36.dll
  icuin38.dll
  icuio36.dll
  icuio38.dll
  icule36.dll
  icule38.dll
  iculx36.dll
  iculx38.dll
  icutest.dll
  icutu36.dll
  icutu38.dll
  icuuc36.dll
  icuuc38.dll
  install.txt
  libeay32.dll
  libenchant.dll
  libenchant_ispell.dll
  libenchant_myspell.dll
  ...
  ...
  ...
*/

Then I’d move on to wrapping $file in an object that provides the target filename, most likely using a strategy of some sort.

Maybe ending up with something like…


$directory = new DirectoryIterator('C:\\PHP');
$directory = new FileFilter($directory);
$saver = new FileSaver(new FileSaverStrategyDefault);

foreach($directory as $file){
  $saver->save($file);
}

Maybe. :)
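FileSaver and FileSaverStrategyDefault are only named in the snippet above, so here is one guess at how they might fit together. The FileSaverStrategy interface and the "prefix before the first dot" naming rule are assumptions; save() streams the file onto its target rather than reading it whole.

```php
<?php
// Hypothetical sketch: only the class names FileSaver and FileSaverStrategyDefault
// come from the post; the interface and method bodies are guesses.
interface FileSaverStrategy
{
    public function getTargetFilename(SplFileInfo $file);
}

class FileSaverStrategyDefault implements FileSaverStrategy
{
    // "moo.Apr20.log" -> "moo.log", relative to the current directory
    public function getTargetFilename(SplFileInfo $file)
    {
        $parts = explode('.', $file->getFilename());
        return $parts[0] . '.log';
    }
}

class FileSaver
{
    private $strategy;

    public function __construct(FileSaverStrategy $strategy)
    {
        $this->strategy = $strategy;
    }

    // Appends the file to its target; returns the number of bytes copied, or false.
    public function save(SplFileInfo $file)
    {
        $in = fopen($file->getPathname(), 'rb');
        if ($in === false) {
            return false;
        }
        $out = fopen($this->strategy->getTargetFilename($file), 'ab');
        if ($out === false) {
            fclose($in);
            return false;
        }
        $copied = stream_copy_to_stream($in, $out);
        fclose($in);
        fclose($out);
        return $copied;
    }
}
```

Swapping in a different naming rule then only means writing another FileSaverStrategy; the loop over the directory stays the same. Deleting the source after a successful save() is left to the caller.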

Like Anthony, I like SPL, but I’m not sure about the overhead of so many classes… SPL is already pretty complete, and what you’re doing is simple enough not to need all that (imo).

The main change I’d make is to read the input files in chunks, line by line or a limited number of bytes at a time, instead of using file_get_contents(). file_get_contents() loads the whole file into one string, so when you come across a huuuuge file (say 1GB) it sucks up all the memory PHP is allowed and ultimately kills the script (I’ve done this before lol). :)
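Reading in fixed-size chunks with plain fopen()/fread()/fwrite() could look roughly like this. appendInChunks() and the 8192-byte default are illustrative assumptions; the point is that only one chunk is held in memory at a time, whatever the file size.

```php
<?php
// Sketch: append $source to $target a fixed number of bytes at a time.
// appendInChunks() and the 8192-byte default chunk size are assumptions.
function appendInChunks($source, $target, $chunkSize = 8192)
{
    $in = fopen($source, 'rb');
    if ($in === false) {
        return false;
    }
    $out = fopen($target, 'ab');
    if ($out === false) {
        fclose($in);
        return false;
    }
    $written = 0;
    while (!feof($in)) {
        $chunk = fread($in, $chunkSize); // at most $chunkSize bytes in memory
        if ($chunk === false || $chunk === '') {
            break;
        }
        $written += fwrite($out, $chunk);
    }
    fclose($in);
    fclose($out);
    return $written; // total bytes appended
}
```

Calling appendInChunks($dir."/".$file, $root.".log") in place of the file_put_contents(file_get_contents(...)) line keeps the rest of the original loop unchanged.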

The SPL version that follows will use a little more memory than a simple fread()/fwrite() variation on your original, but I think it’s worth it to use SPL: it’s a very clean library and a good habit to form.


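// $root is the target name, worked out as in the original loop
$log = new SplFileObject($root.".log", "a");
$dir = new DirectoryIterator('directorypath'); // keep the .log target outside this directory
foreach ($dir as $f) {
    if ($f->isFile() && $f->isReadable()) {
        $file = $f->openFile("r");
        foreach ($file as $line) {
            // lines keep their trailing newline by default, so don't add another
            $log->fwrite($line);
        }
    }
}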

Some good points there, Transio. :)

Taking those on-board, can you suggest some improvements?


<?php
class ReadableFileFilter extends FilterIterator
{
  public function accept(){
    return $this->current()->isFile() && $this->current()->isReadable();
  }
}

function getTargetFilename(SplFileInfo $file){
  /*
    Add something clever here
  */
  return 'txt.log';
}

$directory = new GlobIterator('C:\\Users\\Anthony\\*.txt');

foreach(new ReadableFileFilter($directory) as $file){
  $target = new SplFileObject(getTargetFilename($file), 'a+');
  foreach($file->openFile() as $line){
    $target->fwrite($line);
  }
}
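One possible "something clever" for the getTargetFilename() stub, assuming timestamped names shaped like the question’s moo.Apr20.log (a three-letter month plus a day, just before the final extension). The regex is an assumption about that format, not something the thread specifies; names without a timestamp fall back to the name minus its extension.

```php
<?php
// Sketch replacement for the stub; the "Apr20" timestamp pattern is assumed.
function getTargetFilename(SplFileInfo $file)
{
    $name = $file->getFilename();
    // "my.app.Apr20.log" -> "my.app": greedy (.+) keeps any dots in the prefix
    if (preg_match('/^(.+)\.[A-Z][a-z]{2}\d{1,2}\.[^.]+$/', $name, $m)) {
        return $m[1] . '.log';
    }
    // No timestamp found: "notes.txt" -> "notes.log"
    return $file->getBasename('.' . $file->getExtension()) . '.log';
}
```

The returned name is relative to the current directory, which is fine here because the *.txt glob will never pick up the .log targets.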

A system command such as cat would be my pick. Honestly, I think it would be the simplest and fastest option. Is there a reason that rules it out? After all, this sounds an awful lot like a log-management task (the kind typically run from cron). Is the system *nix-based?

LOL yeah, cups, cat would be wayyyy faster AND simpler at the PHP level…


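$dir = new DirectoryIterator('directorypath');
foreach ($dir as $file) {
    if ($file->isFile()) {
        $path = $file->getRealPath();
        $parts = explode('.', $file->getFilename());
        $root = $parts[0]; // "moo.Apr20.log" -> "moo", as in the original
        // escapeshellarg() protects against spaces and shell metacharacters in names
        shell_exec('cat ' . escapeshellarg($path) . ' >> ' . escapeshellarg($root . '.log'));
        unlink($path);
    }
}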

And done. :)

Or, maybe even faster: execute it all in one cat call.


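$dir = new DirectoryIterator('directorypath');
$paths = array();
foreach ($dir as $file) {
    if ($file->isFile()) {
        $paths[] = $file->getRealPath();
    }
}

// With no file arguments, cat would sit waiting on stdin, so skip the call.
if ($paths) {
    // $root is the single target name, set beforehand; escape every argument.
    $args = implode(' ', array_map('escapeshellarg', $paths));
    shell_exec("cat {$args} >> " . escapeshellarg($root . '.log'));
    foreach ($paths as $path) {
        unlink($path);
    }
}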
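The single-cat version sends every file into one log, whereas the question keeps one log per prefix (moo.*.log into moo.log). A sketch that keeps that grouping while still running only one cat per prefix; groupByRoot() and buildCatCommands() are hypothetical helper names, and the prefix rule is the question’s "everything before the first dot".

```php
<?php
// Sketch: group paths by name prefix, then build one escaped cat command per group.
// groupByRoot() and buildCatCommands() are hypothetical helpers.
function groupByRoot(array $paths)
{
    $groups = array();
    foreach ($paths as $path) {
        $parts = explode('.', basename($path)); // "moo.Apr20.log" -> "moo"
        $groups[$parts[0]][] = $path;
    }
    return $groups;
}

function buildCatCommands(array $groups)
{
    $commands = array();
    foreach ($groups as $root => $paths) {
        $args = implode(' ', array_map('escapeshellarg', $paths));
        $commands[] = "cat {$args} >> " . escapeshellarg($root . '.log');
    }
    return $commands;
}
```

Each resulting string can be passed to shell_exec() in turn, with the group’s files unlinked afterwards, exactly as in the single-cat loop.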

It is *nix-based, but on a system I don’t have control over (tbh I haven’t actually checked whether PHP is allowed to exec() there; I generally assumed it wouldn’t be), so cron wasn’t really an option for me.
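Whether shell_exec() is usable on a host you don’t control can be checked at runtime instead of assumed. A minimal sketch: disable_functions is the real php.ini setting that hosts use to switch functions off, while isDisabled() and canShellExec() are hypothetical helper names.

```php
<?php
// Sketch: detect at runtime whether shell_exec() can be used.
// isDisabled() and canShellExec() are hypothetical helpers.
function isDisabled($name, $disabledList)
{
    // disable_functions is a comma-separated list, e.g. "exec, shell_exec,system"
    $disabled = array_map('trim', explode(',', strtolower((string) $disabledList)));
    return in_array(strtolower($name), $disabled, true);
}

function canShellExec()
{
    // Checking both covers PHP versions that report disabled functions differently.
    return function_exists('shell_exec')
        && !isDisabled('shell_exec', ini_get('disable_functions'));
}
```

With that, the script can use the cat approach when canShellExec() returns true and fall back to the pure-PHP streaming loop otherwise.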