  1. #1
    SitePoint Addict Skookum's Avatar
    Join Date
    Sep 2006
    Location
    Idaho
    Posts
    375

    GZ uncompressed file size

    I am working with gzipped text files, and I am trying to get the uncompressed size of a file so that I can read it in and play with it.

    So far what I have is this:
    PHP Code:
    $FileRead = 'mlsdata/listings-residential-active.txt.gz';
    $HandleRead = gzopen($FileRead, "rb");
    $ContentRead = gzread($HandleRead, filesize($FileRead));
    But that just gives me the size of the compressed file, not the uncompressed size.

    Now that I think about it, I just need to read until EOF, but removing the filesize($FileRead) argument altogether still does not read until EOF.

    I would still like to know how to get the uncompressed size of a gz file for future reference.
    Paranoia is no longer a mental illness it is a way of life - Me

  2. #2
    SitePoint Addict Skookum's Avatar
    Reading the notes on gzread on php.net, I found this:
    So either use the actual uncompressed size, if you know it, or use an arbitrary big enough length, as gzreading will stop at the end of the file anyway.
    I'm not sure I like the solution of just passing a number big enough to handle every file size I might deal with.
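
    One way to avoid both the exact size and an arbitrary big number is to read fixed-size chunks until gzeof() reports the end of the stream. The following is only a minimal sketch, assuming the zlib extension is loaded; read_gz_fully and its $chunkSize parameter are hypothetical names chosen for illustration, not something from this thread.

```php
<?php
// Hypothetical helper: read a whole gzip file without knowing its
// uncompressed size in advance.
function read_gz_fully(string $path, int $chunkSize = 8192): string
{
    $handle = gzopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $contents = '';
    // gzread() returns at most $chunkSize uncompressed bytes per call,
    // so keep calling it until gzeof() says the stream is exhausted.
    while (!gzeof($handle)) {
        $chunk = gzread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error: stop rather than loop forever
        }
        $contents .= $chunk;
    }

    gzclose($handle);
    return $contents;
}
```

    Called as read_gz_fully('mlsdata/listings-residential-active.txt.gz'), this would return the decompressed text; strlen() of the result is then the uncompressed size, at the cost of holding the whole file in memory.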

  3. #3
    Worship the Krome kromey's Avatar
    Join Date
    Sep 2006
    Location
    Fairbanks, AK
    Posts
    1,621
    If you can't get the actual size and don't want to use an arbitrary big number in its stead, use file_get_contents instead.
    PHP questions? RTFM
    MySQL questions? RTFM

  4. #4
    SitePoint Addict Skookum's Avatar
    Quote: Originally posted by kromey
    If you can't get the actual size and don't want to use an arbitrary big number in its stead, use file_get_contents instead.
    When using file_get_contents I get the compressed info of the file rather than the text in the file. Unless I am misunderstanding something.
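
    For reference, file_get_contents can return the decompressed text if the path is prefixed with PHP's compress.zlib:// stream wrapper; without the prefix it returns the raw compressed bytes, which matches what is described above. A minimal sketch, assuming the zlib extension is available; read_gz_via_wrapper is a hypothetical name used only for this example.

```php
<?php
// Hypothetical helper: decompress a gzip file through the
// compress.zlib:// stream wrapper provided by the zlib extension.
function read_gz_via_wrapper(string $path): string
{
    $text = file_get_contents('compress.zlib://' . $path);
    if ($text === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $text;
}
```

    Like any whole-file read, this loads the entire uncompressed content into memory, so it suits files that comfortably fit within the memory limit.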

    I found that if I multiply the size in bytes of the compressed file by 4, the result is about 0.8 times larger than the uncompressed file. I started reading about the algorithm used in gunzip and I might be able to get this a little more precise, but for all intents and purposes this should work fine.

    If I get bored I will try to write a function, or possibly an algorithm, to figure out the exact uncompressed size of a gz file. If anyone is interested in this, let me know; if not, it will probably be a permanent fixture on my to-do list.

  5. #5
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    According to the gzip specification, the uncompressed size is stored in the last four bytes of the file, little-endian.
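
    In RFC 1952 this trailing field is called ISIZE, and it holds the uncompressed length modulo 2^32. It is therefore only reliable for data under 4 GiB, and for a file made of several concatenated gzip members it describes only the last member. The claim can be checked in memory with a short sketch; the sample data and variable names here are made up for illustration.

```php
<?php
// Build a gzip stream in memory and decode its 4-byte trailer.
$original   = str_repeat("sample listing row\n", 1000); // made-up test data
$compressed = gzencode($original);

// The last 4 bytes are ISIZE: uncompressed length mod 2^32, little-endian.
$trailer = substr($compressed, -4);
$fields  = unpack('V', $trailer); // "V" = unsigned 32-bit little-endian
$isize   = $fields[1];            // unpack() results are indexed from 1

var_dump($isize === strlen($original)); // bool(true)
```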

  6. #6
    SitePoint Addict Skookum's Avatar
    Quote: Originally posted by stereofrog
    According to the gzip specification, the uncompressed size is stored in the last four bytes of the file, little-endian.
    I must not have read that far on the page.

    Well now the most obvious question is how do I read the last 4 bytes of the gz?

    I tried doing
    PHP Code:
    $FileName = 'mlsdata/listings-farm and ranch-active.txt.gz';
    $Test = file_get_contents($FileName);
    var_dump($Test);
    But I get all of the jumbled characters due to the compression. And for me to try and unzip the file to read the last 4 bytes I run into my original problem.

  7. #7
    SitePoint Wizard stereofrog's Avatar
    fopen the file. fseek to filesize - 4. fread 4 bytes into a string buffer. unpack the buffer to an integer. That's basically all there is to it.

  8. #8
    SitePoint Addict Skookum's Avatar
    Okay now I just feel dumb.

    I have this
    PHP Code:
    $FileName = 'mlsdata/listings-farm and ranch-active.txt.gz';
    $HandleRead = fopen($FileName, "rb");
    $Seeking = (filesize($FileName) - 4);
    $ContentRead = fseek($FileName, -4, SEEK_END);
    $GZFileSize = fread($ContentRead, 4);

    var_dump($ContentRead);

    echo $GZFileSize . "<br>";
    And I get a false for the $ContentRead.

    So I tried this
    PHP Code:
    $FileName = 'mlsdata/listings-farm and ranch-active.txt.gz';
    $HandleRead = fopen($FileName, "rb");
    $Seeking = (filesize($FileName) - 4);
    $ContentRead = fseek($FileName, $Seeking, SEEK_SET);
    $GZFileSize = fread($ContentRead, 4);

    var_dump($ContentRead);

    echo $GZFileSize . "<br>";
    Still receive a false.

    Then I tried this
    PHP Code:
    $FileName = 'mlsdata/listings-farm and ranch-active.txt.gz';
    $HandleRead = fopen($FileName, "rb");
    $Seeking = (filesize($FileName) - 4);
    $ContentRead = fseek($FileName, $Seeking);
    $GZFileSize = fread($ContentRead, 4);

    var_dump($ContentRead);

    echo $GZFileSize . "<br>";
    And I still receive a false.

    I'm not sure what I am doing wrong.

  9. #9
    SitePoint Wizard stereofrog's Avatar
    PHP Code:
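    $fp = fopen(WHATEVER_FILENAME, "rb");
    fseek($fp, -4, SEEK_END);   // the size field is the last 4 bytes
    $buf = fread($fp, 4);
    fclose($fp);
    // "V" = unsigned 32-bit little-endian; unpack() returns an array indexed from 1.
    // Using a variable here avoids passing a function result to end() by reference.
    $parts = unpack("V", $buf);
    $size = $parts[1];
    fseek and fread both expect a file handle as the first argument, not a filename. I also suggest you set error_reporting to E_ALL, at least while debugging.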
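
    The same steps can be wrapped in a function with a few sanity checks. This is only a sketch: gz_uncompressed_size is a hypothetical name, and the magic-number check is an addition that was not part of the snippet above. The returned value is the size modulo 2^32, so it is not meaningful for data of 4 GiB or more.

```php
<?php
// Hypothetical helper: return the size recorded in a gzip file's trailer.
function gz_uncompressed_size(string $path): int
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        // A gzip file starts with the two magic bytes 0x1f 0x8b.
        if (fread($fp, 2) !== "\x1f\x8b") {
            throw new RuntimeException("$path does not look like a gzip file");
        }
        // fseek() returns 0 on success and -1 on failure.
        if (fseek($fp, -4, SEEK_END) !== 0) {
            throw new RuntimeException("Cannot seek in $path");
        }
        $buf = fread($fp, 4);
        if ($buf === false || strlen($buf) !== 4) {
            throw new RuntimeException("Cannot read the trailer of $path");
        }
        $fields = unpack('V', $buf); // unsigned 32-bit little-endian
        return $fields[1];
    } finally {
        fclose($fp); // runs whether we return or throw
    }
}
```

    A call such as gz_uncompressed_size('mlsdata/listings-farm and ranch-active.txt.gz') could then supply the length argument for gzread.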

  10. #10
    SitePoint Addict Skookum's Avatar
    Yup, I knew I was just being dumb

    That works perfectly, thanks.

    Also I do have error_reporting(E_ALL ^ E_NOTICE); specified but it wasn't pulling up any errors.

    The more I look at it the more that I can't believe that I was putting the filename in there rather than the file handle. Oh well.

    Once again thanks.

