  1. #1
    SitePoint Addict

    Can PHP handle downloading and writing a 3.5 GB file?

    Hello

    I have written a script that loops through a folder and rips out certain items from an XML file.

    However, these files are 3.5 GB apiece, and uploading them has become a chore (and near impossible) for the company because of the bandwidth bills in our office.

    What I was wondering is: using file_get_contents(), could I download the XML file from our affiliate's site and then write it to the server with fopen()?

    I most likely could have tried this before posting, but I wondered if anyone knows whether PHP can handle 3 GB files.

    Thanks

  2. #2
    AnthonySterling (Twitter: @AnthonySterling)
    Location
    North-East, UK.
    If I understand correctly...

    You could use XMLReader->read(); most other parsers (SimpleXML, DOMDocument) read the entire file into memory first.

    With XMLReader, you get the data you require incrementally.

    XMLReader->open() also accepts a URI.

    Good luck!

    SilverB.
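    A minimal sketch of the XMLReader approach above, assuming you only need the text of one element type: the generator walks the document node by node with read(), so memory use stays flat however large the feed is. The function name streamElements() is hypothetical, as is any element name you pass in; substitute the items you actually need to rip out.

```php
<?php
/**
 * Hypothetical helper: stream through a (possibly huge) XML document and
 * yield the text content of every element named $tag, one at a time,
 * without loading the whole file into memory.
 */
function streamElements(string $uri, string $tag): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($uri)) {
        throw new RuntimeException("Could not open $uri");
    }
    try {
        // read() advances one node at a time through the document
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === $tag) {
                // readString() returns the text content of the current element
                yield $reader->readString();
            }
        }
    } finally {
        $reader->close();
    }
}
```

    Usage would be along the lines of foreach (streamElements($url, 'title') as $title) { ... }, where $url is your feed's URI and 'title' stands in for whichever element you need.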

  3. #3
    SitePoint Addict
    Well, with small files I have no problem... but the 3.5 GB feed kicks off a

    Code:
    Warning: file_get_contents(feed url innit :p ) [function.file-get-contents]: failed to open stream: HTTP request failed! in /home/aaron/public_html/not-index.php on line 7
    done
    Any ideas?

  4. #4
    AnthonySterling (Twitter: @AnthonySterling)
    Location
    North-East, UK.
    Oh..... back to the drawing board....

  5. #5
    SitePoint Addict
    Join Date
    Mar 2005
    Posts
    319
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Sorry, that was from file_get_contents()... trying XMLReader::open() atm.

  6. #6
    SitePoint Addict
    Join Date
    Mar 2005
    Posts
    319
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    How exactly does it work?


    Code:
    	$reader = new XMLReader();
    	$reader->open($url);
    But from there, how do I get it to grab the XML inside without actually parsing over it with the likes of:

    Code:
        while ($reader->read()) {
          echo $reader->name;
          if ($reader->hasValue) {
            echo ": " . $reader->value;
          }
          echo "\n";
        }
        exit();
    I literally just want to get the XML inside into a variable...
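    One way to get XML into a variable without holding the whole 3.5 GB in memory, sketched here as an assumption rather than anything the thread settled on: stream with XMLReader and use readOuterXml() to copy out each repeating element as its own small XML string. The helper name eachElementXml() and the element name "item" are hypothetical.

```php
<?php
/**
 * Hypothetical helper: stream a large XML file and yield each <$tag>
 * element (with its children) as its own XML string, so you only hold
 * one element's XML at a time instead of the whole document.
 */
function eachElementXml(string $uri, string $tag): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($uri)) {
        throw new RuntimeException("Could not open $uri");
    }
    try {
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === $tag) {
                // readOuterXml() returns the element and all of its children as a string
                yield $reader->readOuterXml();
            }
        }
    } finally {
        $reader->close();
    }
}
```

    Each fragment is small enough to hand to simplexml_load_string() if you prefer SimpleXML for the per-item work, e.g. foreach (eachElementXml($url, 'item') as $xml) { $item = simplexml_load_string($xml); ... }.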

  7. #7
    SitePoint Addict
    http://pastebin.com/m5fd6ab27 << my current code.

    I'm not really sure which way to go on this :/

  8. #8
    SitePoint Member

    use an external app

    You shouldn't be using file_get_contents(), or any other PHP functions, IMO. None of them will be able to handle 3.5 GB: most will aggressively cache the contents in memory, causing you to hit the PHP memory limit, the file size limits and the script execution time limit (see php.ini).

    You are probably getting errors now for one of those three reasons, or because you have safe mode switched on (IIRC, HTTP via file_get_contents() only works with safe mode off, and you might also have to explicitly allow it in php.ini via allow_url_fopen).

    That being said, the best solution to your problem is to call an external program that can handle the file download. On UNIX you can use either fetch or wget, e.g.:

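    $url = 'http://www.file.to.download.com/file.txt';
    // exec() only returns the last line of output, so check wget's exit status instead
    exec('wget ' . escapeshellarg($url), $output, $status);
    if ($status === 0) {
        // got the file, so check it's complete, the right size, etc.
    }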

    Or you can use the much better proc_open() function, which I would recommend since your download will take a while to complete (proc_open() lets you query the status of the externally running program using proc_get_status(), so you can give feedback on download progress, etc.).

    The output from the download program you use (e.g. wget) goes straight into a stream, so you can read it and parse it (for errors, etc.).
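    The proc_open() route described above can be sketched roughly as follows. This is a minimal illustration, not the poster's code: runWithProgress() is a hypothetical name, and it simply polls proc_get_status() while relaying whatever the child writes to stdout or stderr (where wget prints its progress) to a callback you supply.

```php
<?php
/**
 * Hypothetical helper: run a long-running command (e.g. a wget download)
 * via proc_open(), poll it with proc_get_status(), and pass any stdout or
 * stderr output to $onOutput as it arrives. Returns the exit code.
 */
function runWithProgress(string $command, callable $onOutput): int
{
    $spec = [
        0 => ['pipe', 'r'], // child's stdin
        1 => ['pipe', 'w'], // child's stdout
        2 => ['pipe', 'w'], // child's stderr (wget writes its progress here)
    ];
    $proc = proc_open($command, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException("Could not start: $command");
    }
    fclose($pipes[0]); // nothing to send to the child
    stream_set_blocking($pipes[1], false);
    stream_set_blocking($pipes[2], false);

    do {
        // Check status first, so that once the child has exited the drain
        // below still collects everything it wrote before exiting
        $status = proc_get_status($proc);
        foreach ([1, 2] as $fd) {
            $chunk = stream_get_contents($pipes[$fd]);
            if ($chunk !== false && $chunk !== '') {
                $onOutput($chunk);
            }
        }
        if ($status['running']) {
            usleep(200000); // poll roughly five times a second
        }
    } while ($status['running']);

    fclose($pipes[1]);
    fclose($pipes[2]);
    proc_close($proc);
    // The status fetched when the process was first seen as stopped
    // carries the real exit code
    return $status['exitcode'];
}
```

    For the download itself you might call runWithProgress('wget ' . escapeshellarg($url), function ($s) { echo $s; }) and treat a non-zero return value as a failed download. Non-blocking pipes may not behave the same way on Windows, so test there before relying on this.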

    If you are on Windows, you could use wget.exe (a Win32 port of wget with the required Cygwin libraries) and the same PHP functions.

    Doing the updates this way is *much* better than what you are doing now.

  9. #9
    SitePoint Addict
    Hmm yes you are right.

    Another thing I just considered is how the file is parsed: it's better for it to loop through 250 medium-sized files than one extremely large file.

    So it's going to have to be done manually anyway.

  10. #10
    SitePoint Member
    Quote (originally posted by azz0r_wugg):
    Hmm yes you are right.

    Another thing I just considered is how the file is parsed: it's better for it to loop through 250 medium-sized files than one extremely large file.

    So it's going to have to be done manually anyway.
    Not manually: just have PHP call the external app (e.g. wget) and manage the whole process.

    It could and should all be fully automatic (including splitting the files for parsing).
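    A sketch of the automatic splitting step, under stated assumptions: it streams the big file once with XMLReader and writes every $perFile matching elements into a numbered file, each wrapped in a hypothetical <chunk> root element so the pieces stay well-formed XML. It assumes the repeating elements are not nested inside one another; splitXml(), <chunk> and the file naming scheme are illustrative choices, not anything from the thread.

```php
<?php
/**
 * Hypothetical helper: stream one large XML file and write every $perFile
 * <$tag> elements into a numbered file wrapped in a <chunk> root element.
 * Assumes <$tag> elements are never nested inside each other.
 * Returns the number of files written.
 */
function splitXml(string $source, string $tag, int $perFile, string $outPrefix): int
{
    $reader = new XMLReader();
    if (!$reader->open($source)) {
        throw new RuntimeException("Could not open $source");
    }
    $files = 0;
    $count = 0;
    $out = null;
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT || $reader->localName !== $tag) {
            continue;
        }
        if ($count % $perFile === 0) {
            // Close the previous chunk (if any) and start a new numbered file
            if ($out !== null) {
                fwrite($out, "</chunk>\n");
                fclose($out);
            }
            $out = fopen(sprintf('%s%03d.xml', $outPrefix, ++$files), 'w');
            if ($out === false) {
                throw new RuntimeException("Could not create chunk file $files");
            }
            fwrite($out, "<?xml version=\"1.0\"?>\n<chunk>\n");
        }
        // Copy this element, including its children, into the current chunk
        fwrite($out, $reader->readOuterXml() . "\n");
        $count++;
    }
    if ($out !== null) {
        fwrite($out, "</chunk>\n");
        fclose($out);
    }
    $reader->close();
    return $files;
}
```

    With 250 target files you would pick $perFile as roughly the total item count divided by 250; the script that calls wget could then run this on the downloaded feed before parsing each piece.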

  11. #11
    felgall (Programming Since 1978)
    Location
    Sydney, NSW, Australia
    FTP is the appropriate protocol for files bigger than 1 MB, not HTTP. PHP should be able to handle files of any size, provided you use functions that speak the protocol appropriate to the size of the file.
    Stephen J Chapman
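    To illustrate the point that PHP can cope with files of any size when suitable functions are used, here is a minimal sketch, assuming a 64-bit PHP build (32-bit builds can trip over files larger than 2 GB). It copies a source to a local file with stream_copy_to_stream(), which moves the data in chunks instead of building one giant string the way file_get_contents() does. The source can be a local path or, if allow_url_fopen is enabled, an http:// or ftp:// URL; copyStream() is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: copy $source to $dest in chunks using PHP streams,
 * so memory use stays small regardless of the file's size.
 * Returns the number of bytes copied.
 */
function copyStream(string $source, string $dest): int
{
    $in = fopen($source, 'rb');
    if ($in === false) {
        throw new RuntimeException("Could not open $source");
    }
    $out = fopen($dest, 'wb');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Could not open $dest for writing");
    }
    // stream_copy_to_stream() reads and writes in chunks rather than
    // loading the whole file into one string
    $bytes = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);
    if ($bytes === false) {
        throw new RuntimeException("Copy from $source failed");
    }
    return $bytes;
}
```

    A call might look like copyStream('ftp://user:pass@example.com/feed.xml', '/tmp/feed.xml'), where the host, credentials and paths are placeholders.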


  12. #12
    SitePoint Addict
    How exactly do I set up this wget thing, then?

    Is there a tutorial you can recommend?

