file_get_contents runs in a loop and returns 500 internal error

I’ve been fighting this issue for almost a day. It is a CodeIgniter app, and the following code is causing the problem. I’m downloading images from Instagram and saving them in a zip file.

if (FALSE !== ($data = file_get_contents($item['name'])))
			$this->zip->add_data($filename, $data);

I’m running the code in a loop that runs more than 100 times. With 50 to 60 iterations file_get_contents works fine. With 100+ iterations file_get_contents still works, but at the end of the request I get:

Site temporarily unavailable.

Connection timed out - please try again.

How long does it take to run 50-60 iterations versus 100+? You are likely causing PHP to time out on the request. You can increase the timeout limit in the php.ini file, but that isn’t a good solution.

For long processes, I usually run them via cronjob or command line to avoid the timeout issue altogether.

We are using a Rackspace cloud server - here is what the tech told me:

"The limit would be the 15 minute break off point for php. If it takes longer than 15 minutes to create the ZIP then it will stop the process.

The load balancer has a 30 second time out for the browser and you’ll see a time out error for this but the php will still run in the background for up to 15 minutes."

I’d look at changing how your process works. You will consistently continue to hit this limit without changes.

  1. Consider moving the zipping portion to a separate process using system() or exec().
    Write a PHP script that accepts parameters telling it how to create the zip file. Your main script would then launch it with those parameters, and it would generate the necessary zip file.

  2. Consider using a cronjob to do this processing. Your script would write a file to a folder telling the cronjob what to process. Each time the job runs, it would grab 10 items to do, move them into a working folder, and then process them accordingly.

  3. If there are more than X items, process the first 15, then tell the script to run again to process the next 15, etc until all items are done.
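
Option 3 could be sketched roughly like this. The function names (next_batch, process_batch) and the way the offset is carried between runs are assumptions for illustration, not part of CodeIgniter or of the original code:

```php
<?php
// Hypothetical helper: return the slice of items that belongs to one batch.
function next_batch(array $items, int $offset, int $batchSize = 15): array
{
    return array_slice($items, $offset, $batchSize);
}

// Hypothetical driver: handle one batch, then report the offset at which
// the next run should start, or null once every item has been processed.
function process_batch(array $items, int $offset, callable $handler, int $batchSize = 15): ?int
{
    foreach (next_batch($items, $offset, $batchSize) as $item) {
        $handler($item); // e.g. download one image and store it on disk
    }
    $next = $offset + $batchSize;
    return $next < count($items) ? $next : null;
}
```

Each request (or cron run) would call process_batch() once, save the returned offset somewhere (a session value, a file, a query parameter), and stop. The zip would only be built once the offset comes back as null.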

15 minutes seems like a long time, and you may not actually be hitting it. You may instead be hitting a memory limit or something similar, which can produce a 500 Internal Server Error. So I’d look into those factors as well.
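
One way to check the memory theory is to log how much headroom is left while the loop runs. This is only a sketch: ini_bytes() and memory_headroom() are made-up helper names, and it assumes memory_limit uses the usual K/M/G shorthand.

```php
<?php
// Convert a php.ini shorthand value such as "128M" into bytes.
function ini_bytes(string $value): int
{
    $value = trim($value);
    if ($value === '') {
        return 0;
    }
    $number = (int) $value; // the cast keeps the leading digits: "128M" becomes 128
    switch (strtolower(substr($value, -1))) {
        case 'g': return $number * 1024 * 1024 * 1024;
        case 'm': return $number * 1024 * 1024;
        case 'k': return $number * 1024;
        default:  return $number; // plain byte count
    }
}

// Bytes left before memory_limit is reached, or null when there is no limit.
function memory_headroom(): ?int
{
    $limit = ini_get('memory_limit');
    if ($limit === false || (int) $limit === -1) {
        return null;
    }
    return ini_bytes($limit) - memory_get_usage(true);
}
```

Calling error_log('headroom: ' . memory_headroom()) every few iterations shows whether the number drops toward zero as images are added. As far as I know the CodeIgniter Zip library keeps the archive contents in memory until the file is written or downloaded, so a steady drop would be expected here.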

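Using file_get_contents() to fetch external content in a loop is a bad idea: unless you pass a stream context, its only timeout is the global default_socket_timeout setting (60 seconds by default), and every download blocks until the previous one finishes. Instead you should be using cURL, ideally multi cURL (the curl_multi_* functions) in this case. What you’re doing looks like a perfect example of when multi cURL is appropriate: it runs the transfers concurrently, which will be much more effective and efficient than fetching one at a time in a loop. A little more work, but worth the benefit.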
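
If you want to keep file_get_contents() for now, it can at least be given an explicit per-request timeout through a stream context. A minimal sketch; fetch_with_timeout() is a made-up name and the 10 second default is an arbitrary assumption:

```php
<?php
// Fetch a URL with an explicit read timeout; returns the body or false on failure.
function fetch_with_timeout(string $url, float $seconds = 10.0)
{
    // The 'http' context options also apply to https:// URLs.
    $context = stream_context_create([
        'http' => ['timeout' => $seconds],
    ]);
    // @ suppresses the warning on failure; the caller checks for false instead.
    return @file_get_contents($url, false, $context);
}
```

This only caps how long a single download can stall. The downloads still run one after another, so the total time for 100+ images can still exceed the 30 second load balancer limit.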

I see. Could you please suggest some code that does the same job with cURL?

I don’t do a lot of cURL but this (and the user content at the bottom of the page) may be what you need to get you started
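
As a starting point, a multi cURL version could look something like the sketch below. It is untested against Instagram; fetch_all() is a made-up name, and the timeout values and the "only HTTP 200 counts as success" rule are assumptions you may want to change.

```php
<?php
// Download several URLs concurrently.
// Returns an array with the same keys as $urls; failed downloads map to false.
function fetch_all(array $urls, int $timeout = 20): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,     // keep the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_CONNECTTIMEOUT => 5,        // seconds allowed for connecting
            CURLOPT_TIMEOUT        => $timeout, // seconds allowed for the whole transfer
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running && curl_multi_select($multi, 1.0) === -1) {
            usleep(1000); // avoid a busy loop if select() cannot wait
        }
    } while ($running && $status === CURLM_OK);

    // Collect the outcome of every finished transfer.
    $results = array_fill_keys(array_keys($handles), false);
    while (($info = curl_multi_info_read($multi)) !== false) {
        $ch = $info['handle'];
        $key = array_search($ch, $handles, true);
        if ($key !== false && $info['result'] === CURLE_OK
            && curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200) {
            $results[$key] = curl_multi_getcontent($ch);
        }
    }
    foreach ($handles as $ch) {
        curl_multi_remove_handle($multi, $ch);
    }
    return $results; // handles are released when they go out of scope
}
```

With $urls keyed by the file name you want inside the archive, the original loop becomes: foreach (fetch_all($urls) as $filename => $data), followed by the same FALSE !== $data check and $this->zip->add_data($filename, $data). For 100+ images it is probably kinder to both servers to pass the URLs in smaller groups, for example with array_chunk($urls, 10, true).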

Those tips should be hammered into the skulls of all would-be web scrapers, particularly #2 and #3. Unless there is some dynamic element, such as a session value or a variable URL or file name, everything should be grabbed slowly and at leisure, in batches that trail the initial indexing scrape. Try to grab too much at once and your server will overload, timing out or exhausting its memory. Hit a target site too often with requests and it may suspect something is wrong and start misbehaving when your web server’s IP visits. Then you have a real struggle to get data out.