I’m working on an idea that involves using PHP to grab a .csv file and parse it… but the thing is that the .csv in question is on a third-party server. Obviously I would be asking for permission to use the file… it’s nothing dodgy. My question is about the technical side of things on the server: what are the issues involved, given that it’s a third-party environment that I’m not involved with?
I.e. is it only possible if they don’t have a firewall or other restriction in place… or do I need to have cURL installed, or does my host need to allow fsockopen()?
I thought I could use code something like this, for example:
$fileTemp = "http://www.somethirdpartysite.com/folder/thecsvfile.csv";
$fp = fopen($fileTemp, 'r');
$datas = array();
while (($data = fgetcsv($fp)) !== FALSE) {
    $datas[] = array(
        'productName'   => trim($data[0]),
        'spec'          => trim($data[1]),
        'imageLocation' => trim($data[2]),
    );
}
fclose($fp);
Or perhaps I could use PHP’s FTP functions or something.
If anyone could advise I’d be grateful.
How you grab the file is up to you. cURL, fopen and so on.
What I advise is that you do it in two operations. 1) Grab the file and store it. 2) Read the file, parse it and store the desired output ready to be used in your pages.
That way, if the first part goes wrong (latency issues, for example), you can at least fall back on the old data.
That first part can be done say, hourly with a cron job.
What you want to avoid is a situation where you call a script which has to do all of these things one after the other without failure:
*connect to the external server
*grab the file
*display the output
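A minimal sketch of that two-step approach (the URL and cache path are placeholders, and it assumes allow_url_fopen is enabled on your host):

```php
<?php
// Step 1 (run hourly via cron): grab the remote file and store it locally.
// Only overwrite the cached copy if the download actually succeeded, so a
// failed fetch leaves the old data in place.
$remote = 'http://www.somethirdpartysite.com/folder/thecsvfile.csv'; // placeholder URL
$cache  = __DIR__ . '/cache/thecsvfile.csv';

$csv = @file_get_contents($remote);
if ($csv !== false && $csv !== '') {
    file_put_contents($cache . '.tmp', $csv);
    rename($cache . '.tmp', $cache); // atomic swap: readers never see a half-written file
}

// Step 2 (in your pages, or a second script): parse the local copy only,
// so page loads never depend on the third-party server being up.
$rows = array();
if (($fp = fopen($cache, 'r')) !== false) {
    while (($fields = fgetcsv($fp)) !== false) {
        $rows[] = $fields;
    }
    fclose($fp);
}
```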
I do appreciate the reply, but what I’m trying to find out is what issues there might be with the fact that the .csv file is on a third-party server.
Provided you have consent from the owners of the other site, and they have no defences in place to prevent the file being read (such as an IP or referrer block using htaccess or some server-side scripting), there should be no significant problems. The size of the .csv file could in theory cause problems if it takes longer than 30 seconds to move it between the two servers, but that’s rare, and you can override (depending on your hosting) the runtime limit of the retrieval script to compensate.
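For what it’s worth, both the runtime limit and the request timeout can be adjusted in the retrieval script itself, hosting permitting. A sketch (the URL is a placeholder):

```php
<?php
// Allow the retrieval script to run longer than the default 30 seconds.
// Note: some shared hosts disallow this; check with your provider.
set_time_limit(120);

// Give the HTTP request itself a timeout so a slow remote server
// makes the fetch fail quickly instead of hanging.
$context = stream_context_create(array(
    'http' => array('timeout' => 30), // seconds
));

$csv = file_get_contents('http://www.somethirdpartysite.com/folder/thecsvfile.csv', false, $context);
if ($csv === false) {
    // Fetch failed or timed out: keep using the previously stored data.
}
```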
As for parsing, flat-file storage or uploading the processed results to a database: these operations will of course consume part of your web server’s hourly resource allowance (%RAM, %CPU and SQL calls/connections). How much depends on the amount of raw data, what you are doing to process it and how often it happens.
Hosts’ policies vary, but most will suspend any site hitting the limits for those parameters, shutting the site down until the owner can revise the offending script. So always start any web-content retrieval system slowly, watch your cPanel usage figures, then increase the frequency. Remember to give your site some spare capacity, as visitors and Googlebot can be unpredictable, and a surge in either can send you over your usage limits.
Thank you very much. Your answer was a great help.