file_get_contents() will not read a URL

I’m using file_get_contents() on this URL: ‘www.foxnews.com/us’

But I get the ‘failed to open stream’ warning.

I’ve gotten that warning before when I accidentally tried to read a bad URL, but ‘www.foxnews.com/us’ works in a browser.

Well, actually the browser appends ‘/index.html’ to it, but ‘www.foxnews.com/us/index.html’ also fails to open stream.
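For reference, this is roughly what I’m running (a minimal sketch; the error-reporting lines are just there so the warning shows up):

<?php
// Make sure the warning is visible while testing
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Both of these give me "failed to open stream"
$html = file_get_contents('www.foxnews.com/us');
var_dump($html); // bool(false) on failure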

Why isn’t this valid URL working?

Most likely due to the allow_url_fopen setting: http://php.net/manual/en/filesystem.configuration.php
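If that turns out to be the problem, you can check the setting at runtime with ini_get(). Here is a minimal sketch; the explicit http:// scheme and the error handling are my own assumptions for illustration (without a scheme, the string is treated as a local file path rather than a URL):

<?php
// Bail out early if HTTP URLs are not allowed for the file functions
if (!ini_get('allow_url_fopen')) {
    die("allow_url_fopen is disabled in php.ini\n");
}

// Note the explicit scheme on the URL
$html = file_get_contents('http://www.foxnews.com/us/index.html');
if ($html === false) {
    die("Request failed\n");
}
echo strlen($html) . " bytes fetched\n";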

My instinct tells me you should not be attempting to scrape Fox News unless you have permission to do so. Setting that aside, you should be using cURL rather than file_get_contents(), since file_get_contents() does not give you any control over timeouts. As a general rule, only use file_get_contents() for files available locally on the file system; otherwise, use cURL.
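For instance, here is a minimal cURL sketch with explicit timeouts; the URL, the timeout values, and the redirect handling are placeholders, not anything specific to your setup:

<?php
$ch = curl_init('http://www.example.com/');

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // give up connecting after 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // abort the whole transfer after 10 seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, e.g. /us -> /us/index.html

$html = curl_exec($ch);
if ($html === false) {
    echo 'cURL error: ' . curl_error($ch) . "\n";
}
curl_close($ch);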

Thanks, all. I’ll probably go with cURL. allow_url_fopen is already enabled.

Fox News was just an example. But what’s wrong with scraping any URL? Don’t search engines do that?

In theory you would be stealing content. Whether Fox’s legal department would come after you is another story, but it is still stealing.

Most websites clearly state that you MAY NOT use their data without their express permission.

While there is some leniency given to search engines, most likely because they only capture a short blurb of the text with an explicit link to the source (Fox News’ own website), the general rule is: don’t do it.

For example, here is the relevant quote from Fox News’ Terms of Use, which took me all of 10 seconds to find:

“Except as provided in this Agreement or as explicitly allowed on the FOX News Services, you may not copy, download, stream capture, reproduce, duplicate, archive, upload, modify, translate, publish, broadcast, transmit, retransmit, distribute, perform, display, sell or otherwise use any Content appearing on or through the FOX News Services.”