Bizarre problem accessing external URLs from within PHP

Hi all, I am having a very strange PHP issue and am hoping someone will have some ideas.

Within PHP, functions that support URLs (fopen, file_get_contents, etc.) sometimes contact the correct server and work fine, and other times mistakenly contact localhost. It looks like a DNS issue, but both PHP’s own DNS lookups and outside programs like ping and wget work fine.

Does anyone have any ideas on why PHP would sometimes return files from an incorrect server?

Illustrative code & output follows:

<?php

// Enable extra error logging
error_reporting(-1);
ini_set('display_errors', 1);

// Works fine -- www.google.com
echo "www.google.com = ";
$a = fopen("http://www.google.com", "r");
echo fread($a, 100);   // OK, outputs 100 characters of google's homepage
fclose($a);
echo "\n";

// Works fine -- localhost
echo "localhost = ";
$b = fopen("http://localhost", "r");
echo fread($b, 100); // OK, outputs 100 characters of localhost's homepage
fclose($b);
echo "\n";

// Doesn't work -- google.com returns localhost's content
echo "google.com = ";
$c = fopen("http://google.com", "r");
echo fread($c, 100);   // Not OK, outputs 100 characters of localhost's homepage instead of google's
fclose($c);
echo "\n";

// Now, let's check DNS...
echo "www.google.com => " . gethostbyname("www.google.com") . "\n";  // OK
echo "localhost => " . gethostbyname("localhost") . "\n";            // OK
echo "google.com => " . gethostbyname("google.com") . "\n";          // OK... so why does fopen return differing content?

// And we can confirm with wget that google.com works fine
$wget = shell_exec("wget google.com -O -");
echo "wget = " . $wget; // returns google's homepage code

?>

Outputs (done on the command line in this case, but same output via apache):

$ php5 test.php
www.google.com = <!doctype html><html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><
localhost = <a href="simworld">SimWorld</a> / <a href="eatv">EATV</a> / <a href="archivegames">Archive Games</a> [FYI, this is the first 100 characters of the page on localhost]
google.com = <a href="simworld">SimWorld</a> / <a href="eatv">EATV</a> / <a href="archivegames">Archive Games</a>
www.google.com => 74.125.67.106
localhost => 127.0.0.1
google.com => 74.125.67.147
[wget output of contacting google and getting the correct page back is here]

I wonder if it’s because http://google.com sends a location header redirecting you to http://www.google.com and no body, yet you’re reading 100 bytes of a buffer that is less than 100 bytes, and getting some data left there by the last connection? It’s a stretch, but fopen() not following location headers does happen with certain configurations, so you can’t really read the body of that URL.

To rule it out, make the 3rd test www.google.com again instead of google.com.
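One way to check the redirect theory might be to look at the headers PHP actually received. The following is only a sketch: `summarize_responses` is a made-up helper name, and it assumes the HTTP wrapper has filled `$http_response_header` with the raw header lines of every response in the chain after the fopen() call.

```php
<?php
// Hypothetical helper: summarise raw HTTP header lines, one entry per
// response in the redirect chain (status code plus any Location header).
function summarize_responses(array $headerLines)
{
    $responses = array();
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A status line starts a new response.
            $responses[] = array('status' => (int) $m[1], 'location' => null);
        } elseif ($responses && preg_match('/^Location:\s*(.+)$/i', $line, $m)) {
            // Attach the Location header to the most recent response.
            $responses[count($responses) - 1]['location'] = trim($m[1]);
        }
    }
    return $responses;
}
```

Calling `print_r(summarize_responses($http_response_header));` right after the `fopen("http://google.com", "r")` line in the test script should show whether a 301 was seen and where its Location header pointed.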

Odd. What version of php5?

Anything interesting from stream_get_meta_data()?
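In case it helps, here is a rough sketch of pulling the headers out of the meta data. `wrapper_headers` is a hypothetical helper name, and it assumes the standard HTTP wrapper, which stores the response header lines under the `wrapper_data` key; other wrappers may use a different layout or omit the key entirely.

```php
<?php
// Hypothetical helper: return the header lines the stream wrapper recorded,
// or an empty array when the stream carries none (e.g. a non-HTTP stream).
function wrapper_headers($stream)
{
    $meta = stream_get_meta_data($stream);
    if (!isset($meta['wrapper_data']) || !is_array($meta['wrapper_data'])) {
        return array();
    }
    return $meta['wrapper_data'];
}
```

With the test script, `print_r(wrapper_headers($c));` placed before `fclose($c);` would dump whatever headers came back for the google.com request.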

Thanks for the responses! No luck yet though, I’m afraid.

crmalibu: PHP 5.2.6-1+lenny6 with Suhosin-Patch 0.9.6.2 (cli) (built: Feb 9 2010 08:59:10)

[Edit: Same problem with PHP 5.2.6-1+lenny8 with Suhosin-Patch 0.9.6.2 (cli) (built: Mar 14 2010 09:07:33)]

stream_get_meta_data doesn’t seem to provide much in the way of clues. For google.com, it returns a location parameter of the local server’s address but a URI of http://google.com.

Dan Grossman: When changing the third test to www.google.com, the test works successfully and gets text from Google’s homepage.

I think Dan’s on the money regarding headers. An HTTP request for google.com returns

HTTP/1.x 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Thu, 25 Mar 2010 03:01:32 GMT
Expires: Sat, 24 Apr 2010 03:01:32 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 0

but for www.google.com

HTTP/1.x 200 OK
Date: Thu, 25 Mar 2010 03:01:33 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=.....:TM=.....:LM=.....:S=.....; expires=Sat, 24-Mar-2012 03:01:33 GMT; path=/; domain=.google.com
Set-Cookie: NID=33=.....; expires=Fri, 24-Sep-2010 03:01:33 GMT; path=/; domain=.google.com; HttpOnly
Content-Encoding: gzip
Server: gws
Content-Length: 4629
X-XSS-Protection: 0

I’d use a packet sniffer next to see what’s going out and coming in.
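If a sniffer isn't handy, a lighter-weight check (not a substitute for one) could be to bypass the HTTP wrapper and talk to the server over a plain socket. This is a sketch under assumptions: both function names are made up, port 80 and the 5 second timeout are arbitrary defaults, and HTTP/1.0 is used only to keep the response simple.

```php
<?php
// Build a minimal HTTP/1.0 GET request for the given host and path.
function build_get_request($host, $path = '/')
{
    return "GET {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}

// Send the request over a plain TCP socket and return the raw status line
// and headers. Returns false when the connection cannot be opened.
function raw_response_head($host, $port = 80, $timeout = 5)
{
    $sock = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($sock === false) {
        return false;
    }
    fwrite($sock, build_get_request($host));
    $head = '';
    while (!feof($sock)) {
        $line = fgets($sock, 4096);
        // A blank line marks the end of the headers.
        if ($line === false || rtrim($line, "\r\n") === '') {
            break;
        }
        $head .= $line;
    }
    fclose($sock);
    return $head;
}
```

Comparing `echo raw_response_head("google.com");` against the headers the wrapper reports could show whether the Location value is already wrong on the wire or only after PHP has processed it.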