I am rather confused by the response that CURL returns when I try to access a website.
What I am after is the response code, which is all well and good. The only thing is that CURL seems to return more than one response code, and I am unclear as to how they tie together.
Take, for example, the following response:
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Sat, 07 Jan 2012 23:49:39 GMT
Expires: Mon, 06 Feb 2012 23:49:39 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
HTTP/1.1 200 OK
Date: Sat, 07 Jan 2012 23:49:39 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=3bdab1cd4225c488:FF=0:TM=1325980179:LM=1325980179:S=9V1GOM2Gf8DlN_-k; expires=Mon, 06-Jan-2014 23:49:39 GMT; path=/; domain=.google.com
Set-Cookie: NID=54=dZFexKNdSVB943cwresQuwA4wJVZiuar4BLjbEJ-EuUZblmkOaNDMiUBvACmxSzOMF_ZedjapSR_zkP4oPku7kBUhLx6l6rxnDr_CYtawAOPlFLWy7xLE0oIAKOP0DTM; expires=Sun, 08-Jul-2012 23:49:39 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked
Array
(
[0] => HTTP/1.1 301
[1] => 301
)
Here is the PHP code using CURL that produced it.
<?php
// code mostly from: http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/
$url = "http://google.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
/* set the user agent - might help, doesn't hurt */
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
/* try to follow redirects */
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
/* timeout after the specified number of seconds. assuming that this script runs
on a server, 20 seconds should be plenty of time to verify a valid URL. */
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
/* don't download the page, just the header (much faster in this case) */
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_HEADER, true);
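/* handle HTTPS links; strpos() returns 0 for a match at the very start of the
string, which is falsy, so the position has to be compared with === */
if(strpos($url, 'https') === 0) {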
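/* 2 checks the host name in the certificate; newer libcurl versions no longer support the value 1 */
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);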
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
}
$response = curl_exec($ch);
//curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
print_r($response);
/* get the status code from HTTP headers */
if(preg_match('/HTTP\\/1\\.\\d+\\s+(\\d+)/', $response, $matches)) {
print_r($matches);
}
?>
My question is this…
There is one response which says “HTTP/1.1 301 Moved Permanently” and another which says “HTTP/1.1 200 OK”. Are they both correct? How so?
I mean, the URL I am requesting is “http://google.com”. If I understand the response correctly, it is saying that this URL is first redirected to “http://www.google.com” and that the request for that page then returns 200 OK. Is that how this works?
Does the output always contain a chain of responses like this, so that one can follow each hop of the redirect from top to bottom?
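To make the chain I am describing concrete, here is a minimal sketch of how every status code could be pulled out of output like the above, assuming each header block corresponds to one response in the redirect chain. The helper name status_chain and the shortened sample string are made up for illustration; they are not part of my script.

```php
<?php
// Hypothetical helper (not in the original script): returns every status code
// found in a header dump, in the order the responses were received.
function status_chain(string $headers): array
{
    // The "m" flag makes ^ match at the start of each line, so only real
    // status lines are picked up, not "HTTP/1.1" inside some header value.
    preg_match_all('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/m', $headers, $matches);
    return array_map('intval', $matches[1]);
}

// Shortened copy of the output shown above: one header block per hop.
$response = "HTTP/1.1 301 Moved Permanently\r\n"
          . "Location: http://www.google.com/\r\n"
          . "\r\n"
          . "HTTP/1.1 200 OK\r\n"
          . "Content-Type: text/html; charset=ISO-8859-1\r\n"
          . "\r\n";

$codes = status_chain($response);
$final = end($codes);   // last block in the dump, i.e. the last response received

print_r($codes);                    // lists 301, then 200
echo "final status: $final\n";      // final status: 200
```

My preg_match call only captures the first status line, which would explain why $matches shows 301. If only the last code matters, my understanding is that calling curl_getinfo($ch, CURLINFO_HTTP_CODE) before curl_close() reports the code of the last response received, which would be 200 here.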
Any input anyone cares to share with me would be appreciated.
Thanks.
Carlos