Internal error on target site using curl but not browser


#1

Hello

(my username used to be johnyboy but it wasn't possible for me to log in with that anymore, so have created a new account…)

I'm using curl to access programme pages on BBC's site (to extract track listings) -- have been doing this for a while and all's been well. Now, I get an internal error on their site when using curl but not when using my browser. I don't know if this is a programming/server set up mistake on their part or if it's a slightly surreptitious way of blocking curl requests.

I'm wondering if anyone with curl installed in their php set up could run this code to see if it gives the same or different result as I get please? I'm trying to find out if it's something to do with my server.

<?php
function http_get($target, $ref) {
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $target);
	curl_setopt($ch, CURLOPT_REFERER, $ref);
	curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
	curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
	curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
	curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
	curl_setopt($ch, CURLOPT_POST, FALSE);
	$return_array['FILE']   = curl_exec($ch);
	$return_array['STATUS'] = curl_getinfo($ch);
	$return_array['ERROR']  = curl_error($ch);
	curl_close($ch);
	return $return_array;
}
$target = 'http://www.bbc.co.uk/programmes/b04kzqlf';
$ref = '';
$results = http_get($target, $ref);
echo '<textarea cols="100" rows="10">';
print_r($results);
echo '</textarea>';
?>

I now get:

Array
(
    [FILE] => <!DOCTYPE html> <html lang="en-GB" > <head> <!-- Barlesque 2.75.0 --> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <meta name="description" content="" /> <meta name="keywords" content="" />   <title> BBC - Programmes - Internal server error</title>        <meta nam …….

in particular the "Internal server error" in the title. So I'm wanting to find out if that's just me (my server) or not.

Thanks.


#2

It looks like the remote site itself is responding with the error.


#3

Are you sure you can do this on this website? Did you read their Terms of use or did you get a special permission to do this?


#4

I'm simply using curl to get the info off the website rather than using my browser to get the info off their website. I'm not then making that info public in any way.


#5

Anyone?: Does that code, on a different server to mine, give the same result or not?

Using curl from my command line on my computer works. So I suspect it's something to do with the shared server (virtual names, ukservers) I'm using. Just trying to confirm that for sure. I don't have access to other servers; all the servers I have access to are on virtual names/ukservers. Just thought anyone who has curl installed could pretty easily check to confirm one way or the other?

Thanks.


#6

Your best bet is to check their API or contact their Support.

Sorry, the server encountered a problem

Please try again later

[STATUS] => Array
(
  [url] => http://www.bbc.co.uk/programmes/b04kzqlf
  [content_type] => text/html; charset=utf-8
  [http_code] => 500
  [header_size] => 364
  [request_size] => 82
  [filetime] => -1
  [ssl_verify_result] => 0
  [redirect_count] => 0
  [total_time] => 4.882
  [namelookup_time] => 0.031
  [connect_time] => 0.124
  [pretransfer_time] => 0.124
  [size_upload] => 0
  [size_download] => 54048
  [speed_download] => 11070
  [speed_upload] => 0
  [download_content_length] => 54048
  [upload_content_length] => 0
  [starttransfer_time] => 4.57
  [redirect_time] => 0
  [certinfo] => Array
      (
      )
  [primary_ip] => 212.58.244.67
  [primary_port] => 80
  .....
  [redirect_url] =>
)
[ERROR] =>

#7

Maybe the URL has changed?

$targets = array
(
'http://localhost/',
'http://www.bbc.co.uk/programmes/',
'http://www.bbc.co.uk/programmes/b04kzqlf/',
'http://google.com/'
);
$target = $targets[3];
$ref = '';
$results = http_get($target, $ref);

echo '<pre class="clb w88 mga tal bg0">';
echo '<br class="clb" />';
print_r($results);
echo '</pre>';

/*
[STATUS] => Array
(
  [url] => http://www.bbc.co.uk/programmes/b04kzqlf
  [content_type] => text/html; charset=utf-8
  [http_code] => 500
  [header_size] => 774
  [request_size] => 165
  [filetime] => -1
  [ssl_verify_result] => 0
  [redirect_count] => 1
  [total_time] => 4.715843
  [namelookup_time] => 3.1E-5
  [connect_time] => 3.1E-5
  [pretransfer_time] => 7.1E-5
  [size_upload] => 0
  [size_download] => 79
  [speed_download] => 16
  [speed_upload] => 0
  [download_content_length] => 79
  [upload_content_length] => 0
  [starttransfer_time] => 3.504141
  [redirect_time] => 1.211658
  [redirect_url] => 
  [primary_ip] => 212.58.246.94
  [certinfo] => Array
      (
      )
  [primary_port] => 80
  [local_ip] => 192.168.1.33
  [local_port] => 58170
)

*/

/*
[STATUS] => Array
(
[url] => http://www.bbc.co.uk/programmes
[content_type] => text/html; charset=utf-8
[http_code] => 200
[header_size] => 967
[request_size] => 147
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 1
[total_time] => 9.035005
[namelookup_time] => 3.0E-5
[connect_time] => 3.0E-5
[pretransfer_time] => 7.1E-5
[size_upload] => 0
[size_download] => 65154
[speed_download] => 7211
[speed_upload] => 0
[download_content_length] => 65154
[upload_content_length] => 0
[starttransfer_time] => 0.240416
[redirect_time] => 6.878375
[redirect_url] =>
[primary_ip] => 212.58.246.90
[certinfo] => Array
(
)
[primary_port] => 80
[local_ip] => 192.168.1.33
[local_port] => 50194
)
*/


#8

"Your best bet is to check their API or contact their Support."

I have already contacted their support and was told:

"I understand you’re reporting fault whilst accessing BBC Radio content via cURL.

We greatly appreciate your time and thank you for reporting the fault. Please be informed that the recommended option to access BBC Radio content is via browser or dedicated apps, so it is out of our remit to investigate the issue with cURL."

Brilliant.

Anyway, thanks for checking that Mittineague, much appreciated. I assume you're not using VirtualNames / UKServers servers? No, I'm sure not. So it appears some of BBC's site has a problem with PHP curl accesses for some reason.

(Funnily, and almost certainly totally irrelevantly, I just happened to visit my area's weather page on the BBC site and got a 500 internal server error - in my browser that is. Reloaded a few seconds later and no problem.)

"Maybe the URL has changed?"

No, I mean the url in question (among a whole load of others) is http://www.bbc.co.uk/programmes/b04kzqlf. In a browser that's fine, via PHP curl it ain't.

John_Betong, it's not clear to me if http://www.bbc.co.uk/programmes/b04kzqlf resulted in a 500 internal server error for you or not?

I can't work out, in my mind, whether it's just a cock up on BBC's part or if it's intentional, as I suggested in my first message: surreptitiously blocking curl requests. I suspect, and it's a pure guess, it's a cock up. Especially given the fact Mittineague got an error from a different server. If it'd been restricted to my server I think I'd have plumped for surreptitious blocking. Anyway, all conjecture.

Tis funny how using curl on my command line on my computer has no problems.

I think I'm going to see if I can see what headers are being sent out by PHP's curl and see if tweaking those in some way can get a different result.
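One way to see the outgoing headers is sketched below. It is only a sketch: http_get_debug() is a hypothetical variant of the http_get() function above, and the REQUEST_HDR key is a name made up for this example. It relies on the CURLINFO_HEADER_OUT option, which asks curl to keep a copy of the request it sent so that curl_getinfo() can return it afterwards.

```php
<?php
// Sketch: capture the request headers that PHP's curl actually sends.
// http_get_debug() and the REQUEST_HDR key are hypothetical names for this example.
function http_get_debug($target, $ref) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $target);
    curl_setopt($ch, CURLOPT_REFERER, $ref);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    // Ask curl to remember the outgoing request headers.
    // CURLOPT_VERBOSE is deliberately left out of this variant.
    curl_setopt($ch, CURLINFO_HEADER_OUT, TRUE);

    $return_array = array();
    $return_array['FILE']        = curl_exec($ch);
    $return_array['STATUS']      = curl_getinfo($ch);
    // Raw request header text as sent to the server (FALSE if unavailable).
    $return_array['REQUEST_HDR'] = curl_getinfo($ch, CURLINFO_HEADER_OUT);
    $return_array['ERROR']       = curl_error($ch);
    curl_close($ch);
    return $return_array;
}
```

Calling http_get_debug($target, $ref) and printing the REQUEST_HDR entry should show the exact request line and headers, which can then be compared with what the command-line curl or a browser sends.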

Thanks.


#9

I have nicked, modified and uploaded your script to the following:

www.johns-jokes.com/downloads/sp-d/johnyboy-curl-test/

The URI may be changed manually instead of selecting one of the preset links.


#10

Hi John, clicking the BBC Problem link on your page (which is the http://www.bbc.co.uk/programmes/b04kzqlf page) gives the header info with an http code 500, which is internal server error, but then beneath where you output the results it isn't a server error, it's the actual page. So I'm not sure what's going on there. The headers and the page output aren't tallying it seems. Weird.


#11

The response was taken from the curl script, but the page was taken from file_get_contents(). The response code is only a number; search for http_response_code().

Think of a simple 404.php page that sets a 404 status with http_response_code() yet still returns a normal page body.
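Because the status code and the body are separate things, a script can check the code before trusting the body. The sketch below assumes the array shape returned by http_get() earlier in this thread (FILE, STATUS, ERROR); fetch_failed() is a hypothetical helper name, and the sample values only mirror the status codes posted above.

```php
<?php
// Sketch: decide whether a result from http_get() should be treated as a failure.
// fetch_failed() is a hypothetical helper, not part of the original script.
function fetch_failed(array $results) {
    // Transport-level failure: curl_exec() returned FALSE or curl reported an error.
    if ($results['FILE'] === FALSE || $results['ERROR'] !== '') {
        return TRUE;
    }
    // The server answered, but possibly with an error status.
    // The body may still be a complete HTML page in that case.
    $code = isset($results['STATUS']['http_code'])
        ? (int) $results['STATUS']['http_code']
        : 0;
    return $code < 200 || $code >= 400;
}

// Sample values modelled on the output posted in this thread.
$bad  = array('FILE' => '<!DOCTYPE html> ...', 'STATUS' => array('http_code' => 500), 'ERROR' => '');
$good = array('FILE' => '<!DOCTYPE html> ...', 'STATUS' => array('http_code' => 200), 'ERROR' => '');

var_dump(fetch_failed($bad));  // bool(true)
var_dump(fetch_failed($good)); // bool(false)
```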

I think you would be better scraping the page using file_get_contents() then parsing the page to obtain your data.

Tapped laboriously from a tablet.


#12

I'm seeing the same issue when using your example code. I believe that it is due to not specifying a valid referer in your request. Specifying a URL worked on my system.

$target = 'http://www.bbc.co.uk/programmes/b04kzqlf';
$ref = 'http://www.google.com'; // Any valid URL should work.
$results = http_get($target, $ref);

If the referer is not needed you can drop it completely.

function http_get($target) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $target);
    curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
    curl_setopt($ch, CURLOPT_POST, FALSE);
    $return_array['FILE']   = curl_exec($ch); 
    $return_array['STATUS'] = curl_getinfo($ch);
    $return_array['ERROR']  = curl_error($ch);
    curl_close($ch);
    return $return_array;
}
$target = 'http://www.bbc.co.uk/programmes/b04kzqlf';
$results = http_get($target);
echo '<textarea cols="100" rows="10">';
print_r($results);
echo '</textarea>';
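
The two variants above can also be combined, so that a referer is only passed to curl when one is actually supplied. This is a minimal sketch under that assumption; build_curl_options() is a hypothetical helper and not part of the original script.

```php
<?php
// Sketch: build the curl options, adding a referer only when one is given.
// build_curl_options() is a hypothetical helper, not part of the original script.
function build_curl_options($target, $ref = '') {
    $options = array(
        CURLOPT_URL            => $target,
        CURLOPT_FOLLOWLOCATION => TRUE,
        CURLOPT_MAXREDIRS      => 4,
        CURLOPT_RETURNTRANSFER => TRUE,
    );
    // Skip the option entirely for an empty string instead of passing '' to curl.
    if ($ref !== '') {
        $options[CURLOPT_REFERER] = $ref;
    }
    return $options;
}

$without = build_curl_options('http://www.bbc.co.uk/programmes/b04kzqlf');
$with    = build_curl_options('http://www.bbc.co.uk/programmes/b04kzqlf', 'http://www.google.com');

var_dump(isset($without[CURLOPT_REFERER])); // bool(false)
var_dump(isset($with[CURLOPT_REFERER]));    // bool(true)
```

The resulting array can be applied in one call with curl_setopt_array($ch, build_curl_options($target, $ref)) before curl_exec($ch).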

#13

Welcome to the forum and many thanks for the update.

I have uploaded the changes and am delighted to say that it does work :)

Source code of the include file is now shown on the home page.

http://www.johns-jokes.com/downloads/sp-d/johnyboy-curl-test/


#14

I completely forgot about this thread, sorry. Silly me because I'd have found out the answer to my question much sooner. Specifying a valid referer works. Thanks, belatedly, very much davidtsadler :)

