Folks,
Why do I keep getting this error every time I enter a URL in the URL field of the following web proxy:
“The specified URL could not be returned due to a status code of 400.”
<?php
error_reporting(0);
session_start();
//Settings Instructions: https://darkpolitics.wordpress.com/2009/12/29/create-your-own-web-proxy-server/
// turn debug messages on when debugging your proxy
//$DEBUG = true;
$DEBUG = false;
// set this to the location of the webproxy page if you know where it's going to be, otherwise this function will work it out.
// for performance you should hardcode this to your webproxy location
//$PROXYURL = "http://www.mysite.com/myproxy.php";
$PROXYURL = get_current_location(); // works out current scripts location
// urls from orig search will be $_POST but then future links we proxify will be $_GET
$url = isset($_REQUEST["url"]) ? $_REQUEST["url"] : "";
$useragent = isset($_POST["useragent"]) ? $_POST["useragent"] : ""; // will only be a POST from search form
ShowDebug("useragent posted from search form = $useragent");
// set the user-agent we will surf with. We only set on initial search and then use a session to pass this var to any
// other content passed through the proxy. Make sure you have session cookies enabled for your proxy page!
if(!empty($useragent)){
if($useragent=="us"){
$surf_useragent = $_SERVER["HTTP_USER_AGENT"]; // use current agent
}else if($useragent=="ie"){
$surf_useragent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"; // use IE 7
}else{ // must be ff as we only have 2 choices!! Add as required
$surf_useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 (.NET CLR 3.5.30729)"; // use FF3
}
// set a session for future calls through the proxy
$_SESSION["surf_useragent"] = $surf_useragent;
}else{
$surf_useragent = isset($_SESSION["surf_useragent"]) ? $_SESSION["surf_useragent"] : "";
}
ShowDebug("surf with agent = $surf_useragent");
$err = false;
$msg = "";
$content = "";
$subpathurl ="";
$pathurl = "";
$siteurl = "";
// this list contains domains that this proxy will allow obviously in your own proxy you can remove this!!
$whitelist = "technicallypolitical.com,strictly-software.com,infowars.com,prisonplanet.com,hashemian.com";
$cansearch = false;
ShowDebug("url = $url");
ShowDebug("useragent = $useragent");
ShowDebug("PROXYURL = $PROXYURL");
if(!empty($url)){
ShowDebug("url = $url");
// make sure its valid with a protocol at the start
if($url == "http://"){
$err = true;
$msg = "Please specify a full URL to access e.g http://www.darkpolitricks.com";
}else if(!preg_match("/https?:\/\//",$url)){
$err = true;
$msg = "Please specify the protocol within the URL e.g http://";
ShowDebug("error = $msg");
}else{
ShowDebug("get content from remote url $url");
if(!empty($whitelist)){
// check whether url is allowed
$allowed = explode(",",$whitelist);
$count = count($allowed);
$lowurl = strtolower($url);
ShowDebug("check whether $lowurl is in whitelist of $whitelist");
foreach($allowed as $val){
ShowDebug("check whether ".$val." is in $url");
if( strripos($lowurl, $val) !== false){
ShowDebug("This url $url is on whitelist matching $val");
$cansearch = true;
break;
}
}
}else{
$cansearch = true;
}
if(!$cansearch){
$err = true;
$msg = "The url is not allowed to be accessed from this web proxy server.";
}else{
// crawl item e.g URL, script, CSS, image
$html = mycrawler_single($url,$surf_useragent);
$content = $html["html"];
$status = $html["status"];
$headers = $html["header"];
$content_type = $html["content_type"];
$connect_error = $html["message"];
ShowDebug("connect error = $connect_error");
ShowDebug("status = $status");
// a status code of 200 means we got a successful response back; if we didn't then we have an issue
if($status!="200"){
// 404 = Page not found
if($status=="404"){
$err = true;
$msg = "The specified URL could not be located.";
}else if(!empty($connect_error)){
$err = true;
$msg = $connect_error;
ShowDebug("CONNECT ERROR = $connect_error; msg = $msg");
}else{
$err = true;
$msg = "The specified URL could not be returned due to a status code of $status.";
}
}else{
// need to replace all links in our returned content with links to the proxy so that future clicks are proxified
$urlinfo = parse_url($url);
// get root url to extend any relative links e.g http://www.mysite.com
$siteurl = $urlinfo["scheme"]."://".$urlinfo["host"];
if(!empty($urlinfo["path"])){
$pathurl = $siteurl.$urlinfo["path"];
// make sure file is removed in case we need current sub directory
$pospath = strripos($pathurl, "/");
if($pospath!==false){
ShowDebug( "take up to / as pos $pospath in $pathurl<br />");
$subpathurl = substr($pathurl,0,$pospath);
}else{
$subpathurl = $pathurl."/";
}
}else{
$pathurl = $siteurl;
$subpathurl = $pathurl."/";
}
ShowDebug("SiteURL = $siteurl path = $pathurl");
// for text related content we scan for links so that we can change them all to go through our proxy
// for images and other non textual content we have no need to change the links
if(preg_match("/(text|html|xml|xhtml|css|javascript)/i", $content_type )){
//if(preg_match("/(text|html|xml|xhtml)/", $content_type )){
ShowDebug("parse links");
// make sure all links are rerouted through proxy
$content = reformat_links($content,$siteurl,$subpathurl);
}
// As all links/src values from the page we visit also pass through the proxy we need to ensure we output the
// correct header for each file. For example a PNG image needs to have the correct header e.g image/png
ShowDebug("output content-type: $content_type");
header( $content_type );
ShowDebug("output content = $content");
// output content to screen
echo $content;
}
}
}
}else{
// default url to http://
$url = "http://";
}
// Will return the current location of the script running. If the proxy page is moved around a lot then this
// will work out where it is but for performance set the value at the top in $PROXYURL
function get_current_location(){
$url = "";
if( $_SERVER["SERVER_PORT"]== 443){
$protocol = "https://";
}else{
$protocol = "http://";
}
$url = $protocol . $_SERVER["SERVER_NAME"] . $_SERVER["SCRIPT_NAME"];
return $url;
}
// retrieve link destinations and modify them so that when they are clicked the content is passed through the proxy
// as well. I look for src/href tags. Currently this does not handle URLs defined like so href="../"
function reformat_links($content,$siteurl,$subpathurl){
// need to make all URLs go through our proxy! Use ISAPI rewriting to make it nicer; this is just a guide
global $PROXYURL;
$relurl = $PROXYURL . "?url=" .$siteurl; // for urls like url="/sub/page.htm"
$cururl = $PROXYURL . "?url=" .$subpathurl; // for urls like url="page.htm"
$absurl = $PROXYURL . "?url="; // for urls like url="http://www.mysite.com/page.htm"
ShowDebug("reformat rel urls = $relurl");
ShowDebug("reformat cur urls = $cururl");
ShowDebug("reformat abs urls = $absurl");
$newcontent = $content;
// get all links and reformat
// as we don't want to process the same links multiple times (which can happen) I use placeholders first and then
// once every possible location has been marked I insert the link to the proxy
// look for absolute urls e.g url="http://www.mysite.com/blah.asp"
$newcontent = preg_replace("/((?:href|src)=['\"])(http.*?)(['\"])/i","$1##ABSURL##$2$3",$newcontent);
// get links starting with / e.g url="/sub/page.htm"
$newcontent = preg_replace("/((?:href|src)=['\"])(\/.*?)(['\"])/i","$1##RELURL##$2$3",$newcontent);
// get links starting like url="page.htm"
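// (the character classes below are a crude filter: they skip values that already begin with a placeholder or
// in-page anchor (#), a root-relative path (/) or anything that looks like "http...", so only
// document-relative links like "page.htm" get tagged)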
$newcontent = preg_replace("/((?:href|src)=['\"])([^#h\/][^#t][^t][^p].*?)(['\"])/i","$1##CURURL##$2$3",$newcontent);
// now replace placeholders
$newcontent = str_replace("##RELURL##",$relurl,$newcontent);
$newcontent = str_replace("##CURURL##",$cururl,$newcontent);
$newcontent = str_replace("##ABSURL##",$absurl,$newcontent);
ShowDebug("return content");
return $newcontent;
}
// code to load remote content such as HTML files, CSS, Images etc
// To follow more than 3 redirects (e.g ISAPI rewrites then change $maxredirs=XX)
function mycrawler_single($url, $useragent="",$timeout=10, $maxredirs=3)
{
ShowDebug( "IN mycrawler_single Get URL content from $url $useragent maxredirs = $maxredirs");
$urlinfo = parse_url($url);
if (empty($urlinfo["scheme"])) {$urlinfo = parse_url("http://".$url);}
if (empty($urlinfo["path"])) {$urlinfo["path"]="/";}
if (empty($urlinfo["port"]))
{
switch($urlinfo["scheme"])
{
case "http":
$urlinfo["port"] = 80;
break;
case "https":
$urlinfo["port"] = 443;
break;
}
}
// if no agent is supplied use default agent
if (empty($useragent)) $useragent = $_SERVER["HTTP_USER_AGENT"];
ShowDebug("useragent to use = $useragent");
if (isset($urlinfo["query"]))
{
$request = "GET ".$urlinfo["path"]."?".$urlinfo["query"]." ";
} else {
$request = "GET ".$urlinfo["path"]." ";
}
// form request
$request .= "HTTP/1.0\r\n";
$request .= "Host: ".$urlinfo["host"]."\r\n";
$request .= "User-Agent: ".$useragent."\r\n";
$request .= "Connection: close\r\n\r\n";
ShowDebug( "request = ".$request);
ShowDebug( "open ".$urlinfo["host"].":".$urlinfo["port"]);
$fp = @fsockopen($urlinfo["host"], $urlinfo["port"], $errno, $errstr, $timeout);
if (!$fp)
{
ShowDebug( "ERROR! (".$errno.")".$errstr);
$urlinfo["header"] = "";
$urlinfo["html"] = "Error: $errno $errstr";
$urlinfo["status"] = 400; // bad request
$urlinfo["content_type"] = "";
$urlinfo["message"] = "The request could not be made. $errno $errstr";
return $urlinfo;
}
else
{
ShowDebug($request);
fwrite($fp, $request);
while (!feof($fp))
{
if(isset($data)){
$data .= fgets($fp, 4096);
}else{
$data = fgets($fp, 4096);
ShowDebug( "take status code from 9,4 in data = ".$data);
// status code should be here! if not it's a bad request
$code = trim(substr($data,9,4));
ShowDebug( "Status Code = ".$code);
}
}
ShowDebug( "Status Code = ".$code);
// if no status code default to 400 = Bad Request
if(empty($code) || !is_numeric($code)){
$code = 400;
ShowDebug("default to bad request 400");
}
ShowDebug("status code = $code - response = $data");
fclose($fp);
$tmp = explode("\r\n\r\n", $data, 2);
// We will return an array with these parts header, html, status code and content-type
$urlinfo["header"] = $tmp[0];
$urlinfo["html"] = $tmp[1];
$urlinfo["status"] = $code;
$urlinfo["content_type"] = get_content_type($tmp[0]);
$urlinfo["message"] = "";
ShowDebug( "The Status Code = ".$urlinfo["status"]." from header: ".$urlinfo["header"]);
// handle redirects
ShowDebug( "do we need to redirect? pos of location in header = ". stripos($urlinfo["header"], "location:"). " maxredirs = $maxredirs");
if ((stripos($urlinfo["header"], "location:")) && ($maxredirs > 0))
{
ShowDebug( "found location in header and we CAN REDIRECT");
preg_match("/\r\nlocation:(.*)/i", $urlinfo["header"], $match);
if ($match)
{
$redirect = trim($match[1]);
ShowDebug( "Redirecting to ".$redirect);
ShowDebug( "$maxredirs is currently $maxredirs");
$maxredirs--;
ShowDebug( "$maxredirs after count down is now $maxredirs");
ShowDebug( "DO A REDIRECT TO $redirect");
return mycrawler_single($redirect, $useragent, $timeout, $maxredirs);
}
}
ShowDebug( "RETURN FROM mycrawler_single");
// return array of header/html
return $urlinfo;
}
}
// will check headers for the content-type. We need this so that images are displayed correctly
function get_content_type($headers){
$content_type = "";
if(!empty($headers)){
$headerarray = explode("\r\n", $headers);
foreach($headerarray as $head){
ShowDebug( "header item = ".$head);
if(preg_match("/Content-Type: .+$/i",$head)){
$content_type = $head;
break;
}
}
}
ShowDebug("return $content_type");
return $content_type;
}
// Debug function. If you want to show debug messages e.g when testing your proxy then set $DEBUG = true at the top of the page.
// For performance all ShowDebug statements should be removed in production to reduce unnecessary function calls
function ShowDebug($msg){
global $DEBUG;
if(!$DEBUG) return;
if(!empty($msg)){
echo htmlentities($msg)."<br />";
}
}
if(empty($url) || $url=="http://" || $err){
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
<head>
<title>Dark Politricks Web Proxy Example</title>
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>
<meta name="keywords" content="DarkPolitricks, WebProxy, Proxy, Proxies, Proxi, Proxied, Forwarded-For" />
<meta name="description" content="An example of a web proxy, how you can make your own web proxy to bypass basic filtering" />
<!-- Put all these in an external stylesheet -->
<style>
body{background:lightblue;}
p{font-weight:bold;}
.error{color:red;}
.msg{color:green;}
#main{margin:auto;width:600px;}
#search{margin:auto;width:600px;}
label{font-weight:bold;font-family:Tahoma,Arial;}
#url{width:300px;}
#searchflds{border:1px solid black;}
dt{float:left;}
dd{float:left;}
#domainlist{font-style:italic;color:navy;}
#searchbutton{text-align:right;}
#agent{clear:both;}
.agent{margin-top:10px;}
#ie{margin-left:-12px;}
</style>
</head>
<body>
<div id="main">
<h1>Example of a WebProxy</h1>
<?php
if(!empty($msg)){
if($err){
echo "<p class='error'>$msg</p>";
}else{
echo "<p class='msg'>$msg</p>";
}
}
?>
<p>This is an example page and can only be used to access the following domains:</p>
<p id="domainlist">technicallypolitical.com, strictly-software.com, infowars.com, prisonplanet.com</p>
<p>Please read the related article at <a href="http://www.darkpolitricks.com/2009/12/create-your-own-web-proxy-server" title="Create your own web proxy">www.darkpolitricks.com</a> to get more information as well as a link to download the code so that you can create your own web proxy.</p>
<div id="search">
<form id="searchanon" name="searchanon" method="POST">
<fieldset id="searchflds">
<dl>
<dt><label for="where">Where To</label></dt>
<dd><input type="text" id="url" name="url" value="<?php echo $url ?>" maxlength="100" />
</dl>
<dl id="agent">
<dt class="agent"><label for="useragent">User-Agent</label></dt>
<dd class="agent"><input type="radio" name="useragent" id="ie" value="ie" <?php if($useragent=="ie"){ echo 'checked="true"'; } ?> /><label for="ie" title="Use IE 7 user-agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)">IE 7</label>
<input type="radio" name="useragent" id="ff" value="ff" <?php if($useragent=="ff"){ echo 'checked="true"'; } ?> /><label for="ff" title="Use FireFox 3 user-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 (.NET CLR 3.5.30729)">FireFox 3</label>
<input type="radio" name="useragent" id="us" value="us" <?php if($useragent=="us"){ echo 'checked="true"'; } ?> /><label for="ff" title="Keep existing agent: <?php echo $_SERVER["HTTP_USER_AGENT"] ?>">Keep Existing User-Agent</label>
</dd>
</dl>
</fieldset>
<p id="searchbutton"><input type="submit" value="Go There" id="submitsearch" name="submitsearch" />
</form>
</div>
</div>
</body>
</html>
<?php
}
?>
Also, how do I remove the restriction so that any website can be viewed, not just the domains in:
$whitelist = "technicallypolitical.com,strictly-software.com,infowars.com,prisonplanet.com,hashemian.com";
I removed the above-mentioned URLs from $whitelist and it worked: I was able to view Google, but then the 400 error started appearing.
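For reference, the change I made was roughly the following (assuming an empty string is the intended way to disable the check, since the script only runs the whitelist loop inside if(!empty($whitelist)) and otherwise sets $cansearch = true):
// leave the whitelist empty so the !empty($whitelist) branch is skipped and every URL is allowed
$whitelist = "";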
How would you change the code, and where?