cURL Experiments

Php Colleagues,

I will tell you what I am upto. I am trying to have a web proxy track link clicks. That is the project.
I grabbed MiniProxy and added my own code so that every proxified link clicks get tracked by my tracker. It worked but it started logging not only the proxified page that loaded on the user’s screen but every link present on the page. That was a problem. For example, if the user is on ebay’s homepage then I expect the homepage url to be logged onto my mysql db tbl and not all the links (inc img links) present on the homepage. Came across this dead-end and for the time being quit trying to add a tracker on a third party script as trying to figure-out which line of the code does what was becoming a nightmare.
I thought I might aswell build my own basic web proxy and add the tracker on it. That way, if I encounter any problems then I’ll know which part of the code is malfunctioning.

Now look how far I got to …

I started off my coding with cURL. I got cURL to fetch ebay.com and precede a url from my own domain on all links present on the page. In this example: mymydomain.com/proxified_page.php?url_to_proxify=

So, if ebay.com has these 2 links, for example:

http://ebay.com/contactus.html
http://ebay.com/item_image.jpeg

then the cURL fetched ebay homepage (eg. proxified page) would list the links like so:

mymydomain.com/proxified_page.php?url_to_proxify=http://ebay.com/contactus.html
mymydomain.com/proxified_page.php?url_to_proxify=http://ebay.com/item_image.jpeg

That way, all the links present on the proxified page (cURL fetched page) would be links pointing to my domain or page that lists the cURL webpage fetching code. In other words, the links present would be proxified links. Later-on, I can add my tracker links too so all proxified page gets logged and tracked. Like so:

mymydomain.com/tracker/proxified_page.php?url_to_proxify=

In short, the whole script can be one page script. The same page would use cURL to fetch the page (ebay.com) and that same one page script would have the tracker/logger code at the top of the script. Wallah! I added a tracker to the basic proxy.
One experienced programmer (in another forum or maybe this one) said to me that, I cannot add my own tracker on proxified pages. In other words, I can put a link to a 3rd party site from my site and add a tracker so that I can track the click to the 3rd party domain page but once the visitor is on the 3rd party page or 3rd party domain then whatever links he finds and clicks on the 3rd party domain, I won’t be able to track. Technically, impossible. That was his claim. I argued, I would be able to track my visitor on 3rd party pages (page after page) because all pages would be proxified. And so, technically “Mission Possible”. He betted all his money currently then in his bank account. Offered me even his last few digits of his card, even though I told him I have already managed to add a tracking code onto MiniProxy. Shot himself in the foot. He did. Now, he’s not replying to me nor responding to my threads. He used to regularly. I guess he got spooked that, I’d win the bet and make claim to his money. I wasn’t really gonna do that. Instead, I was gonna ask him to help me out on where I get stick by building me a script with comments all over so I can learn from it and other newbies too as I was gonna release it as gpl. Anyway, we lost a good contributor. I hope he returns. He was a good contributor to my threads. One lesson to be learnt though is that: Never, ever make a bet. Especially, when someone says he has already managed to do what you say can’t be done.
I will disclose that MiniProxy script here one day after I have got rid of the hiccup on my tracker code. It is roaming in some other forum at the moment. Wait till it’s finished with the help of others. Some other programmers at other forums said the same thing as the betting person. That, I cannot track visitors on 3rd party sites but I have proved every experienced programmer wrong, who were on the other side. I, a newbie, amateur. Not bragging.

And so, here’s the code so far of my own basic web proxy with cURL. This is not the code where I added my tracker onto MiniProxy (3rd party web proxy script). This is my own web proxy to be. It is cURL experiment so far:


<?php

$conn = mysqli_connect("localhost", "root", "", "id");

if (!$conn) {
	// message to use in development to see errors
	die("Database error : " . mysqli_error($conn));

	// user friendly message
	// die("Database error.");
	exit();
}

if(isset($_GET["url_to_proxify"]) === TRUE)
   {
	$url_to_proxify = trim(mysqli_real_escape_string($conn,$_GET["url_to_proxify"]));
    echo "Url to proxify = $url_to_proxify";
   }
?>

<html>
   <body>   
      <form action = "<?php $_PHP_SELF ?>" method = "GET">
         Url: <input type = "text" name = "url_to_proxify" />
              <input type = "submit" />
      </form>      
   </body>
</html>

<?php
$url = $url_to_proxify;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "$url");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 5);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($ch);
curl_close($ch);

$phrase  = "$result";
//Below change 'localhost' to "./".
//eg: $pattern = array("./", "https://www.", "http://www.", "https://", "http://", "www.");
$pattern = array("localhost", "https://www.", "http://www.", "https://", "http://", "www.");
$replace   = array("proxified_page.php?url_to_proxify=", "proxified_page.php?url_to_proxify=", "proxified_page.php?url_to_proxify=", "proxified_page.php?url_to_proxify=", "proxified_page.php?url_to_proxify=", "proxified_page.php?url_to_proxify=");

$newphrase = str_replace($pattern, $replace, $phrase);

echo $newphrase;

?>

Note that, the script is no longer tied to ebay.com. I added a text box so you can type any url and the cURL would fetch that page and precede my domain links onto the links residing on the fetched page. Turn all links present onto proxified links. Test it and see. On some links the:

mymydomain.com/proxified_page.php?url_to_proxify=

does not get preceded.

Q1. Can you guess why that is ?

Anyway, use that script and fetch google homepage and then do a search on google. It won’t work. The google search result page (SERP) won’t be proxified. Google would show a dead link. In other words, my coding has failed to precede my domain’s url onto the links present on the proxified google page.
I don’t know how to solve this issue with cURL and so:

  1. Do you know how to solve it ? Any code samples most welcome.

Anyway, this is a tutorial video on php (not cURL) that solves this issue. But, it does not precede my domain’s url onto all links present on the proxified page.

https://www.youtube.com/watch?v=P49w0E64MAA


<?php

//Below code from Basic Php Proxy Video and fix from: https://stackoverflow.com/questions/22255241/preg-replace-no-ending-matching-delimiter-gt/22255455#22255455

$url = "http://www.google.com";
$page = file($url);

foreach($page as $part)
{
	
$part = preg_replace('/src="/', 'src="'.$url,$part);
$part = preg_replace('/action="/', 'action="'.$url,$part);

echo $part;

}

?>

So, as you can see. I got 2 sets of code.
The 1st one is cURL that fetches your chosen page and proxifies it. It proxifies all the links present on the proxified page. But, if you conduct a search on the proxified page (in our example make a search on proxified google homepage) then you see error as it fails to proxify google SERPs.
The 2nd script manages to search google and show google results without showing any error. But, it does not precede my domain url to the links present on the proxified page. That means, I cannot proxify the links present on the proxified page. Nor can I add my tracker onto these links.
To sort this issue out, I tried mixing & matching both scripts. And this is the best I came-up with and it is no good. Any chance you guys can do better ?

FIRST ATTEMPT (with cURL & Preg_Replace):


<?php

$conn = mysqli_connect("localhost", "root", "", "id");

if (!$conn) {
	// message to use in development to see errors
	die("Database error : " . mysqli_error($conn));

	// user friendly message
	// die("Database error.");
	exit();
}

if(isset($_GET["url_to_proxify"]) === TRUE)
   {
	$url_to_proxify = trim(mysqli_real_escape_string($conn,$_GET["url_to_proxify"]));
    echo "Url to proxify = $url_to_proxify";
   }
?>

<html>
   <body>   
      <form action = "<?php $_PHP_SELF ?>" method = "GET">
         Url: <input type = "text" name = "url_to_proxify" />
              <input type = "submit" />
      </form>      
   </body>
</html>

<?php
$url = $url_to_proxify;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "$url");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 5);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($ch);
curl_close($ch);

$phrase  = "$result";
//Below change 'localhost' to "./".
//eg: $pattern = array("./", "https://www.", "http://www.", "https://", "http://", "www.");
$pattern = array("localhost", "https://www.", "http://www.", "https://", "http://", "www.");
$replace   = array("proxified_page_2.php?url_to_proxify=", "proxified_page_2.php?url_to_proxify=", "proxified_page_2.php?url_to_proxify=", "proxified_page_2.php?url_to_proxify=", "proxified_page_2.php?url_to_proxify=", "proxified_page_2.php?url_to_proxify=");

$new_phrase = str_replace($pattern, $replace, $phrase);


//Below code from Basic Php Proxy Video and fix from: https://stackoverflow.com/questions/22255241/preg-replace-no-ending-matching-delimiter-gt/22255455#22255455

foreach($page as $new_phrase)
{
	
//eg: $pattern = array("localhost", "./", "https://", "http://");


$phrase = preg_replace('/src="/', 'src="'.$url_to_proxify, $new_phrase);
$phrase = preg_replace('/action="/', 'action="'.proxy.php?url_to_proxify=$url_to_proxify, $new_phrase);

echo $phrase;

?>

I get this error:

Parse error: syntax error, unexpected ‘=’ in C:\xampp\htdocs\id\proxy.php on line 32

SECOND ATTEMPT with php $page = file and Preg_Replace


<?php

$conn = mysqli_connect("localhost", "root", "", "id");

if(isset($_GET["url_to_proxify"]) === TRUE)
   {
	$url_to_proxify = trim(mysqli_real_escape_string($conn,$_GET["url_to_proxify"]));
    echo "Url to proxify = $url_to_proxify";
   }
?>

<html>
   <body>   
      <form action = "<?php $_PHP_SELF ?>" method = "GET">
         Url: <input type = "text" name = "url_to_proxify" />
              <input type = "submit" />
      </form>      
   </body>
</html>

<?php

$page = file($url_to_proxify);

foreach($page as $phrase)
{
	
//eg: $pattern = array("localhost", "./", "https://", "http://");


$phrase = preg_replace('/src="/', 'src="'.$url_to_proxify, $phrase);
$phrase = preg_replace('/action="/', 'action="'.proxy.php?url_to_proxify=$url_to_proxify, $phrase);

echo $phrase;

}

?>

I get same error as before:

Parse error: syntax error, unexpected ‘=’ in C:\xampp\htdocs\id\proxified_page_2.php on line 58

If I can get a proper reply to my post numbered 18 on the following thread then I guess I can make the final 2 codes (mentioned above) work.
https://www.sitepoint.com/community/t/preg-replace-delimiter-problem/264750/16

Q3. What do you think ?

Anyway, I will try another 2 attempts with the following when I get the time:

cURL & str_replace
Php with php $page = file and str_replace

You are welcome to demonstrate any code samples.

Thank You!