preg_replace() delimiter problem

Guys,

What I am trying to do is load google.com with cURL and, no matter what link is on the page (be it a Google link or a link to another domain), have the script prepend: http://mydomain.com/tracker.php?

So, if the page contains a link like this:

http://somedomain.com/

then that should be replaced with:

http://mydomain.com/tracker.php?http://somedomain.com/

And if the page contains a link like this:

http://subdomain.somedomain.com/dir/sub-dir/page.html

then I want it to be replaced with:

http://mydomain.com/tracker.php?http://subdomain.somedomain.com/dir/sub-dir/page.html

You get my point ?

Here’s the error I get:

Warning: preg_replace(): Delimiter must not be alphanumeric or backslash in C:\xampp\htdocs\test\CURL_experiment1.php on line 13

Here’s my code:

<?php
$url = "http://google.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "$url");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 5);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($ch);
curl_close($ch);

$pattern = 'http://';
$replacement = 'http://mydomain.com/tracker.php?';
echo preg_replace($pattern, $replacement, $result);

?>

Maybe actually using a delimiter would be a good idea?
http://php.net/manual/en/regexp.reference.delimiters.php
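
A minimal sketch of what that could look like, assuming a small hard-coded `$html` string in place of the cURL response (the sample markup is made up for illustration). Using `#` as the delimiter avoids having to escape the slashes in the pattern:

```php
<?php
// Sample markup standing in for the fetched page (hypothetical data).
$html = '<a href="http://somedomain.com/">Some domain</a>';

// The pattern is wrapped in delimiters; '#' means the slashes need no escaping.
$pattern = '#http://#';

// The replacement ends with "http://" so the original scheme is kept.
$replacement = 'http://mydomain.com/tracker.php?http://';

$rewritten = preg_replace($pattern, $replacement, $html);
echo $rewritten;
```

The replacement here ends in `http://` on purpose: replacing `http://` with just the tracker prefix would drop the scheme from the original link, which is not what the first post asks for.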


Also, in a case where there is no regex, just string for string, you could simply use str_replace
http://php.net/manual/en/function.str-replace.php
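
As a rough sketch of the plain-string version, again assuming a made-up `$html` sample rather than a real cURL response:

```php
<?php
// Sample markup standing in for the fetched page (hypothetical data).
$html = '<a href="http://subdomain.somedomain.com/dir/sub-dir/page.html">Page</a>';

// No delimiters and no escaping: str_replace() matches the text literally.
$tracked = str_replace(
    'http://',
    'http://mydomain.com/tracker.php?http://',
    $html
);

echo $tracked;
```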


Folks,

Here’s a contribution.


<?php
$url = "http://ebay.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "$url");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 5);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($ch);
curl_close($ch);

/* pattern can be either of these 2:
$pattern = '#http://#';
$pattern = '/http:\/\//';
*/

$pattern = '/http:\/\//';
$replacement = 'http://mydomain.com/tracker.php?';
echo preg_replace($pattern, $replacement, $result);

?>

With the above code, I managed to pull up a page and then prepend the tracking link to all links present on the page (like you do with proxified pages).

Try it on google and it won’t work. :frowning:

What does the following header mean (ebay.com):

HTTP/1.1 200 OK
X-EBAY-C-REQUEST-ID: ri=N79YSYGvO9se,rci=qZJNjVUUeUF9xCaS
RlogId: t6e%60cckjkb9%3Fvo%7Bccbgmijf%28vo%7B%287570577-15c59d3449c-0x129
X-Frame-Options: SAMEORIGIN
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: private
Pragma: no-cache
Content-Type: text/html;charset=utf-8
Content-Language: en-US
Server: ebay server
X-EdgeConnect-MidMile-RTT: 172
X-EdgeConnect-Origin-MEX-Latency: 261
X-EdgeConnect-Cache-Status: 0
Date: Tue, 30 May 2017 14:47:17 GMT
Transfer-Encoding: chunked
Connection: keep-alive
Connection: Transfer-Encoding
Set-Cookie: JSESSIONID=B9699C4784AF48E8C158816A1094F78B; Path=/; HttpOnly
Set-Cookie: ebay=%5Esbf%3D%23%5E;Domain=.ebay.com;Path=/
Set-Cookie: dp1=bu1p/QEBfX0BAX19AQA5b0eb975^bl/BD5cefecf5^;Domain=.ebay.com;Expires=Thu, 30-May-2019 14:47:17 GMT;Path=/
Set-Cookie: s=CgAD4ACBZLtd1NTlkMzQ0OGMxNWMwYTg4OGQ4YzYzYTYzZmZmOGJhNjhjlJyS;Domain=.ebay.com;Path=/; HttpOnly
Set-Cookie: nonsession=CgADLAAFZLYz9MQDKACBik4d1NTlkMzQ0OGMxNWMwYTg4OGQ4YzYzYTYzZmZmOGJhNjgpg9Wr;Domain=.ebay.com;Expires=Wed, 30-May-2018 14:47:17 GMT;Path=/

I have a feeling the header states that the visitor is the same visitor. Is that why I’m unable to prepend the following to all links present on the ebay homepage?
(NOTE: I’m getting cURL to fetch the ebay homepage.)

http://mydomain.com/tracker.php?

Check the following code. Run it on your XAMPP/WAMP and then hover your mouse over the links present on ebay’s homepage. Do you see all links starting with:

http://mydomain.com/tracker.php?

Yes or no? I did see it last night, so it was a YES from here then.
But tonight, the answer is a NO.
Now I’m confused. Why is that? Is anything wrong with my code?

<?php
$url = "http://www.ebay.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "$url");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 5);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($ch);
curl_close($ch);

/* pattern can be either 2:
$pattern = '#http://#';
$pattern = '/http:\/\//';
*/

$pattern = '/http:\/\//';
$replacement = 'http://mydomain.com/tracker.php?';

$pattern = '/https:\/\//';
$replacement = 'http://mydomain.com/tracker.php?';

$pattern = '/localhost\//';
$replacement = 'http://mydomain.com/tracker.php?';

echo str_replace($pattern, $replacement, $result);

?>

What I’m doing here is just replacing the “http://” with:

http://mydomain.com/tracker.php?

But instead of using the replace function, is there a better way to prepend the following to all links found on the ebay homepage? (The technique should work on any homepage and not be dependent on the ebay homepage only.)

Shouldn’t the patterns be an array and replacements be a string instead of overwriting the variable values?

http://php.net/manual/en/function.str-replace.php

I checked that link, but it is a little confusing to me as I’m still learning arrays.
There is a thread of mine open somewhere regarding array issues. See if you can find it. I would appreciate any replies there.
In the meantime, would you care to show an example of how it should have been done? You could start by copying my code (from my previous post), correcting it, and pasting the corrected version in your next post. That would be a good sample for all newbies, and maybe even intermediate folks.

Thanks!

You know how to create an array. You know how to pass variables (which may be arrays) into a function.
It would be a far more valuable and rewarding learning experience for you to put these things together yourself, as all you need is there before you.

You will not learn anything and never achieve your goals until you can learn to think for yourself and do for yourself.
If you get it wrong, that does not matter, errors can be found and it can be corrected. But if you are not prepared to put in any effort on your own projects, why would you expect anyone else to?
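
For anyone comparing notes after trying it themselves, here is a minimal sketch of one possible shape, assuming a made-up `$page` string in place of the cURL result. The search terms are plain strings, because str_replace() does not understand regex delimiters such as the slashes in `'/http:\/\//'`:

```php
<?php
// Hypothetical sample standing in for the fetched page.
$page = '<a href="http://a.example/">A</a> <a href="https://b.example/">B</a>';

// Several search strings in one array, instead of overwriting one variable.
$search = array('http://', 'https://');

// A single replacement string is applied to every search string.
$replacement = 'http://mydomain.com/tracker.php?';

$output = str_replace($search, $replacement, $page);
echo $output;
```

The order of the array matters: str_replace() works through the search strings one after another, so putting 'https://' first would let the later 'http://' pass match the tracker URL that had just been inserted. This version also drops the original scheme from each link; passing a matching array of replacements (each ending in its own scheme) would keep it.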


SamA74,

I’ll put words in your mouth:
Nice Try UI. :wink:

Here’s a code sample you asked for. Let me know what you think.


<?php
$url = "http://google.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "$url");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 5);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($ch);

echo preg_replace_callback( 
  '#https?://[^\s\'"]+#', 
  function($matches) { 
    return 'http://mydomain.com/tracker.php?'.urlencode($matches[0]); 
  }, 
  $result 
);  

?>

The above regex can prepend the tracker link to all links (absolute and relative links) found on the cURL-fetched page.
So, an absolute link that originally looks like this:

https%3A%2F%2Fplus.google.com%2F116899029375914044550

is now shown as this by cURL:

http://mydomain.com/tracker.php?https%3A%2F%2Fplus.google.com%2F116899029375914044550

So far, so good.
But a relative link that looks like this:

/intl/en/ads/

is getting shown by cURL like this:
http://localhost/intl/en/ads/

So the regex is prepending the tracker URL to all links present on the cURL-fetched page all right, but it is not replacing the “localhost” part.

Now I need to add another line of code to replace the “localhost” on relative links with the current page’s domain name. That makes 2 sets of code. It would be better if one regex did both tasks, that is, prepend the tracker URL to relative links while replacing the “localhost”. Is this possible?

Anyway, an idea has popped into my mind.
How about you guys (seniors) with fair experience build code puzzles (in a “puzzle solving thread”) from time to time, when you have the time or are bored at the weekend and want something more interesting to do than go to the bar, and see who can solve them?
This is where you write the code for a specific task and then leave a few gaps for newbies to fill in, as a sort of test. It would be a great learning exercise. I reckon even intermediate and pro people would join in. :wink:
I might as well contribute a little and build some of my own for my juniors to play with.

You know. No-one really knows it all. Even the intermediate and adv guys might find-out that they’ve learnt or understood something wrong and learn a little. Just one way for their errors to be caught. Isn’t there some sort of a saying that, the teacher becomes more expert when he starts teaching as he learns from his own teaching or from the feedback he gets from his students. Or, whatever.
I think it is a good idea. As I learn along, I can build puzzles after I finish each chapter. New newbies can do the same. Oldbies can come and check now and then how the newbies are faring. I’m gonna have this topic at the back of my head (subconscious) and try building on the idea. The puzzle ghost has entered my head now and it ain’t going anytime soon.
Anyway, back to the topic.

I am not certain, but I suspect that if you’re running your code on XAMPP or WAMP or whatever on your own PC, it’s something in there that is adding the “localhost” part onto the beginning. When the site you’re scraping delivers a relative link, it almost certainly won’t contain “localhost”, as most users won’t have a server running. So when you host your code on a proper server, I’d guess that the relative links (those that don’t start with a domain name) will be prefixed with whatever domain you’re running it from.


You are absolutely right!
I used the code mentioned above on my site and I don’t see any localhost; the relative links show my domain name instead. But that is still not good, as the tracker link is not getting prepended to the relative links, only to the absolute ones.
Therefore, I have to find a regex that will prepend it to both types of links.

Probably because your regex contains “http” and relative links do not. But I steer clear of regular expressions so have no suggestions on what might help.
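
One way to cover both kinds of link, sketched here under some simplifying assumptions, is to match the `href` attribute rather than the `http` prefix, turn each value into an absolute URL, and only then prepend the tracker. The helper names `to_absolute()` and `track_links()` and the `$origin` parameter are made up for this example; it only handles double-quoted `href` values, treats every non-absolute path as relative to the site root, and ignores things like `mailto:` links and `#fragment` anchors:

```php
<?php
// Hypothetical helper: turn an href value into an absolute URL.
function to_absolute(string $href, string $origin): string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href;                  // already absolute
    }
    if (strpos($href, '//') === 0) {
        return 'http:' . $href;        // protocol-relative, e.g. //host/path
    }
    if (strpos($href, '/') === 0) {
        return $origin . $href;        // root-relative, e.g. /intl/en/ads/
    }
    return $origin . '/' . $href;      // simplification: resolve against the site root
}

// Hypothetical helper: prepend the tracker URL to every double-quoted href.
function track_links(string $html, string $origin): string
{
    return preg_replace_callback(
        '#href="([^"]*)"#i',
        function (array $m) use ($origin): string {
            $absolute = to_absolute($m[1], $origin);
            return 'href="http://mydomain.com/tracker.php?' . urlencode($absolute) . '"';
        },
        $html
    );
}

// Sample markup using the two links quoted earlier in the thread.
$html = '<a href="/intl/en/ads/">Ads</a> '
      . '<a href="https://plus.google.com/116899029375914044550">Plus</a>';

$tracked = track_links($html, 'http://google.com');
echo $tracked;
```

Because the relative link is made absolute before the tracker is added, the browser no longer fills in “localhost” (or whatever domain the script runs on) by itself.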

Thanks man!

Ok, let’s forget about regex and how I’d do it. Let’s see some code examples from you how you’d do it. We all might learn a trick or two. What do you say ? :slight_smile:
Even though they say that you can’t teach an old dog a new trick. I say, you can always teach an old fella some new good tricks! :wink:

Fellow PHP Programmers,

I’ve nearly sorted this problem out but just one little hiccup left.
Here’s my latest code:


<?php

$conn = mysqli_connect("localhost", "root", "", "id");

if(isset($_GET["url_to_proxify"]) === TRUE)
   {
	$url_to_proxify = trim(mysqli_real_escape_string($conn,$_GET["url_to_proxify"]));
    echo "Url to proxify = $url_to_proxify";
   }
?>

<html>
   <body>   
      <form action = "<?php $_PHP_SELF ?>" method = "GET">
         Url: <input type = "text" name = "url_to_proxify" />
              <input type = "submit" />
      </form>      
   </body>
</html>

<?php

$page = file($url_to_proxify);

foreach($page as $phrase)
{
	
//eg: $pattern = array("localhost", "./", "https://", "http://");


$phrase = preg_replace('/src="/', 'src="'.$url_to_proxify, $phrase);
$phrase = preg_replace('/action="/', 'action="'.proxy.php?url_to_proxify=$url_to_proxify, $phrase);

echo $phrase;

}

?>

I get this error:

Parse error: syntax error, unexpected ‘=’ in C:\xampp\htdocs\id\proxy.php on line 32

The error says that line 32 (the second-to-last line of the script) should not have an equals sign. But I say it must, because it is part of the GET method.
Here’s the concerned line:


$phrase = preg_replace('/action="/', 'action="'.proxy.php?url_to_proxify=$url_to_proxify, $phrase);

If you take out the “=” from the URL in line 32 then the whole script works. Check it out:


<?php

$page = file($url_to_proxify);

foreach($page as $phrase)
{
	
//eg: $pattern = array("localhost", "./", "https://", "http://");


$phrase = preg_replace('/src="/', 'src="'.$url_to_proxify, $phrase);
$phrase = preg_replace('/action="/', 'action="'.proxy.php?url_to_proxify=$url_to_proxify, $phrase);

echo $phrase;

}

?>

But like I said, I need the “=” in the URL as it is part of the GET method.
And so, how do I sort this problem out ?

Since there isn’t actually any real regex in the search pattern, why are you even using preg_replace() and not the simpler str_replace() function as I advised you almost a month ago?


You’re back to not quoting strings properly, exactly the same problem that you ran into in your post about why fopen wouldn’t work if you didn’t quote the filename properly.

SamA74,

If you check closely, I did actually listen to your suggestion and quit preg_replace() as my code wasn’t working. I originally got that code from a YouTube tutorial that showed how to use ereg, but I guess it is an old video; I didn’t notice. Hence, when ereg did not work, I switched to preg_replace() and ran into a problem, to which you replied to just use str_replace() instead. I did use it, ran into a problem, got stuck, and wasn’t getting any help from anyone, both in this thread and on other forums:

I think they closed that thread after merging my question onto this thread instead:

You will notice that in both threads I use str_replace() like you suggested. And so, I am not ignoring your suggestion.

Yes, this thread and the above 2 threads are related. You could pretty much say they are part of the same code, but preg_replace() and str_replace() are 2 different functions, hence the 2 different threads: this one and the one mentioned just above.
And so last night I switched back to the YouTube video (since I was facing str_replace() issues and no longer getting any solution) and took another person’s code suggestion on preg_replace() from another forum. It worked at first but then it didn’t for some reason (maybe I added a bit of code and that got in the way somehow). So I was googling for a solution and came across a thread that was using the exact same code from that same YouTube tutorial, with a solution, and so I just copied the code, renamed the variables here and there, and added a bit more code to tailor it to my needs. The original code that comes from the YouTube tutorial is here:

https://stackoverflow.com/questions/22255241/preg-replace-no-ending-matching-delimiter-gt/22255455#22255455

Youtube tutorial here:

https://www.youtube.com/watch?v=P49w0E64MAA

Worked for a while for me there but when I added some more codes to tailor to my needs it then stopped working. Hence, I made the previous post. Frankly, when I first saw that stackoverflow thread, I thought it was mine. But then noticed the date. 3 yrs ago! Definitely, not my thread! Stackoverflow makes it hard to see who the original poster is. I don’t see any Username of the poster mentioned. Nevermind.

Anyway, I’m closing this thread or getting it closed. Will continue this thread’s code on the other thread.

Do you mind pointing out what you are referring to?
If you mean this line:

$phrase = preg_replace('/action="/', 'action="'.proxy.php?url_to_proxify=$url_to_proxify, $phrase);

Then there is a reason for the double quotes. I’m picking the:

action="

that described the web form action.
You can find more about this on this 5 or so mins youtube tutorial video where the guy makes that part of the code work:

  1. As explained to you very recently, we do not close threads. If you stop posting in it, it will eventually close itself. (Announcing you are closing it and then asking another question makes no sense at all.)

  2. If you want to discuss this thread’s code, then kindly do so in this thread. Forum policy on cross-posting and related issues has been explained to you several times. If you are still in any doubt about it, then please check the FAQs, in particular the “Keep it tidy” section.


In this line:

$phrase = preg_replace('/action="/', 'action="'.proxy.php?url_to_proxify=$url_to_proxify, $phrase);
                       ^          ^  ^        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
                       O          C  O        C 

Where O = open quotes, and C = close quotes. Outside of the quoted areas you need to use either variable names or constants. All the bits with ~ under them are neither quoted strings nor variables, and will give you an error. I think the line should be more like

$phrase = preg_replace('/action="/', 'action="proxy.php?url_to_proxify=' .$url_to_proxify, $phrase);

but I have never used preg_replace() so I might be off base. And for what you’re doing I’d stick with str_replace myself.
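
Here is a small sketch of the same fix using str_replace(), assuming example values for `$url_to_proxify` and `$phrase` (in the thread they come from the submitted form and from the fetched page). Everything literal, including the `?` and the `=`, sits inside the quotes; only the variable is joined on with the `.` operator:

```php
<?php
// Example values (assumptions for illustration only).
$url_to_proxify = 'http://example.com/';
$phrase = '<form action="search.php" method="get">';

// The literal text is one quoted string; the variable is concatenated after it.
$phrase = str_replace(
    'action="',
    'action="proxy.php?url_to_proxify=' . $url_to_proxify,
    $phrase
);

echo $phrase;
```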
