  1. #1
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930

    How does googlebot know URLs I'm visiting in my browser??

    Today I discovered something that really startled me, and I can't find any explanation for it. I am working on an online shop that lets clients confirm their email address by clicking a personalized link, which they receive in an automatically generated message. Here is an example:

    http://www.gsm-support.net/en/confir...806e33e2193232

    Of course, the link above is invalid, but that is not the point. When a client clicks the link, the PHP script on the server marks the order as confirmed in the database. While testing this I was startled by what I saw in the server logs. This is what happens when I open the confirmation URL in my browser:

    1. The first page request is http://www.gsm-support.net/en/confir...806e33e2193232 - my IP is logged, all as expected.

    2. A second later comes another request for the same page and it was not made by my browser but by googlebot! The IP is 66.249.71.59, host name crawl-66-249-71-59.googlebot.com, user agent: Mediapartners-Google.

    3. Within the same second comes another identical request from googlebot.

    The script is therefore invoked 3 times. How can this be explained? I tried copying and pasting the link into Seamonkey, Firefox, Opera and IE, and the result was the same every time. The only difference was that with IE there was only 1 request from googlebot. Googlebot cannot have obtained the link from another source, because the link is generated at the time the script is invoked. And of course, I want the link to be known only to the person receiving it, whereas it turns out googlebot knows it immediately! Could it be that my computer has some malware that sends Google the URLs of the pages I visit? Or could the server hosting the shop have been hacked, so that it is sending the information?

    I also observed that this happens only when I first visit the confirmation page. On subsequent visits only the request from my browser is logged. But when I place a new order and a new link is sent to me, it happens again. Googlebot never goes to the URL first; it always goes there immediately after me.

    I don't know if it's important, but the confirmation email is sent from PHP using SwiftMailer without my own SMTP server; in other words, SwiftMailer in effect uses the PHP mail() function.

    Any ideas?

  2. #2
    Dan Grossman
    Join Date
    Aug 2000
    Location
    Philadelphia, PA
    Posts
    20,580
    The Mediapartners crawler's job is to target the page for AdSense ads, and it's aware of the page because you have an AdSense ad on it.

    That's JavaScript which is communicating the URL back to Google. The JavaScript runs when someone views the page with it in a browser; in this case, you.
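
    To illustrate, here is a minimal sketch of how a server-side script could recognise these follow-up requests, using the user agent and host name from the log entries in the first post. It is written in Python rather than the shop's PHP, the function names are made up for this example, and the host name check is only a suffix comparison on a name you have already looked up. A more careful check would also resolve that name forward and compare the IP.

```python
def is_adsense_crawler(user_agent: str) -> bool:
    """True if the User-Agent header names the AdSense crawler seen in the logs."""
    return "mediapartners-google" in user_agent.lower()


def hostname_looks_like_googlebot(hostname: str) -> bool:
    """True if a reverse-DNS host name sits under googlebot.com.

    Only the googlebot.com domain from the log excerpt is checked here;
    treating it as the sole valid crawler domain is an assumption of this sketch.
    """
    host = hostname.rstrip(".").lower()
    return host.endswith(".googlebot.com")


if __name__ == "__main__":
    # Values taken from the log entries quoted in the first post.
    print(is_adsense_crawler("Mediapartners-Google"))
    print(hostname_looks_like_googlebot("crawl-66-249-71-59.googlebot.com"))
```

    The user agent alone can be spoofed by anyone, which is why the sketch keeps the host name check as a separate step.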

  3. #3
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Quote Originally Posted by Dan Grossman View Post
    The Mediapartners crawler's job is to target the page for AdSense ads, and it's aware of the page because you have an AdSense ad on it.

    That's JavaScript which is communicating the URL back to Google. The JavaScript runs when someone views the page with it in a browser; in this case, you.
    You are right, there are AdSense ads on this page. I hadn't thought of that, and I wasn't aware that URLs I designed to be unique are sent to Google because of it. Thanks!
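
    Since the confirmation URL ends up being requested several times, one way to keep the repeats harmless is to make the confirmation step idempotent and to skip requests from the ad crawler. The following is only a sketch under stated assumptions: it is in Python rather than the shop's PHP, the `orders` dictionary stands in for the real database, and the order number, token and function name are invented for the example.

```python
import hmac

# Hypothetical in-memory stand-in for the shop's orders table.
orders = {
    1001: {"token": "example-token-not-from-the-thread", "confirmed": False},
}


def confirm_order(order_id: int, token: str, user_agent: str = "") -> str:
    """Confirm an order at most once, ignoring the AdSense crawler."""
    order = orders.get(order_id)
    # Constant-time comparison so the token cannot be guessed via timing.
    if order is None or not hmac.compare_digest(order["token"], token):
        return "invalid"
    # The crawler re-requests the URL right after the visitor; do nothing for it.
    if "mediapartners-google" in user_agent.lower():
        return "ignored"
    # Repeat visits must not change any state a second time.
    if order["confirmed"]:
        return "already_confirmed"
    order["confirmed"] = True
    return "confirmed"
```

    With this shape, the three requests seen in the logs would change the order only once, whichever of them arrives first. Leaving the ad code off pages with private URLs would avoid sending the link to Google at all.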

