  1. #1
    SitePoint Evangelist
    Join Date: Oct 2000
    Posts: 430

    PHP and Search Engine Query

    Hi,

    I was just wondering whether there's any way to use PHP to detect whether a file is being served to a search engine spider or to a regular browser - I'm looking for something that can differentiate between the two.

    If so, could anyone explain how?

    Many thanks.

  2. #2
    Skunk
    Grumpy Mole Man
    Join Date: Jan 2001
    Location: Lawrence, Kansas
    Posts: 2,067
    You can do that by checking the user agent string of the browser - have a look at the $HTTP_USER_AGENT variable. I don't know what strings the different search engine robots use, though (I'd imagine there's a site somewhere that keeps a list of their user agents). If you're thinking of serving different content to search engine robots in an attempt to improve your search engine rankings, think again: most search engines check your page twice, once with a "fake" user agent and once with the standard one, to catch people who try to cheat the spider that way. If they find major differences they will rank your site lower or even ignore it completely. (*)

    * This may or may not be true - I remember reading it somewhere but I can't remember where.
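
    For what it's worth, the check itself only takes a few lines. Here's a minimal sketch - the robot names in the array are just illustrative examples, so you'd want to build the real list from one of those robot databases:

    PHP Code:

    <?php
    // Return true if the user agent matches one of the known robot
    // substrings. This list is illustrative, not exhaustive.
    function is_spider($user_agent)
    {
        $robots = array('Googlebot', 'Slurp', 'Scooter', 'ia_archiver', 'Lycos');
        foreach ($robots as $robot) {
            if (stristr($user_agent, $robot)) {
                return true;
            }
        }
        return false;
    }

    // $HTTP_USER_AGENT holds the raw User-Agent header (with
    // register_globals on); newer PHP also exposes it as
    // $_SERVER['HTTP_USER_AGENT'].
    if (is_spider($HTTP_USER_AGENT)) {
        // serve the spider-oriented version of the page
    } else {
        // serve the normal version
    }
    ?>

    Bear in mind a string match is only as good as the list, and a robot can send whatever user agent it likes.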

  3. #3
    SitePoint Evangelist
    Join Date: Oct 2000
    Posts: 430
    Originally posted by Skunk
    If you're thinking of serving different content to search engine robots in an attempt to improve your search engine rankings, think again: most search engines check your page twice, once with a "fake" user agent and once with the standard one, to catch people who try to cheat the spider that way. If they find major differences they will rank your site lower or even ignore it completely. (*)

    * This may or may not be true - I remember reading it somewhere but I can't remember where.
    The idea was to serve more search-engine-friendly pages - not much difference, but possibly with improved keywords and so on.

    Thinking about your comments, I can understand why search engines might try to do this. However, I doubt they do, as it would cause untold problems for innocent sites.

    For example, at the top of every page on my site I have 'text affiliate links' (probably around 35 words). These are generated at random on every page load from a list of around 50 affiliates I have text for. If the above were true, my site should already have been picked out as one that serves 'different content' on two separate 'page calls'. So far I've had no problem with this, and I rank very highly on Google for numerous keywords.
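
    To illustrate, the rotation is nothing more complicated than this (the array here is a hypothetical stand-in for my list of around 50 affiliates):

    PHP Code:

    <?php
    // Seed the random number generator (needed on older PHP versions).
    srand((double) microtime() * 1000000);

    // Pick one affiliate text link at random on every page load, so two
    // fetches of the same URL will usually show different text.
    $affiliates = array(
        '<a href="http://www.example.com/one">Affiliate one text link</a>',
        '<a href="http://www.example.com/two">Affiliate two text link</a>',
        '<a href="http://www.example.com/three">Affiliate three text link</a>'
    );
    echo $affiliates[array_rand($affiliates)];
    ?>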

    And if you think of the number of 'high end' sites that generate random 'article headlines' or fast-changing articles, those should also 'alert the search engines'. Take the top news sites, for example - would the search engines ban them just for delivering constantly changing content?

    I'd be interested if anyone else could throw any light on it.

  4. #4
    Skunk
    Grumpy Mole Man
    Join Date: Jan 2001
    Location: Lawrence, Kansas
    Posts: 2,067
    Yeah, I'd thought about the innocent-sites-using-random-content angle. Presumably, if search engines do double-check sites in the way I mentioned, they'll only ignore sites where the page is completely different, probably using some really clever document-comparison algorithms or something.

