  1. #1
    SitePoint Addict amy.damnit's Avatar
    Join Date
    Sep 2009
    Posts
    336
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Prevent Page from being Indexed

    How do I prevent my website and all of its pages from being indexed by sites like Google, Yahoo, the Wayback Machine, etc.?

    One article I found recommends this...

    If you do not want your page indexed and would also like to prevent it from showing up in Google’s search results then using the meta robots tag is the most effective method of achieving both of these goals. Simply include the following meta robots tag within the head of your web page:

    Code:
        <meta name="robots" content="noindex">
    And do not block the URL using robots.txt. Let the spiders crawl the page in question. This will prevent the URL from being indexed and prevent it from being shown in their SERPs.
    Currently, I have the following robots.txt file in my Web Root...

    Code:
    User-agent: *
    Disallow: /
    BTW, while technically all of my web pages are PHP files (i.e. they have a ".php" extension) in reality, they contain as much or more XHTML as PHP, so that is why I posted here.

    Thanks,


    Amy

  2. #2
    It's all Geek to me silver trophybronze trophy
    ralph.m's Avatar
    Join Date
    Mar 2009
    Location
    Melbourne, AU
    Posts
    24,176
    Mentioned
    454 Post(s)
    Tagged
    8 Thread(s)
    Hi Amy.

    The meta tag is the way I'm familiar with.

    BTW, while technically all of my web pages are PHP files
    Browsers don't see the PHP, as it gets processed before delivery; and I assume it's the same for the bots?

  3. #3
    It's all Geek to me silver trophybronze trophy
    ralph.m's Avatar
    PS You could go further with the meta tag and expand it to this:

    Code:
    <meta name="robots" content="noindex,nofollow">
    That will stop the bots from indexing the page and stop them from following any links on it.
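    To make the comma-separated `content` value concrete, here is a minimal sketch of how a well-behaved crawler might read it. It uses Python's standard `html.parser`; the class name `RobotsMetaParser` and the helper `robots_directives` are made up for this illustration and are not any search engine's actual code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # "noindex,nofollow" is a comma-separated list of directives.
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex,nofollow" /></head>'
    print(sorted(robots_directives(page)))
```

    A crawler that honors the tag would skip indexing when `noindex` is in the set and skip the page's links when `nofollow` is.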

  4. #4
    SitePoint Addict pkSML's Avatar
    Join Date
    Aug 2006
    Location
    Ohio
    Posts
    230
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    One more quick idea on this topic, as it's one I've considered before.

    There's always the robots.txt and the meta tags, but here's another way to take care of bots. In your PHP page, give a totally different response to any user agent with the word 'bot' in it. While it may not catch every bot out there (and I've seen some weird ones), it will stop the search engines people actually use (Google, Microsoft, Yahoo, etc.). You could have just a simple text file saying to visit your site for the content. Then that's all the bots would see. You can google for user-agent names to block.
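    A rough sketch of that user-agent check, written in Python rather than PHP to keep it short and self-contained. The function names, the marker list, and the placeholder message are all assumptions for illustration. Note that not every crawler has 'bot' in its name (Yahoo's crawler has identified itself as "Slurp"), so the sketch checks a few extra substrings; treat the list as a starting point, not a complete one.

```python
# Substrings that suggest a crawler. An illustrative list only,
# not an authoritative or complete one.
BOT_MARKERS = ("bot", "slurp", "spider", "crawl")

# Hypothetical placeholder text served to crawlers instead of the page.
BOT_MESSAGE = "Please visit the site directly for this content."


def looks_like_bot(user_agent):
    """Return True if the User-Agent header contains a known crawler marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def choose_response(user_agent, page_html):
    """Serve the placeholder to suspected bots and the real page to everyone else."""
    if looks_like_bot(user_agent):
        return BOT_MESSAGE
    return page_html


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(choose_response(googlebot, "<p>Seminar list</p>"))
```

    In PHP the same test could be done with `stripos()` on `$_SERVER['HTTP_USER_AGENT']`. Either way it only works for crawlers that identify themselves honestly, since the header is supplied by the client.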

    I myself don't care to have every word written on my server available to people's searching leisure.

    Quote Originally Posted by ralph.m View Post
    Browsers don't see the PHP, as it gets processed before delivery; and I assume it's the same for the bots?
    Yes, bots get the exact same HTML content that a browser receives (with the exception above).

    BTW, Amy, I just read your whole HDD erasure topic. Wow - what a big debate. Gonna' stay out of all that racket!
    -Stephen


  5. #5
    It's all Geek to me silver trophybronze trophy
    ralph.m's Avatar
    Hmm, seems to be a lot of differing opinions on the web. Some say the meta tag's the thing, others the robots.txt. I'd guess the robots.txt method is better, as it is there solely for bots to read.

  6. #6
    SitePoint Addict amy.damnit's Avatar
    Quote Originally Posted by ralph.m View Post
    PS You could go further with the meta tag and expand it to this:

    Code:
    <meta name="robots" content="noindex,nofollow">
    That will stop the bots from indexing the page and stop them from following any links on it.
    Ralph,

    This is what I currently have in my header...

    Code:
    <head>
      <title>Select a Seminar</title>
      <meta http-equiv="content-type" content="text/html; charset=utf-8" />
      <!--<link rel="stylesheet" type="text/css" href="tabs.css" />-->
      <link href="includes/101_SelectSeminar.css" rel="stylesheet" type="text/css" />
      <!--  <style type="text/css"></style>  -->
    </head>

    So how would I change things??

  7. #7
    SitePoint Addict amy.damnit's Avatar
    Quote Originally Posted by pkSML View Post
    One more quick idea

    In your PHP page, give a totally different response to any user agent with the word 'bot' in it. While it may not catch every bot out there (and I've seen some weird ones), it will stop the search engines people actually use (Google, Microsoft, Yahoo, etc.). You could have just a simple text file saying to visit your site for the content. Then that's all the bots would see. You can google for user-agent names to block.
    Can you give me a code example?


    BTW, Amy, I just read your whole HDD erasure topic. Wow - what a big debate. Gonna' stay out of all that racket!
    Yah, I seem to be good at starting larger discussions and debates!!

    And, yah, I'm definitely letting that one play out alone!


    Amy

  8. #8
    It's all Geek to me silver trophybronze trophy
    ralph.m's Avatar
    Quote Originally Posted by amy.damnit View Post
    So how would I change things??
    Ah, well just add it in like this:

    Code:
    <head>
      <title>Select a Seminar</title>
      <meta http-equiv="content-type" content="text/html; charset=utf-8" />
      <meta name="robots" content="noindex,nofollow" />
      <!--<link rel="stylesheet" type="text/css" href="tabs.css" />-->
      <link href="includes/101_SelectSeminar.css" rel="stylesheet" type="text/css" />
      <!--  <style type="text/css"></style>  -->
    </head>
    Although, from what I've read from Googling today, it sounds like what you have in the robots.txt file may do everything you want anyway.

  9. #9
    SitePoint Addict amy.damnit's Avatar
    Quote Originally Posted by ralph.m View Post
    Ah, well just add it in like this:

    Code:
    <head>
      <title>Select a Seminar</title>
      <meta http-equiv="content-type" content="text/html; charset=utf-8" />
      <meta name="robots" content="noindex,nofollow" />
      <!--<link rel="stylesheet" type="text/css" href="tabs.css" />-->
      <link href="includes/101_SelectSeminar.css" rel="stylesheet" type="text/css" />
      <!--  <style type="text/css"></style>  -->
    </head>
    Although, from what I've read from Googling today, it sounds like what you have in the robots.txt file may do everything you want anyway.
    Thanks, Ralph.

    I wasn't sure of the syntax, and was wrong!

    I see you just added an additional <meta> tag.

    Good thing I asked!

    Thanks,


    Amy

  10. #10
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,810
    Mentioned
    25 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by amy.damnit View Post
    Thanks, Ralph.

    I wasn't sure of the syntax, and was wrong!

    I see you just added an additional <meta> tag.

    Good thing I asked!

    Thanks,


    Amy
    The meta robots tag is redundant if you have a robots.txt file blocking all access since the search engines then never read the web page to see the meta tag.

    Any spambots that ignore the robots.txt will also ignore the meta tag.
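    The ordering described here can be sketched with Python's standard `urllib.robotparser`: a crawler that honors robots.txt checks it before requesting a page, so under `Disallow: /` it never downloads the HTML that contains the meta tag. The helper name `may_fetch` and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the first post: every path is off limits to every agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.com/select_seminar.php"  # hypothetical page
    if may_fetch(ROBOTS_TXT, "Googlebot", url):
        print("Fetch the page, then look for a robots meta tag.")
    else:
        print("Blocked by robots.txt: the page and its meta tag are never read.")
```

    A bot that skips this check is not bound by either mechanism, which is the point of the second sentence above.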
    Stephen J Chapman

    javascriptexample.net, Book Reviews, follow me on Twitter
    HTML Help, CSS Help, JavaScript Help, PHP/mySQL Help, blog
    <input name="html5" type="text" required pattern="^$">

  11. #11
    SitePoint Addict amy.damnit's Avatar
    Quote Originally Posted by felgall View Post
    The meta robots tag is redundant if you have a robots.txt file blocking all access since the search engines then never read the web page to see the meta tag.

    Any spambots that ignore the robots.txt will also ignore the meta tag.
    So I should leave out this entire tag that Ralph provided?

    Code:
    <meta name="robots" content="noindex,nofollow" />

    Or can I leave in this part?

    Code:
    <meta content="noindex,nofollow" />
    Sorry, I'm getting mixed up!


    Amy

  12. #12
    It's all Geek to me silver trophybronze trophy
    ralph.m's Avatar
    I think felgall is just saying forget about the meta tag altogether. As I said above, the robots.txt file seems the better way to go.

  13. #13
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    With a robots.txt file denying the search engines access, nothing is going to pay any attention to the robots meta tag, and you are better off removing it completely.

  14. #14
    SitePoint Addict amy.damnit's Avatar
    Thanks for the clarification, Stephen!!


    Amy

  15. #15
    Resident curmudgeon bronze trophy gary.turner's Avatar
    Join Date
    Jan 2009
    Location
    Dallas
    Posts
    990
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The meta tag would be useful if you merely want to exclude a given page temporarily and don't want to mess with the robots.txt file.

    cheers,

    gary
    Anyone can build a usable website. It takes a graphic
    designer to make it slow, confusing, and painful to use.

    Simple minded html & css demos and tutorials

  16. #16
    Follow: @AlexDawsonUK silver trophybronze trophy AlexDawson's Avatar
    Join Date
    Feb 2009
    Location
    England, UK
    Posts
    8,111
    Mentioned
    0 Post(s)
    Tagged
    1 Thread(s)
    Actually, I would say it's easier for general maintenance to use the robots file, purely because you can add and remove indexing statements in a single location rather than searching for the right file to edit. As a side note, if you want a page to be invisible to search engines that ignore the robots file, you should probably password-protect the page (or ensure it's an orphaned file with no links pointing to it).

  17. #17
    SitePoint Zealot
    Join Date
    Aug 2006
    Location
    Scotland, UK
    Posts
    100
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    As has been indicated in the thread, but worth emphasising, only "good" bots will obey your robots file or meta tags.

    There is NOTHING to stop anyone crawling your pages and archiving them (or doing what they please with them). If, for example, such an archive were available to Google, your content would still end up on Google.

    Is there a particular reason for not wanting the major search engines/archivers to list your site and content?
    Charles Sweeney
    FormToEmail.com

