
  1. #1 tim@getdim (SitePoint Zealot, Detroit)

    noindex on dynamically generated pages?

    I have spent several hours looking for a solution but so far have not come up with one.

    I know there is a way to generate meta tags dynamically with PHP (most often title, description, keywords, etc.), but is there a way to generate a robots noindex tag?

    I have a PHP page that dynamically generates other pages, and I want it to pass the noindex tag to all the pages it generates. I still want the links to be followed, but I don't want the pages themselves indexed.

    Is this possible?

  2. #2 Banana Man (SitePoint Addict)
    I've never done it myself, but off the top of my head: you could create a robots.php file, dynamically generate the noindex rules in it, and then use mod_rewrite to serve robots.php under the name robots.txt.

    You could also add the HTML tag to each page:

    <meta name="robots" content="noindex">
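    The robots.php idea above could be sketched roughly like this (the rewrite rule and the paths listed are placeholders I've made up, not something from this thread). One caveat: robots.txt Disallow rules block crawling entirely, not just indexing, so this wouldn't let the links on those pages be followed.

    ```php
    <?php
    // Hypothetical robots.php, served as robots.txt via a rewrite such as:
    //   RewriteRule ^robots\.txt$ robots.php [L]
    // Paths below are illustrative placeholders.
    header('Content-Type: text/plain');

    echo "User-agent: *\n";
    foreach (['/links/', '/generated/'] as $path) {
        // Each Disallow line blocks crawling of that path entirely
        echo "Disallow: " . $path . "\n";
    }
    ```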

    Why do you want the links to be followed if you don't want the pages indexed though?

  3. #3 tim@getdim
    If it were as simple as putting the HTML tag in each page, I wouldn't be asking. Only one page actually exists as a PHP file. This page dynamically generates subsequent pages which do not actually exist as individual files.

    As far as the why: they are pages of links from LinkMarket. Please don't tell me the pros and cons of pages that only contain links; it's not my call. I'm just the code monkey, and my opinions and suggestions are frequently ignored by the boss...

    If you're not familiar with it, Link Market is a link exchange directory site. Each site linked to from these pages links back to the homepage of the site I'm working on.

    Forgive me if I sound rude at all. It was not intentional.

  5. #5 tim@getdim
    I'm still pretty new with PHP and I want to make sure I'm implementing this correctly. Here is the full source code of my-link-page.php, which generates said subpages:

    (NOTE: the entirety of the page's code was generated for me by LinkMarket; I simply copied and pasted. The header() call is the only thing I added myself.)

    Code:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" >
    <head>
        <title>My Link Page</title>
    	<style type="text/css">
    		body, table, td, tr, a {color: #333333; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 11px;}
    		a{text-decoration: underline;}
    		a:hover {color: #999999;}
    		a:active {color: #999999;}
    		a:visited{color: #666666;}
    		.tblcel_results, .tblcel_sl_header {border-bottom-style: solid; border-bottom-width: 1; }
    		.url_and_date, .sl_url_and_date{font-style:italic;filter: alpha(opacity=50);}
    		.description{filter: alpha(opacity=50);}
    		.nav_numbers{padding: 4px;}
    	</style>
    </head>
    <body>
    	<div>
    		<?php
    header("X-Robots-Tag: noindex", true);
    
    		 /*
    		   Link Market Link Page Module
    		   Copyright 2003 Link Market, All Rights Reserved.
    		   LDMS CODE for: http://www.liveoutloudproductions.com/ 
    		   WARNING: Do not change code below or your link page will not work!
    		 */
    
    		 $user_id = "dh57bhX8M9Y=";
    
    		 $url = "http://api.linkmarket.com/mng_dir/get_links.php?user_id="
    				.$user_id."&cid=".$_GET['cid']."&start=".$_GET['start']."";
    
    		echo GetLMDSContent($url);
    
    		function GetLMDSContent($url)
    		{
    		$buffer = ""; 
    		$urlArr = parse_url($url);
    		if($urlArr[query])
    		{
    		$urlArr[query] = "?".$urlArr[query];
    		}
    
    		$fp = fsockopen($urlArr[host], 80, $errno, $errstr, 30);
    		if (!$fp){echo "$errstr ($errno)<br />\n";}
    		else
    		{
    		$out = "GET /".substr($urlArr[path], 1).$urlArr[query]." HTTP/1.0\r\n";
    		$out .= "Host: ".$urlArr[host]."\r\n";
    		$out .= "Connection: Close\r\n\r\n";
    		fwrite($fp, $out);
    		while (!feof($fp))
    		{
    		$buffer .= fgets($fp, 128);
    		} 
    		fclose($fp);
    		}
    
    		$buffer = strstr($buffer,"\r\n\r\n");
    
    		return $buffer;
    		}
    
    		?>
    	</div>
    </body>
    </html>

  6. #6 tim@getdim
    Come to think of it, could I use this X-Robots-Tag (or the standard robots meta tag) in my robots.txt file for a specific subfolder on the server?

    Say this links page was in its own folder on the server and I wanted to apply noindex to everything in that folder. Is this easier? Is this even possible?

  7. #7 felgall (Sydney, NSW, Australia)
    PHP headers must be sent before any of the HTML, even before the doctype, or you will get an error. If you want to apply it after you have started writing the HTML, then you need to do it as the HTML meta tag, which will attempt to apply the noindex (that the header would have applied) after the page has already started to be created.

    Code:
    <?php
    header("X-Robots-Tag: noindex", true);
     ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" >
    <head>
        <title>My Link Page</title>
    	<style type="text/css">
    		body, table, td, tr, a {color: #333333; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 11px;}
    		a{text-decoration: underline;}
    		a:hover {color: #999999;}
    		a:active {color: #999999;}
    		a:visited{color: #666666;}
    		.tblcel_results, .tblcel_sl_header {border-bottom-style: solid; border-bottom-width: 1; }
    		.url_and_date, .sl_url_and_date{font-style:italic;filter: alpha(opacity=50);}
    		.description{filter: alpha(opacity=50);}
    		.nav_numbers{padding: 4px;}
    	</style>
    </head>
    <body>
    	<div>
    		<?php
    
    		 /*
    		   Link Market Link Page Module
    		   Copyright 2003 Link Market, All Rights Reserved.
    		   LDMS CODE for: http://www.liveoutloudproductions.com/ 
    		   WARNING: Do not change code below or your link page will not work!
    		 */
    
    		 $user_id = "dh57bhX8M9Y=";
    
    		 $url = "http://api.linkmarket.com/mng_dir/get_links.php?user_id="
    				.$user_id."&cid=".$_GET['cid']."&start=".$_GET['start']."";
    
    		echo GetLMDSContent($url);
    
    		function GetLMDSContent($url)
    		{
    		$buffer = ""; 
    		$urlArr = parse_url($url);
    		if($urlArr[query])
    		{
    		$urlArr[query] = "?".$urlArr[query];
    		}
    
    		$fp = fsockopen($urlArr[host], 80, $errno, $errstr, 30);
    		if (!$fp){echo "$errstr ($errno)<br />\n";}
    		else
    		{
    		$out = "GET /".substr($urlArr[path], 1).$urlArr[query]." HTTP/1.0\r\n";
    		$out .= "Host: ".$urlArr[host]."\r\n";
    		$out .= "Connection: Close\r\n\r\n";
    		fwrite($fp, $out);
    		while (!feof($fp))
    		{
    		$buffer .= fgets($fp, 128);
    		} 
    		fclose($fp);
    		}
    
    		$buffer = strstr($buffer,"\r\n\r\n");
    
    		return $buffer;
    		}
    
    		?>
    	</div>
    </body>
    </html>
    If the pages you don't want indexed are all in one folder, then why not just add a rule to the robots.txt file in your site root denying the search engines access to that entire folder? (Note that robots.txt is only read from the root, not from subfolders.) Assuming the folder were called /links/, that would be:

    Code:
    User-agent: *
    Disallow: /links/
    Stephen J Chapman

  8. #8 tim@getdim
    I thought about that, but will that still allow the links to be crawled? If not, it kind of defeats the purpose...

    Also, if I use the X-Robots-Tag with a PHP header, how do I ensure that the generated pages will send the header? It seems like just putting the header in, above everything else, will only apply it to the page it is on...

    Sorry if I'm asking stupid questions.

  9. #9 cpradio (Hosting Team Leader, Ohio)
    From what I've read, noindex simply tells the bot not to index your page. It will still crawl it and follow any links on it unless you use noindex, nofollow.
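    A minimal sketch of emitting that choice from PHP (the variable name and the per-page toggle are illustrative, not from this thread):

    ```php
    <?php
    // Emit a robots meta tag. The page stays crawlable and its links
    // followable either way; only indexing is toggled per page.
    $indexable = false; // would be set per page
    $content = $indexable ? 'index, follow' : 'noindex, follow';
    echo '<meta name="robots" content="' . $content . '" />';
    ```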

  10. #10 John_Betong (SitePoint Mentor)
    Quote Originally Posted by tim@getdim View Post
    I have a PHP page that dynamically generates other pages and I want it to pass the noindex tag to all the pages it generates. I still want the links to be followed, but I don't want the pages themselves indexed.
    This is the way I dynamically set the Meta Robots Tag: (single included header used on over 3,000 pages)

    PHP Code:

    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml" >
    <head>
      <title>My Link Page</title>
      <link type="text/css" rel="stylesheet" href="http://localhost/assets/css/vo13-scrn-nor.css" />
      <?php
        # DEBUG to display a list of $_SERVER parameters
        # echo '<pre>'; print_r($_SERVER); echo '</pre>';

        # Set default robotsContent and test for particular URIs
        $robotsContent = 'index, follow';
        if( '/E-bay-Help.html' == $_SERVER['REQUEST_URI'] ):
          $robotsContent = 'noindex, follow';
        endif;
        echo '<meta name="robots" content="' . $robotsContent . '" />';
      ?>
    </head>
    <body>
       <!-- Blurb goes here -->
    </body>
    </html>
    The if() statement should be tailored to suit your requirements; using an array() with in_array() caters for multiple URIs.
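    That multiple-URI variant might look something like this (the URI list is made up for illustration; only /E-bay-Help.html comes from the post above):

    ```php
    <?php
    // in_array() version of the REQUEST_URI check; URIs are placeholders.
    $noindexUris = ['/E-bay-Help.html', '/my-link-page.php'];
    $robotsContent = in_array($_SERVER['REQUEST_URI'], $noindexUris, true)
        ? 'noindex, follow'
        : 'index, follow';
    echo '<meta name="robots" content="' . $robotsContent . '" />';
    ```

    The third argument to in_array() forces a strict (type-safe) comparison, which avoids loose-comparison surprises when matching strings.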

