Help with robots.txt

leelong · September 18, 2010, 10:20pm

I have only one PHP file that generates all the content:

my_domain.com/index.php?menu=chapter_1
my_domain.com/index.php?menu=chapter_2
.
.
.
my_domain.com/index.php?menu=contact_us

There is only a single ‘index.php’ that creates all the content.

If I want search engine spiders not to crawl the ‘Contact Us’ page, I will put this in robots.txt:


User-Agent: *
Disallow: /index.php?menu=contact_us

Will that cause any problem with my index.php? I mean, will the spiders treat index.php itself as ‘disallowed’ and not index it, so that my entire site ends up not being indexed?

ideamonk · September 19, 2010, 6:25am

Hi leelong,

That should work. The specs say -
“this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved.”
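
If you want to sanity-check the prefix matching yourself, here is a rough sketch using Python's standard urllib.robotparser. It only shows how that one parser reads the rule; real spiders may behave differently, and the URLs are just the ones from your post.

```python
from urllib.robotparser import RobotFileParser

# The rule from the question, as it would appear in robots.txt.
RULES = """\
User-agent: *
Disallow: /index.php?menu=contact_us
"""

def build_parser(rules_text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser

rp = build_parser(RULES)

# URLs taken from the thread; only the contact page starts with the disallowed value.
for url in (
    "http://my_domain.com/index.php?menu=contact_us",
    "http://my_domain.com/index.php?menu=chapter_1",
    "http://my_domain.com/index.php",
):
    print(url, rp.can_fetch("*", url))
```

With this parser only the contact_us URL is reported as blocked; the chapter pages and plain index.php stay fetchable, because they do not start with the disallowed value.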

Since you only want to exclude one page, you could instead add this meta tag to it -

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

People have run into confusion with this in the past, because different spiders interpret robots.txt differently: some support regexp-style patterns, others might not. So the ? character in the path might be an issue for some of them.

leelong · September 19, 2010, 9:20pm

thank you

I think adding <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> is the better solution for me.
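
For anyone who wants to confirm the tag really ends up in the generated page, below is a small sketch that uses Python's standard html.parser to pull out the robots directives. The sample page string and the helper name robots_directives are made up for illustration; in practice you would feed it the HTML that index.php?menu=contact_us returns. Keep in mind that a spider can only see the tag if it is allowed to fetch the page, so the tag and a robots.txt Disallow for the same URL work against each other.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)

def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

# Hypothetical output of the contact page, for illustration only.
page = (
    '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head>'
    "<body>Contact Us</body></html>"
)
print(sorted(robots_directives(page)))
```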
