Robots.txt

What is robots.txt and how does it work?

I use robots.txt to tell search engine crawlers which pages they should crawl and which pages they are forbidden to crawl. There is a lot more information about it that you can find by searching on google.com.

Note that not all search engines obey the robots.txt file(s). Don’t count on it as being a security measure, as the more questionable search engines will ignore it.

It generally takes the form:

User-agent: *
Disallow: /page-or-folder-you-dont-want-indexed.html

(The * means you’re directing the command to all search engines.)

But of course, using robots.txt to ‘hide’ files from search engines still means that other people can find the files you’re trying to hide, which obviously is not a good thing. Since robots.txt itself is publicly readable, listing ‘hidden’ paths in it actually advertises them. There are a number of ways around this of course, but you should never think robots.txt can be used for anything related to security; it’s purely there to tell search engines what to crawl and what not to crawl.
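For example, a minimal sketch (the folder name /private/ is just a placeholder) that asks every crawler to stay out of one folder while leaving the rest of the site crawlable:

User-agent: *
Disallow: /private/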

Not to sound snarky or rude here, but I don’t know why people bother asking these kinds of questions… unless Charlie is so incapable of browsing the Internet that going to Google and typing “robots.txt” is too much? Because the first result for that is the robots specification (explaining everything he needs to know) and the second is the Wikipedia article, which gives an almost complete guide in terms any dummy can understand. Did it really require a post in a forum? No. :slight_smile:

:nono: Let me Google that for you…

Yes you do. Yet despite the poster’s obvious motivation, their post remains.

Let’s say I have http://www.mysite.com/cute.html

User-agent: *
Allow: /cute.html

Is it acceptable?

bakers, the point of robots.txt is to disallow what you don’t want indexed. Everything is allowed unless otherwise specified, so what you posted is unnecessary.
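Allow only really earns its keep when you want to carve out an exception inside a folder you’ve disallowed. A sketch, assuming the /pictures/ folder name is just a placeholder (most major crawlers honour Allow, but not every bot does):

User-agent: *
Disallow: /pictures/
Allow: /pictures/cute.html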

PS: hooperman, well it does get tiring watching the same “lazy” questions reappear. :slight_smile:

I agree. If only there was some way to get rid of sig link fluff…

The robots.txt file tells search engines which pages or folders you do not want them to crawl, and can also suggest which pages you do want crawled.

There is - click the little alert icon under the post…

Regarding robots.txt… I’ve read that it is good practice to have a robots.txt file in your root folder even if it’s blank, supposedly because robots search for one and like to see it there. Does anyone have any views on that?
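If you want the file present without blocking anything, the usual minimal form (rather than a truly empty file) is:

User-agent: *
Disallow:

An empty Disallow value means nothing is disallowed, and having the file there at least means crawlers requesting /robots.txt get a real response instead of filling your error logs with 404s.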

robots.txt is just a plain text file. With its help we can stop search engine crawlers from crawling our web pages, or any specific area of our site.

To be honest, I can’t be bothered anymore…

The robots.txt file is also called the robots exclusion file. It is primarily used to tell spiders which pages you don’t want them to index.
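If you ever want to check how a compliant crawler would read your file, here is a minimal sketch using Python’s standard urllib.robotparser (the domain is just the example URL used earlier in the thread):

from urllib import robotparser

# Point the parser at the site's robots.txt and download it
rp = robotparser.RobotFileParser()
rp.set_url("http://www.mysite.com/robots.txt")
rp.read()

# Ask whether a given user-agent may fetch a given URL
print(rp.can_fetch("*", "http://www.mysite.com/cute.html"))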