What is robots.txt and how does it work?

I use robots.txt to tell search engine crawlers which pages they should crawl and which pages they are forbidden to crawl.

Note that not all search engines obey the robots.txt file. Don’t count on it as a security measure, because the more questionable crawlers will simply ignore it.

It generally takes the form:

User-agent: *
Disallow: /page-or-folder-you-dont-want-indexed.html

(The * means you’re directing the command to all search engines.)
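
If you want to check how a well-behaved crawler would read that rule, here is a rough Python sketch using the standard library's urllib.robotparser. The crawler name ExampleBot and the helper allowed() are made up for illustration, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same rule as in the post above.
rules = """\
User-agent: *
Disallow: /page-or-folder-you-dont-want-indexed.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse from text, no network request needed


def allowed(path, agent="ExampleBot"):
    """Return True if `agent` (a hypothetical crawler name) may fetch `path`."""
    return parser.can_fetch(agent, path)


print(allowed("/page-or-folder-you-dont-want-indexed.html"))  # False
print(allowed("/index.html"))                                  # True
```

Because the rule is addressed to *, any user-agent string you pass in gets the same answer.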

But of course, using robots.txt to ‘hide’ files from search engines still means that other people can find the files you’re trying to hide, because the robots.txt file itself is publicly readable, which obviously is not a good thing. There are a number of ways around this, but you should never think robots.txt can be used for anything related to security; it’s purely there to tell search engines what to crawl and what not to.
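
To make that point concrete, here is a minimal Python sketch showing how little effort it takes to pull the "hidden" paths out of a robots.txt file once someone has its text. The function name disallowed_paths and the sample paths are hypothetical, invented only for this illustration.

```python
def disallowed_paths(robots_txt):
    """Collect every non-empty Disallow value from robots.txt text."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()       # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file contents; these paths are not from any real site.
sample = """\
User-agent: *
Disallow: /secret-reports/
Disallow: /old-admin.html
"""

print(disallowed_paths(sample))  # ['/secret-reports/', '/old-admin.html']
```

Anything listed under Disallow is effectively advertised to anyone who reads the file, which is why it should not be treated as protection.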

Let’s say I have

User-agent: *
Allow: /cute.html

Is it acceptable?

bakers, the point of robots.txt is to disallow what you don’t want indexed. Everything is set to allow unless otherwise specified, so what you posted is unnecessary.
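
A quick way to convince yourself of this is to compare the Allow-only file with a file that has no rules at all. This is only a sketch using Python's urllib.robotparser; the helper build(), the crawler name ExampleBot and the second test path are assumptions added for the example.

```python
from urllib.robotparser import RobotFileParser


def build(text):
    """Parse robots.txt text into a RobotFileParser (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


allow_only = build("User-agent: *\nAllow: /cute.html\n")  # the file from the question
no_rules = build("")                                       # an empty robots.txt

for path in ("/cute.html", "/anything-else.html"):
    # Both parsers give the same answer for every path: crawling is permitted.
    print(path, allow_only.can_fetch("ExampleBot", path),
          no_rules.can_fetch("ExampleBot", path))
```

Both columns come out True for both paths, so the Allow line changes nothing on its own; it only matters when it carves an exception out of a broader Disallow.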

The robots.txt file tells search engines which pages or folders you do not want them to crawl, and it can also suggest which pages you do want them to crawl.

Regarding robots.txt… I’ve read that it is good practice to have a robots.txt file in your root folder even if it’s blank, supposedly because robots search for one and like to see it there. Does anyone have any views on that?

