Saturday, April 5, 2008

Robots.txt - Stop Search Engines From Accessing Your Private Files

The robots.txt file is a text file containing commands for search engine crawlers that specify which pages may or may not be indexed. Every search engine therefore begins its exploration of a website by looking for robots.txt at the root of the site.
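As a rough illustration of "at the root of the site", the sketch below derives the robots.txt location from any page URL. The helper name robots_txt_url and the example.com address are hypothetical, chosen only for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the URL of robots.txt at the root of the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port), replace the path,
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/directory/path/page.html"))
# prints http://www.example.com/robots.txt
```

Whatever page a crawler starts from, it would look for the file at this one location.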

Format of robots.txt

The robots.txt file (written in lower case, with "robots" in the plural) is an ASCII file located at the root of the site. It may contain the following commands:

* User-Agent: specifies the robot affected by the guidelines that follow. The value * means "all search engines".
* Disallow: specifies the pages to exclude from indexing. Each page or path to exclude must be on its own line and must begin with /. The value / alone means "all pages".
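To make the two commands concrete, here is a minimal sketch of how a program might read them. The function parse_robots and its record layout are hypothetical, not part of any library, and the sketch handles only User-Agent and Disallow lines.

```python
def parse_robots(text: str) -> list[dict[str, list[str]]]:
    """Group robots.txt lines into records of user agents and disallowed paths."""
    records = []
    current = None      # the record currently being filled
    seen_rule = False   # True once the current record has a Disallow line
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip anything that is not a "Field: value" line
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # A User-Agent line that follows a Disallow line starts a new record.
            if current is None or seen_rule:
                current = {"agents": [], "disallow": []}
                records.append(current)
                seen_rule = False
            current["agents"].append(value)
        elif field == "disallow" and current is not None:
            seen_rule = True
            if value:  # an empty Disallow excludes nothing
                current["disallow"].append(value)
    return records


print(parse_robots("User-Agent: *\nDisallow: /directory/"))
```

Each record pairs the robots named by User-Agent with the path prefixes they must not visit.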

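A group of rules (a User-Agent line and the Disallow lines that follow it) must not contain any blank line, because a blank line marks the end of the group!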

Examples of robots.txt:

* Exclusion of all pages:

User-Agent: *
Disallow: /

* Exclusion of no page (equivalent to the absence of robots.txt, all pages are visited):

User-Agent: *
Disallow:

* Authorization of a single robot:

User-Agent: RobotName
Disallow:
User-Agent: *
Disallow: /

* Exclusion of a robot:

User-Agent: RobotName
Disallow: /
User-Agent: *
Disallow:

* Exclusion of a single page:

User-Agent: *
Disallow: /directory/path/page.html

* Exclusion of several pages:

User-Agent: *
Disallow: /directory/path/page.html
Disallow: /directory/path/page2.html
Disallow: /directory/path/page3.html

* Exclusion of all pages of a directory and its subfolders:

User-Agent: *
Disallow: /directory/
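One way to check rules like these before publishing them is Python's standard urllib.robotparser module. The sketch below is only an illustration: the helper allowed, the robot names GoodBot and OtherBot, and the example.com URLs are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, robot: str, url: str) -> bool:
    """Report whether `robot` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(robot, url)


# Exclusion of all pages of a directory and its subfolders.
directory_rules = "User-Agent: *\nDisallow: /directory/"
print(allowed(directory_rules, "OtherBot", "http://www.example.com/directory/path/page.html"))  # False
print(allowed(directory_rules, "OtherBot", "http://www.example.com/other/page.html"))  # True

# Authorization of a single robot: an empty Disallow excludes nothing.
single_robot_rules = "User-Agent: GoodBot\nDisallow:\nUser-Agent: *\nDisallow: /"
print(allowed(single_robot_rules, "GoodBot", "http://www.example.com/page.html"))  # True
print(allowed(single_robot_rules, "OtherBot", "http://www.example.com/page.html"))  # False
```

Different crawlers may interpret edge cases differently, so a check like this shows only how Python's parser reads the file.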


