Tuesday, September 20, 2005

Hide webserver contents from search engines

Source: http://www.searchtools.com/robots/robots-txt.html

My Notes: If you have a website and want to keep some content out of search engines, i.e. stop search crawlers from indexing it, create a file called robots.txt in your site's root directory. In it you can disallow indexing of certain files and folders.

Check this example of a robots.txt file.
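As a minimal sketch, assuming Python 3, such a file is just a `User-agent` line followed by one `Disallow` line per path prefix that crawlers should skip. The helper `build_robots_txt` and the paths `/private/` and `/drafts/notes.html` are hypothetical, for illustration only.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt text that disallows the given path prefixes.

    Hypothetical helper for illustration. A user_agent of "*" means the
    rules apply to every robot that has no more specific entry.
    """
    lines = [f"User-agent: {user_agent}"]
    for path in disallowed_paths:
        # Each Disallow value is a URL path prefix relative to the site root.
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical folder and file to keep out of search indexes.
    print(build_robots_txt(["/private/", "/drafts/notes.html"]), end="")
```

Saved as robots.txt in the site's root directory, the file is served at `/robots.txt`, which is where robots look for it.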

Writeup from the source:
Search engine robots will check a special file in the root of each server called robots.txt, which is, as you may guess, a plain text file (not HTML). Robots.txt implements the Robots Exclusion Protocol, which allows the web site administrator to define what parts of the site are off-limits to specific robot user agent names. Web administrators can disallow access to cgi, private and temporary directories, for example, because they do not want pages in those areas indexed.
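The check a well-behaved robot performs can be sketched with Python's standard `urllib.robotparser` module. The robots.txt contents, the user agent names `BadBot` and `ExampleBot`, and the `www.example.com` URLs are all hypothetical: cgi, private and temporary directories are off-limits to every robot, and `BadBot` is barred from the whole site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named robot is barred entirely, and the
# cgi, private and temporary directories are off-limits to all others.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /tmp/
"""

parser = RobotFileParser()
# parse() takes the file's lines, as if they had been fetched from
# http://www.example.com/robots.txt.
parser.parse(ROBOTS_TXT.splitlines())

for agent, url in [
    ("ExampleBot", "http://www.example.com/index.html"),
    ("ExampleBot", "http://www.example.com/cgi-bin/search.cgi"),
    ("ExampleBot", "http://www.example.com/tmp/draft.html"),
    ("BadBot", "http://www.example.com/index.html"),
]:
    # can_fetch() answers: may this user agent request this URL?
    print(agent, url, parser.can_fetch(agent, url))
```

Here `can_fetch` is the robot asking permission before it requests a page, so the exclusion is voluntary: robots.txt keeps out only robots that honor it, and the listed pages remain reachable by anyone who requests them directly.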

