Tuesday, September 20, 2005

Hide webserver contents from search engines

Source: http://www.searchtools.com/robots/robots-txt.html

My Notes: If you have a website and would like to keep some content out of search engine indexes (i.e. not crawled or indexed by search robots), create a plain text file called robots.txt in the root directory of your server. In that file you can disallow crawling of specific files and folders. Note that the filename must be lowercase: robots.txt, not Robots.txt.

Check this example of a robots.txt file:
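A minimal robots.txt might look like the sketch below (the directory name is a placeholder, not taken from the source):

```
# Applies to all robots
User-agent: *

# Keep crawlers out of this folder and everything under it
Disallow: /private/
```

Each record starts with one or more User-agent lines naming which robots it applies to (* means all), followed by Disallow lines listing path prefixes those robots should not fetch.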

Writeup from the source:
Search engine robots will check a special file in the root of each server called robots.txt, which is, as you may guess, a plain text file (not HTML). Robots.txt implements the Robots Exclusion Protocol, which allows the web site administrator to define what parts of the site are off-limits to specific robot user agent names. Web administrators can disallow access to cgi, private and temporary directories, for example, because they do not want pages in those areas indexed.
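To illustrate the per-agent rules the writeup describes, here is a hedged sketch that blocks one named crawler (Googlebot, as an example) from cgi and temporary directories while leaving the site open to everyone else; the directory names are illustrative:

```
# Rules for Google's crawler only
User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /tmp/

# All other robots: an empty Disallow means nothing is blocked
User-agent: *
Disallow:
```

Note that robots.txt is advisory: well-behaved crawlers honor it, but it is not an access control mechanism, so truly private content should be protected by authentication rather than exclusion rules.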