Thursday, June 23, 2005

Taken from the document "Search engines as penetration testing tools"

In addition to server-wide robot control using robots.txt, administrators can also
specify that certain pages should not be indexed by search engine robots, or that
the links on the page should not be followed by robots. The Robots META tag,
placed in the HTML <HEAD> section of a page, can specify either or both of
these actions. Many, but not all, search engine robots recognize this tag and
honor its directives on a per-page basis. To stop compliant robots from
keeping an archived (cached) copy of a page, add the NOARCHIVE directive to
the same Robots META tag.
For more information on using the META tag to exclude robots, see the HTML
Author's Guide to the Robots META tag [8].
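As a rough illustration of what "recognizing the tag" means on the robot's side, here is a minimal Python sketch of how a compliant crawler might read the Robots META tag from a page and decide whether to index it, follow its links, or archive it. The helper names (robots_directives, may_index, may_follow, may_archive) are hypothetical, and real search engines may apply extra rules (for example robot-specific tags such as name="googlebot", which this sketch ignores). It assumes the common convention that NONE means NOINDEX plus NOFOLLOW.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # Directives are a comma-separated, case-insensitive list.
        for token in (attr.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Hypothetical helper: return the set of robots directives found in a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Assumption: "none" is treated as shorthand for "noindex, nofollow".
def may_index(directives):
    return "noindex" not in directives and "none" not in directives


def may_follow(directives):
    return "nofollow" not in directives and "none" not in directives


def may_archive(directives):
    return "noarchive" not in directives


if __name__ == "__main__":
    page = """<html><head>
<meta name="robots" content="noindex, nofollow, noarchive">
</head><body><a href="/private/">private</a></body></html>"""
    d = robots_directives(page)
    print(sorted(d))  # ['noarchive', 'nofollow', 'noindex']
    print(may_index(d), may_follow(d), may_archive(d))  # False False False
```

Because the tag is only advisory, a page with no Robots META tag yields an empty set and is treated as indexable, followable and archivable. Nothing stops a non-compliant robot from ignoring these directives entirely.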
