ROBOTS: Difference between revisions
Nicole Sharp (talk | contribs) No edit summary |
Nicole Sharp (talk | contribs) No edit summary |
Line 1: | Line 1: | ||
[[image:Exciting Comics 3.jpg|thumb|[Image.]  The Robots Exclusion Protocol will not prevent bad bots from accessing your website.]] | [[image:Exciting Comics 3.jpg|thumb|[Image.]  The Robots Exclusion Protocol will not prevent bad bots from accessing your website. <ref><code>[[commons:category:robots in art]]</code></ref>]] | ||
One of the first files you should add to your website is "<code>/robots.txt</code>".  This is a plaintext file for the [https://www.robotstxt.org/ Robots Exclusion Protocol] (ROBOTS language).  The <code>robots.txt</code> file tells web bots which web directories they may access and which they should avoid. | One of the first files you should add to your website is "<code>/robots.txt</code>". <ref><code>https://www.robotstxt.org/</code></ref>  This is a plaintext file for the [https://www.robotstxt.org/ Robots Exclusion Protocol] (ROBOTS language).  The <code>robots.txt</code> file tells web bots which web directories they may access and which they should avoid. | ||
An important thing to remember is that no bot is <em>required</em> to follow the Robots Exclusion Protocol.  The protocol only affects the behavior of compliant or well-behaved bots and anyone can program a bot to ignore "<code>robots.txt</code>".  As such, you should <em>not</em> use the Robots Exclusion Protocol to try to hide sensitive directories, especially since publicly listing the directories in "<code>robots.txt</code>" simply gives malicious bots an easy way to find the very directories you don't want them to visit.  To hide directories from public access (on Apache <abbr title="Hypertext Transfer Protocol">HTTP</abbr> Server) you should use "<code>/.htaccess</code>" (hypertext access) instead. | An important thing to remember is that no bot is <em>required</em> to follow the Robots Exclusion Protocol.  The protocol only affects the behavior of compliant or well-behaved bots and anyone can program a bot to ignore "<code>robots.txt</code>".  As such, you should <em>not</em> use the Robots Exclusion Protocol to try to hide sensitive directories, especially since publicly listing the directories in "<code>robots.txt</code>" simply gives malicious bots an easy way to find the very directories you don't want them to visit.  To hide directories from public access (on Apache <abbr title="Hypertext Transfer Protocol">HTTP</abbr> Server) you should use "<code>/.htaccess</code>" (hypertext access) instead. | ||
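The voluntary nature of the protocol can be illustrated with a minimal sketch of what a well-behaved bot does before requesting a page.  It uses Python's standard <code>urllib.robotparser</code> module; the rules, the bot name "<code>ExampleBot</code>", the <code>example.com</code> URLs, the "<code>/private/</code>" directory, and the helper <code>is_allowed</code> are all hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/index.html",
        "https://example.com/private/data.html",
    ):
        verdict = "may fetch" if is_allowed(ROBOTS_TXT, "ExampleBot", url) else "should skip"
        print(f"ExampleBot {verdict} {url}")
```

The check runs entirely inside the bot's own code.  A bot that never calls anything like <code>is_allowed</code> can still request "<code>/private/data.html</code>", and the <code>Disallow</code> line has told it exactly where to look, which is why access control belongs on the server rather than in "<code>robots.txt</code>".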
Line 72: | Line 72: | ||
* <code>https://www.securitytxt.org/</code> | * <code>https://www.securitytxt.org/</code> | ||
* <code>https://humanstxt.org/</code> | * <code>https://humanstxt.org/</code> | ||
== references == | |||
<references /> | |||
== keywords == | == keywords == |