ROBOTS: Difference between revisions

no edit summary
No edit summary
No edit summary
Line 1: Line 1:
[[image:Exciting Comics 3.jpg|thumb|[Image.]&ensp; The Robots Exclusion Protocol will not prevent bad bots from accessing your website. <ref><code>[[commons:category:robots in art]]</code></ref>]]
[[image:Exciting Comics 3.jpg|thumb|[Image.]&ensp; The Robots Exclusion Protocol will not prevent bad bots from accessing your website. <ref><code>[[commons:category:robots in art]]</code></ref>]]


One of the first files you should add to your website is "<code>/robots.txt</code>". <ref><code>https://www.robotstxt.org/</code></ref>&ensp; This is a plaintext file for the Robots Exclusion Protocol (ROBOTS language, Internet Society Request for Comments [RFC] 9309). <ref><code>https://www.robotstxt.org/robotstxt.html</code></ref>&ensp; What the "<code>/robots.txt</code>" file does is instruct which webdirectories should be accessed or avoided by web bots.
One of the first files you should add to your website is "<code>/robots.txt</code>". <ref><code>https://www.rfc-editor.org/rfc/rfc9309</code></ref> <ref><code>https://www.robotstxt.org/</code></ref>&ensp; This is a plaintext file for the Robots Exclusion Protocol (ROBOTS language, Internet Society Request for Comments [RFC] 9309). <ref><code>https://www.robotstxt.org/robotstxt.html</code></ref>&ensp; What the "<code>/robots.txt</code>" file does is instruct which webdirectories should be accessed or avoided by web bots.


An important thing to remember is that no bot is <em>required</em> to follow the Robots Exclusion Protocol. <ref><code>https://www.robotstxt.org/faq/prevent.html</code></ref> <ref><code>https://www.robotstxt.org/faq/blockjustbad.html</code></ref> <ref><code>https://www.robotstxt.org/faq/legal.html</code></ref>&ensp; The protocol only affects the behavior of compliant or well-behaved bots and anyone can program a bot to ignore the Robots Exclusion Protocol.&ensp; As such, you should <em>not</em> use the Robots Exclusion Protocol to try to hide sensitive directories, especially since publicly listing the directories in "<code>/robots.txt</code>" simply gives malicious bots an easy way to find the very directories you don't want them to visit. <ref><code>https://www.robotstxt.org/faq/nosecurity.html</code></ref>&ensp; On Apache HTTP (Hypertext Transfer Protocol) Server, you should use "<code>/.htaccess</code>" (hypertext access) instead to hide directories from public access.
An important thing to remember is that no bot is <em>required</em> to follow the Robots Exclusion Protocol. <ref><code>https://www.robotstxt.org/faq/prevent.html</code></ref> <ref><code>https://www.robotstxt.org/faq/blockjustbad.html</code></ref> <ref><code>https://www.robotstxt.org/faq/legal.html</code></ref>&ensp; The protocol only affects the behavior of compliant or well-behaved bots and anyone can program a bot to ignore the Robots Exclusion Protocol.&ensp; As such, you should <em>not</em> use the Robots Exclusion Protocol to try to hide sensitive directories, especially since publicly listing the directories in "<code>/robots.txt</code>" simply gives malicious bots an easy way to find the very directories you don't want them to visit. <ref><code>https://www.robotstxt.org/faq/nosecurity.html</code></ref>&ensp; On Apache HTTP (Hypertext Transfer Protocol) Server, you should use "<code>/.htaccess</code>" (hypertext access) instead to hide directories from public access.