SITEMAP
Nicole Sharp (talk | contribs)
Adding a [https://www.sitemaps.org/ sitemap] to your website allows searchbots to find pages faster and more efficiently, so they can be quickly indexed for search engines.  Sitemaps can be saved as either "<code>/sitemap.txt</code>" or "<code>/sitemap.xml</code>" and should be in the root webdirectory ("<code>/</code>"). <ref><code>https://www.sitemaps.org/</code></ref> <ref><code>https://www.sitemaps.org/protocol.html</code></ref>  Writing plaintext (TXT) is much faster and easier than writing extensible markup language (XML).  I recommend keeping the sitemap as plaintext, allowing the SITEMAP protocol to join the ranks of the other plaintext website protocols for <u>[[ROBOTS]]</u>, [https://www.securitytxt.org/ SECURITY], and [https://humanstxt.org/ HUMANS].
As with all webtext files, you should use an advanced text editor that supports Unix line endings, such as [https://www.notepad-plus-plus.org/ Notepad-Plus-Plus].  Do not use Microsoft Windows Notepad.
== canonical links ==
To create a sitemap, simply make a plaintext list of each URL (uniform resource locator) for the website, with one URL per line and no other content (no comments).  Only URLs for a single domain should be included; do not add URLs for subdomains or alias domains.  You should also only list canonical URLs.  This means that if a particular webpage can be accessed from multiple URLs, only one URL should be listed for that webpage in the sitemap.
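These rules are simple enough to automate.  Below is a minimal Python sketch that writes such a one-URL-per-line list with Unix line endings; the URLs, filename, and function name are illustrative assumptions, not part of the protocol.

```python
# Sketch: write a plaintext sitemap, one canonical URL per line, no comments.
# newline="\n" forces Unix (LF) line endings regardless of platform.
# The URLs and the output filename are illustrative examples.

def write_sitemap(urls, path="sitemap.txt"):
    with open(path, "w", newline="\n", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")

canonical_urls = [
    "https://www.example.net/",
    "https://www.example.net/about",
]
write_sitemap(canonical_urls)
```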
For example, there are many different ways to access <u>[[Nicole Sharp's Homepage]]</u>:
<code><pre>
https://www.nicolesharp.net/
https://www.nicolesharp.net/index.htm
https://www.nicolesharp.net/index.html
https://www.nicolesharp.net/w/
https://www.nicolesharp.net/w/index.php
https://www.nicolesharp.net/w/index.php?title=NikkiWiki
https://www.nicolesharp.net/w/index.php?title=Main_Page
https://www.nicolesharp.net/w/index.php?title=NikkiWiki:Main_Page
https://www.nicolesharp.net/wiki/
https://www.nicolesharp.net/wiki/NikkiWiki
https://www.nicolesharp.net/wiki/Main_Page
https://www.nicolesharp.net/wiki/index
</pre></code>
However, the canonical URL is
: <u><code>[[about Nicole Sharp's Homepage|https://www.nicolesharp.net/wiki/NikkiWiki]]</code></u>
since all of the other URLs redirect to it.
In [[mw:Main Page|Wikimedia MediaWiki]], canonical URLs are provided by adding
: <code>[[mw:$wgEnableCanonicalServerLink|$wgEnableCanonicalServerLink]] = true;</code>
to "<code>LocalSettings.php</code>".
== no subdomains ==
Here are even more ways to access Nicole Sharp's Homepage:
<code><pre>
https://nicolesharp.net/
https://www.nicolesharp.net/
https://web.nicolesharp.net/
https://en.nicolesharp.net/
https://eng.nicolesharp.net/
https://us.nicolesharp.net/
https://usa.nicolesharp.net/
https://wiki.nicolesharp.net/
https://w.nicolesharp.net/
http://www.nicolesharp.net/
http://nicolesharp.net/
http://nicolesharp.altervista.org/
http://nicolesharp.dreamhosters.com/
https://nicolesharp.dreamhosters.com/
</pre></code>
With the exception of "<code><nowiki>https://www.nicolesharp.net/</nowiki></code>", none of these other URLs should be included in "<u><code>https://www.nicolesharp.net/sitemap.txt</code></u>".  All of the URLs should have the same protocol (either all HTTPS [Hypertext Transfer Protocol Secure] or all HTTP [Hypertext Transfer Protocol]) and all of the URLs should be on the same subdomain (for example, either all with "<code>www</code>" or all without "<code>www</code>").
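The same-protocol and same-subdomain rule is easy to check mechanically.  Below is a minimal Python sketch, assuming the sitemap is already loaded as a list of URL strings; the example URLs use the reserved <code>example.net</code> domain.

```python
# Minimal sketch: verify that every URL in a sitemap list shares one
# protocol (scheme) and one (sub)domain (host).  Example URLs only.
from urllib.parse import urlsplit

def consistent(urls):
    """True if all URLs have the same scheme and the same host."""
    parts = [urlsplit(u) for u in urls]
    return len({p.scheme for p in parts}) == 1 and \
           len({p.netloc for p in parts}) == 1

good = ["https://www.example.net/", "https://www.example.net/about"]
bad = ["https://www.example.net/", "http://example.net/"]  # mixed scheme and host
print(consistent(good), consistent(bad))  # → True False
```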
== example ==
The following "<code>/sitemap.txt</code>" example gives a compliant sitemap for "<u><code>[[Nicole Sharp's Website|https://www.nicolesharp.net/]]</code></u>":
<code><syntaxhighlight lang="text">
https://www.nicolesharp.net/wiki/NikkiWiki
https://www.nicolesharp.net/wiki/about_NikkiWiki
https://www.nicolesharp.net/wiki/Nicole_Sharp
https://www.nicolesharp.net/wiki/license_for_Nicole_Sharp's_Website
https://www.nicolesharp.net/wiki/analytics_for_Nicole_Sharp's_Website
https://www.nicolesharp.net/wiki/donations
https://www.nicolesharp.net/wiki/security
</syntaxhighlight></code>
Only canonical URLs are included, all of the URLs have the same protocol ("<code>https://</code>"), and all of the URLs are on the same subdomain ("<code>www.nicolesharp.net</code>").  Each new subdomain will need its own sitemap.
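If you later decide to serve "<code>/sitemap.xml</code>" instead, the XML form can be generated from the same plaintext list.  The sketch below uses only Python's standard library and emits only the required <code>loc</code> element of the Sitemap protocol; the URL is an illustrative example.

```python
# Sketch: build a minimal "/sitemap.xml" from a plaintext URL list.
# Only the required urlset/url/loc elements are emitted.
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"  # Sitemap protocol namespace

def to_xml(urls):
    ET.register_namespace("", NS)  # serialize the namespace as a default xmlns
    urlset = ET.Element("{%s}urlset" % NS)
    for u in urls:
        url = ET.SubElement(urlset, "{%s}url" % NS)
        ET.SubElement(url, "{%s}loc" % NS).text = u
    return ET.tostring(urlset, encoding="unicode")

print(to_xml(["https://www.example.net/"]))
```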
== ROBOTS ==
Once your sitemap is completed, you can reference it from the Robots Exclusion Protocol file so that searchbots can discover and index it.  An example "<code>/robots.txt</code>" with a sitemap is given below.
<code><syntaxhighlight lang="text">
User-agent: *
Disallow:
Sitemap: https://www.example.net/sitemap.txt
</syntaxhighlight></code>
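Going the other way, the <code>Sitemap</code> directives can be read back out of a "<code>/robots.txt</code>" file.  The sketch below parses text like the example above; the helper name is hypothetical.

```python
# Sketch: extract Sitemap directives from robots.txt text.
# Major searchbots accept multiple "Sitemap:" lines, so a list is returned.

def sitemaps_from_robots(text):
    urls = []
    for line in text.splitlines():
        field, _, value = line.partition(":")  # split only at the first colon
        if field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls

robots = """User-agent: *
Disallow:
Sitemap: https://www.example.net/sitemap.txt
"""
print(sitemaps_from_robots(robots))  # → ['https://www.example.net/sitemap.txt']
```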
== see also ==
* <u><code>https://www.nicolesharp.net/sitemap.txt</code></u>
* <code>https://www.sitemaps.org/</code>
* <u><code>[[ROBOTS]]</code></u>
* <code>https://www.securitytxt.org/</code>
* <code>https://humanstxt.org/</code>
== references ==
<references />
== keywords ==
<code>bots, CANONICAL, development, hyperlinks, indexing, links, ROBOTS, robots.txt, searchbots, SITEMAP, sitemap.txt, TXT, URLs, web, webcrawlers, webcrawling, webdevelopment, weblinks, WWW</code>
{{#seo:|keywords=bots, CANONICAL, development, hyperlinks, indexing, links, ROBOTS, robots.txt, searchbots, SITEMAP, sitemap.txt, TXT, URLs, web, webcrawlers, webcrawling, webdevelopment, weblinks, WWW}}
[[category:webdevelopment]]
Revision as of 2023-09-05T00:21:46