Notes for Nicole Sharp's Website and SITEMAP: Difference between pages

From NikkiWiki
Notes on the development of <u><cite class="u">[[Nicole Sharp's Website]]</cite></u>.
Adding a sitemap to your website allows searchbots to find pages much faster and more efficiently, so that they can be quickly indexed for search engines.&ensp; Sitemaps can be saved as either "<code>/sitemap.txt</code>" or "<code>/sitemap.xml</code>".&ensp; Using plaintext ("<code>/sitemap.txt</code>") is much faster and easier than writing extensible markup language (XML).&ensp; I recommend keeping the sitemap as plaintext, allowing the [https://www.sitemaps.org/ SITEMAP] protocol to join the ranks of the other plaintext website protocols for <u>[[ROBOTS]]</u>, [https://www.securitytxt.org/ SECURITY], and [https://humanstxt.org/ HUMANS].


== Web Platform ==
To create a sitemap, you simply make a plaintext list of each URL (uniform resource locator) for the website.&ensp; Only URLs for a single domain should be included — do not add URLs for subdomains or alias domains.&ensp; You should also only list canonical URLs.&ensp; This means that if a particular webpage can be accessed from multiple URLs, only one URL should be listed for that webpage in the sitemap.
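The procedure above can be sketched as a short script.&ensp; This is a minimal illustration, not part of the site itself: the <code>REDIRECTS</code> map and the <code>example.net</code> URLs are hypothetical stand-ins, and a real redirect map would come from your own webserver configuration.

```python
# Sketch: build a plaintext sitemap from a list of page URLs, collapsing
# alias URLs to their canonical targets and keeping one entry per webpage.
# REDIRECTS and the example.net URLs are hypothetical stand-ins.

REDIRECTS = {
    "https://www.example.net/index.html": "https://www.example.net/wiki/NikkiWiki",
    "https://www.example.net/w/index.php": "https://www.example.net/wiki/NikkiWiki",
}

def canonicalize(url, redirects=REDIRECTS):
    """Follow redirect-map entries until the URL no longer redirects."""
    while url in redirects:
        url = redirects[url]
    return url

def build_sitemap(urls, redirects=REDIRECTS):
    """Return sitemap.txt contents: one canonical URL per line, no duplicates."""
    canonical = []
    for url in urls:
        url = canonicalize(url, redirects)
        if url not in canonical:  # only one URL per webpage
            canonical.append(url)
    return "\n".join(canonical) + "\n"

pages = [
    "https://www.example.net/index.html",
    "https://www.example.net/w/index.php",
    "https://www.example.net/wiki/NikkiWiki",
    "https://www.example.net/wiki/about_NikkiWiki",
]
print(build_sitemap(pages))
```

Writing the returned string to "<code>/sitemap.txt</code>" in the webroot is then all the protocol requires.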


=== <abbr title="hypertext markup language">HTML</abbr> ===

* <code>[[wikipedia:HTML]]</code>
** <code>[[wikipedia:DOCTYPE]]</code>
** <code>[[wikipedia:HTML elements]]</code>
** <code>[[wikipedia:HTML attributes]]</code>
** <code>[[wikipedia:APPCACHE]]</code>
** <code>[[wikipedia:META element]]</code>
*** <code>[[wikipedia:META CHARSET]]</code>
*** <code>[[wikipedia:META REFRESH]]</code>
*** <code>[[wikipedia:favicon]]</code>
**** <code>[[wikipedia:X-ICON]]</code>
**** <code>[[wikipedia:X-BMP]]</code>
*** <code>[[wikipedia:media types]]</code>
** <code>[[wikipedia:FRAMESET element]]</code>
** <code>[[wikipedia:DIV element]]</code>
** <code>[[wikipedia:BLOCKQUOTE element]]</code>
** <code>[[wikipedia:RUBY element]]</code>
** <code>[[wikipedia:CANVAS element]]</code>
** <code>[[wikipedia:FORM element]]</code>
* <code>[[wikibooks:HTML]]</code>
* <code>https://www.w3.org/MarkUp/Guide/</code>
* <code>https://www.w3.org/community/webed/wiki/HTML/Training</code>
* <code>https://developer.mozilla.org/docs/learn/</code>
* <code>https://html.spec.whatwg.org/</code>

For example, there are many different ways to access <u>[[Nicole Sharp's Homepage]]</u>:

<pre>
https://www.nicolesharp.net/
https://www.nicolesharp.net/index.htm
https://www.nicolesharp.net/index.html
https://www.nicolesharp.net/w/
https://www.nicolesharp.net/w/index.php
https://www.nicolesharp.net/w/index.php?title=NikkiWiki
https://www.nicolesharp.net/w/index.php?title=Main_Page
https://www.nicolesharp.net/w/index.php?title=NikkiWiki:Main_Page
https://www.nicolesharp.net/wiki/
https://www.nicolesharp.net/wiki/NikkiWiki
https://www.nicolesharp.net/wiki/Main_Page
https://www.nicolesharp.net/wiki/index
</pre>


==== <abbr title="language">LANG</abbr> ====

* <code>[[wikipedia:LANG]]</code>
* <code>[[wikipedia:ISO 639-1 codes]]</code>
* <code>[[wikipedia:ISO 639-2 codes]]</code>
* <code>[[wikipedia:ISO 15924]]</code>
* <code>[[wikipedia:ISO 3166-1 alpha-2]]</code>

The canonical URL, though, is
: <code>[[about Nicole Sharp's Homepage|https://www.nicolesharp.net/wiki/NikkiWiki]]</code>
since all of the other URLs redirect to it.

Here are even more ways to access Nicole Sharp's Homepage:


==== Unicode ====

<pre>
https://nicolesharp.net/
https://www.nicolesharp.net/
https://web.nicolesharp.net/
https://en.nicolesharp.net/
https://eng.nicolesharp.net/
https://us.nicolesharp.net/
https://usa.nicolesharp.net/
https://wiki.nicolesharp.net/
https://w.nicolesharp.net/
http://www.nicolesharp.net/
http://nicolesharp.net/
http://nicolesharp.altervista.org/
http://nicolesharp.dreamhosters.com/
https://nicolesharp.dreamhosters.com/
</pre>

With the exception of "<code><nowiki>https://www.nicolesharp.net/</nowiki></code>", none of these other URLs should be included in "<u><code>https://www.nicolesharp.net/sitemap.txt</code></u>".&ensp; All of the URLs should have the same protocol (either all HTTPS [Hypertext Transfer Protocol Secure] or all HTTP [Hypertext Transfer Protocol]) and all of the URLs should be on the same subdomain (for example, either all with "<code>www</code>" or all without "<code>www</code>").

* <code>[[wikipedia:character entities]]</code>
* <code>[[wikipedia:numeric character references]]</code>
* <code>[[wikibooks:Unicode]]</code>
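The same-protocol and same-subdomain rule can be checked mechanically before a sitemap is published.&ensp; A minimal sketch using only the Python standard library (the <code>example.net</code> URLs are hypothetical):

```python
from urllib.parse import urlsplit

def uses_one_origin(urls):
    """Check that every sitemap URL shares one protocol and one subdomain."""
    schemes = {urlsplit(u).scheme for u in urls}  # e.g. {"https"}
    hosts = {urlsplit(u).netloc for u in urls}    # e.g. {"www.example.net"}
    return len(schemes) == 1 and len(hosts) == 1

consistent = [
    "https://www.example.net/wiki/NikkiWiki",
    "https://www.example.net/wiki/about_NikkiWiki",
]
mixed = [
    "https://www.example.net/",
    "http://example.net/",  # different protocol and different subdomain
]
print(uses_one_origin(consistent), uses_one_origin(mixed))
```

A list that mixes "<code>http://</code>" with "<code>https://</code>", or bare-domain with "<code>www</code>" URLs, fails the check.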


==== HTML validation ====

* <code>https://validator.w3.org/</code>

The following "<code>/sitemap.txt</code>" example gives a compliant sitemap for "<u><code>[[Nicole Sharp's Website|https://www.nicolesharp.net/]]</code></u>":

<syntaxhighlight lang="text">
https://www.nicolesharp.net/wiki/NikkiWiki
https://www.nicolesharp.net/wiki/about_NikkiWiki
https://www.nicolesharp.net/wiki/Nicole_Sharp
https://www.nicolesharp.net/wiki/license_for_Nicole_Sharp's_Website
https://www.nicolesharp.net/wiki/analytics_for_Nicole_Sharp's_Website
https://www.nicolesharp.net/wiki/donations
https://www.nicolesharp.net/wiki/security
</syntaxhighlight>
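A compliant sitemap must also respect the size limits of the SITEMAP protocol: a single sitemap file may hold at most 50,000 URLs, may be at most 50&nbsp;MB uncompressed, and must be UTF-8 encoded.&ensp; A small sketch of such a check:

```python
# Limits from the sitemaps.org protocol: at most 50,000 URLs per file
# and at most 50 MB uncompressed, UTF-8 encoded.
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024

def within_limits(sitemap_text):
    """Check a plaintext sitemap body against the protocol's size limits."""
    urls = [line for line in sitemap_text.splitlines() if line.strip()]
    return len(urls) <= MAX_URLS and len(sitemap_text.encode("utf-8")) <= MAX_BYTES

print(within_limits("https://www.example.net/wiki/NikkiWiki\n"))
```

A site that outgrows these limits must split its URLs across multiple sitemap files.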


=== <abbr title="cascading stylesheets">CSS</abbr> ===

* <code>[[wikipedia:CSS]]</code>
* <code>[[wikibooks:CSS]]</code>
* <code>https://www.w3.org/community/webed/wiki/CSS/Training</code>

Only canonical URLs are included, all of the URLs have the same protocol ("<code>https://</code>"), and all of the URLs are on the same subdomain ("<code>www.nicolesharp.net</code>").&ensp; Each new subdomain will need its own sitemap.

Once your sitemap is completed, you can reference it from the Robots Exclusion Protocol ("<code>/robots.txt</code>") so that searchbots can find and index it.&ensp; An example "<code>/robots.txt</code>" with a sitemap is given below.


<syntaxhighlight lang="robots">
User-agent: *
Disallow:
Sitemap: https://www.example.net/sitemap.txt
</syntaxhighlight>

== see also ==

=== <abbr title="extensible hypertext markup language">XHTML</abbr> ===

* <code>[[wikipedia:XHTML]]</code>
* <code>[[wikibooks:XHTML]]</code>
* <code>https://www.w3.org/tr/xhtml1/</code>

XHTML 1.0 has better intercompatibility with HTML 5 than [https://www.w3.org/tr/xhtml11/ XHTML 1.1]. [https://en.wikibooks.org/wiki/XHTML/XHTML_Grammar#Deprecated_Tags]&ensp; [https://www.w3.org/tr/xhtml2/ XHTML 2.0] is an incomplete standard.
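For completeness, here is a sketch of how a consumer could pull the sitemap reference back out of a Robots Exclusion Protocol file like the "<code>/robots.txt</code>" example earlier (field names in "<code>/robots.txt</code>" are matched case-insensitively; the <code>example.net</code> URL is a placeholder):

```python
def sitemap_directives(robots_txt):
    """Collect the targets of every 'Sitemap:' line in a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")  # split on the first colon only
        if field.strip().lower() == "sitemap":
            found.append(value.strip())
    return found

robots = """User-agent: *
Disallow:
Sitemap: https://www.example.net/sitemap.txt
"""
print(sitemap_directives(robots))
```

Splitting on only the first colon matters, since the URL value itself contains "<code>://</code>".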
 
=== <abbr title="scalable vector graphics">SVG</abbr> ===
 
* <code>[[wikipedia:SVG]]</code>
* <code>https://www.w3.org/graphics/svg/ig/resources/svgprimer.html</code>
* <code>https://www.w3.org/tr/svg11/</code>
 
[https://www.w3.org/tr/svg12/ SVG 1.2] and [https://www.w3.org/tr/svg2/ SVG 2.0] are incomplete standards.
 
=== <abbr title="mathematical markup language">MathML</abbr> ===
 
* <code>[[wikipedia:MathML]]</code>
* <code>https://www.w3.org/math/whatismathml.html</code>
* <code>https://www.w3.org/tr/mathml3/</code>
 
[https://www.w3.org/tr/mathml4/ MathML 4.0] is an incomplete standard.
 
=== <abbr title="extensible markup language">XML</abbr> ===
 
* <code>[[wikipedia:XML]]</code>
* <code>[[wikibooks:XML: Managing Data Exchange]]</code>
* <code>https://www.w3.org/tr/xml11/</code>
 
==== <abbr title="document type definition">DTD</abbr> ====
 
* <code>[[wikipedia:XML DTD]]</code>
 
=== <abbr title="JavaScript">JS</abbr> ===
 
* <code>[[wikipedia:JavaScript]]</code>
* <code>[[wikibooks:JavaScript]]</code>
* <code>https://www.ecma-international.org/publications-and-standards/standards/ecma-262/</code>
 
==== <abbr title="JavaScript Object Notation">JSON</abbr> ====
 
* <code>[[wikipedia:JSON]]</code>
* <code>https://www.json.org/</code>
* <code>https://www.ecma-international.org/publications-and-standards/standards/ecma-404/</code>
 
== webmail ==
 
=== Outlook ===
 
* <code>https://www.outlook.com/</code>
 
15 <abbr title="gigabytes">GB</abbr> of free mailbox storage.
 
=== <abbr title="Google Mail">GMail</abbr> ===
 
* <code>https://www.gmail.com/</code>
 
15 GB of free mailbox storage.
 
=== Yahoo ===
 
* <code>https://mail.yahoo.com/</code>
 
1000 GB of free mailbox storage.
 
=== <abbr title="America Online">AOL</abbr> ===
 
* <code>https://mail.aol.com/</code>
 
1000 GB of free mailbox storage.
 
== webdomain ==
 
* <code>https://www.dreamhost.com/domains/</code>
 
== webhosting ==
 
* <code><nowiki>http://personal.frostburg.edu/nlsharp0/</nowiki></code>
 
=== Altervista ===
 
* <code>https://www.altervista.org/</code>
* <code>http://nicolesharp.altervista.org/</code>
 
=== DreamHost ===
 
* <code>https://www.dreamhost.com/hosting/shared/</code>
* <u><code>https://www.nicolesharp.net/</code></u>
 
== Cloudflare ==
 
* <code>https://www.cloudflare.com/</code>
 
== <abbr title="text">TXT</abbr> ==
 
=== ROBOTS ===
 
* <code>[[wikipedia:robots.txt]]</code>
* <code>https://www.robotstxt.org/</code>
* [https://help.dreamhost.com/hc/articles/216105077/ "Control Bots, Spiders, and Crawlers" (DreamHost)]
* <u><code>https://www.nicolesharp.net/robots.txt</code></u>
 
=== SITEMAP ===
 
* <code>[[wikipedia:sitemap.txt]]</code>
* <code>https://www.sitemaps.org/</code>
* <u><code>https://www.nicolesharp.net/sitemap.txt</code></u>
 
=== SECURITY ===
 
* <code>[[wikipedia:security.txt]]</code>
* <code>https://www.securitytxt.org/</code>
* <u><code>https://www.nicolesharp.net/security.txt</code></u>
 
=== HUMANS ===
 
* <code>https://humanstxt.org/</code>
* <u><code>https://www.nicolesharp.net/humans.txt</code></u>
== <abbr title="hypertext access">HTACCESS</abbr> ==
* <code>https://httpd.apache.org/docs/howto/htaccess.html</code>
* [https://help.dreamhost.com/hc/articles/216456227/ "HTACCESS Overview" (DreamHost)]
* [https://help.dreamhost.com/hc/articles/217738987/ "What Can I Do With an HTACCESS File?" (DreamHost)]
* [https://help.dreamhost.com/hc/articles/215747758/ "Force Your Site to Load Securely with an HTACCESS File" (DreamHost)]
* [https://help.dreamhost.com/hc/articles/215747748/ "How Can I Redirect and Rewrite My URLs With an HTACCESS File?" (DreamHost)]
* [https://help.dreamhost.com/hc/articles/216109967/ "Redirect Your Root Directory to a Subdirectory" (DreamHost)]
<syntaxhighlight lang="apache">
RewriteEngine on
RewriteRule ^/?wiki(/.*)?$ %{DOCUMENT_ROOT}/w/index.php [L]
# https://www.mediawiki.org/wiki/manual:short_URL/Apache
Redirect /index.html /wiki/NikkiWiki
Redirect /index.htm /wiki/NikkiWiki
Redirect /sandbox/index.html /wiki/NikkiWiki
Redirect /testbox/index.html /wiki/NikkiWiki
# https://help.dreamhost.com/hc/articles/215747718/
# https://help.dreamhost.com/hc/articles/215747748/
# Apache HTACCESS (Hypertext Access) for Nicole Sharp's Website.
# 2023-09-04 Nicole Sharp
# https://www.nicolesharp.net/
</syntaxhighlight>
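To see which request paths the <code>RewriteRule</code> above captures, the same regular expression can be exercised with Python's <code>re</code> module.&ensp; (In a per-directory HTACCESS context, Apache matches against the path without its leading slash, which is why the pattern begins with "<code>^/?</code>".)

```python
import re

# Same pattern as the RewriteRule in the HTACCESS example above.
wiki_rule = re.compile(r"^/?wiki(/.*)?$")

candidates = ("/wiki/", "/wiki/NikkiWiki", "wiki", "/w/index.php")
matches = [path for path in candidates if wiki_rule.match(path)]
print(matches)  # "/w/index.php" is not rewritten; the Redirect lines handle it
```

Any "<code>/wiki/…</code>" short URL is thus routed to the MediaWiki entry point, while the explicit <code>Redirect</code> lines cover the remaining legacy paths.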
== <abbr title="Personal Homepage Hypertext Preprocessor">PHP</abbr> ==
* <code>[[wikipedia:PHP]]</code>
* <code>[[wikibooks:PHP]]</code>
* <code>https://www.php.net/</code>
* <code>https://www.php.net/docs.php</code>
* [https://help.dreamhost.com/hc/articles/214202188/ "PHP Overview" (DreamHost)]
=== PHPINFO ===
* [https://help.dreamhost.com/hc/articles/214895287/ "Viewing Your Site's PHP Version and Settings" (DreamHost)]
* <u><code>https://www.nicolesharp.net/info.php</code></u>
<syntaxhighlight lang="php">
<?php
phpinfo();
/*
PHPINFO (Personal Homepage Hypertext Preprocessor Info) for Nicole Sharp's Website.
2023-09-03 Nicole Sharp
https://www.nicolesharp.net/
*/
?>
</syntaxhighlight>
=== <abbr title="initialization">INI</abbr> ===
* <code>[[wikipedia:php.ini]]</code>
* [https://help.dreamhost.com/hc/articles/214200688/ "PHP.INI Overview" (DreamHost)]
* [https://help.dreamhost.com/hc/articles/214894037/ "Create a PHPRC File Via <abbr title="File Transfer Protocol">FTP</abbr>" (DreamHost)]
== webanalytics ==
=== Matomo ===
* <code>https://www.matomo.org/</code>
* <code>https://www.matomo.org/download/</code>
* <code>https://www.matomo.org/installing-matomo/</code>
* <code>https://www.matomo.org/how-to-configure-matomo-for-security/</code>
==== Matomo plugins ====
* <code>https://plugins.matomo.org/forcessl/</code>
* <code>https://plugins.matomo.org/provider/</code>
* <code>https://plugins.matomo.org/securityinfo/</code>
* <code>https://plugins.matomo.org/ip2location/</code>
* <code>https://plugins.matomo.org/bandwidth/</code>
* <code>https://plugins.matomo.org/googleanalyticsimporter/</code>
* <code>https://plugins.matomo.org/taskstimetable/</code>
* <code>https://plugins.matomo.org/jstrackercustom/</code>
==== Matomo Cloudflare App ====
* <code>https://www.cloudflareapps.com/apps/piwik/</code>
* <code>https://www.matomo.org/how-do-i-install-the-matomo-tracking-code-on-my-cloudflare-setup/</code>
==== Matomo reports ====
* [https://www.matomo.org/faq_34856/ "Emails From Matomo Are Not Being Sent, How Do I Troubleshoot and Solve the issue?" (Matomo)]
* <code>https://www.matomo.org/create-and-schedule-a-report/</code>
* <code>https://www.matomo.org/downloading-and-sending-your-custom-reports-by-email/</code>
=== <abbr title="Google Analytics">GA</abbr> ===
* <code>https://analytics.google.com/</code>
==== GA Cloudflare App ====
* <code>https://www.cloudflareapps.com/apps/googleanalytics/</code>
=== Clarity ===
* <code>https://clarity.microsoft.com/</code>
=== Metrica ===
* <code>https://metrica.yandex.com/</code>
== websearch ==
=== Google ===
* <code>https://search.google.com/search-console/</code>
* [https://help.dreamhost.com/hc/articles/216375348/ "Google Site Verification" (DreamHost)]
=== Bing ===
* <code>https://webmaster.bing.com/</code>
=== Yandex ===
* <code>https://webmaster.yandex.com/</code>
== MediaWiki ==
 
* <code>[[wikipedia:MediaWiki]]</code>
* <code>https://www.mediawiki.org/</code>
* [https://help.dreamhost.com/hc/articles/217292577/ "MediaWiki: Installing and More" (DreamHost)]
 
=== wikimarkup ===
 
* <code>[[wikibooks:MediaWiki]]</code>
 
== keywords ==
 
<code>bots, development, indexing, ROBOTS, robots.txt, searchbots, SITEMAP, sitemap.txt, TXT, web, webcrawlers, webcrawling, webdevelopment, WWW</code>
 
{{#seo:|keywords=bots, development, indexing, ROBOTS, robots.txt, searchbots, SITEMAP, sitemap.txt, TXT, web, webcrawlers, webcrawling, webdevelopment, WWW}}
 
[[category:Nicole Sharp's Website]]
[[category:webdevelopment]]
[[category:notes]]

Revision as of 2023-09-04T22:30:00
