Technically curious

Why should I care about a Robots.txt file on my website?


For a couple of reasons, actually.

From Google: “A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page.”

The second reason is that Google effectively requires it for Search Console indexing to work properly. You will get indexing and crawling errors for your pages if your robots.txt file doesn't reference your sitemap correctly.
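To make this concrete, here is a minimal sketch of a robots.txt file with a sitemap reference, checked with Python's standard-library parser. The domain example.com, the /private/ path, and the helper name inspect_robots are all made up for illustration; swap in your own site's values. It only shows how a crawler-style parser reads the file, not how Google itself processes it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The domain and paths are placeholders,
# not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""


def inspect_robots(text):
    """Parse robots.txt text and report what a generic crawler would see."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return {
        # site_maps() returns None when no Sitemap line is present (Python 3.8+).
        "sitemaps": parser.site_maps() or [],
        "home_allowed": parser.can_fetch("*", "https://www.example.com/"),
        "private_allowed": parser.can_fetch(
            "*", "https://www.example.com/private/page.html"
        ),
    }


result = inspect_robots(ROBOTS_TXT)
print(result)
```

If the "sitemaps" list comes back empty for your own file, the Sitemap line is missing or malformed, which is the first thing I would check.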

It’s OK if this doesn’t make sense to you. You can learn more at this Google page that talks about best practices for robots.txt files.

Google has a page on how to update your robots.txt file. Just click on the previous link.

In its FAQ, Google says you don’t need to have a robots.txt file, but that is a lie. Many Google Search Console users only fixed their problems once they had a robots.txt file properly added with the sitemap.xml reference. Apparently Google, like Microsoft, doesn’t update its developer documentation in a timely manner.

There are other minor things in that article that aren’t correct either. It talks about the crawl rate, but that is now ignored. Don’t take any source of information as infallible, not even the owner of that documentation. It goes back to what I said before: information that is supposed to be authoritative often isn’t correct, simply because things change faster than companies care to update their documentation.


Why am I talking about this? I wrote about sitemap.xml before, and the two files work hand in hand. If you care enough to make your site work as well as possible, this is another step you need to take.