Google has announced a major change to the way its web crawler responds to directives in the robots.txt file.
As part of its effort to make the Robots Exclusion Protocol an official internet standard, Google will stop supporting robots.txt rules that are not published in the open-source specification, including the “noindex” and “nofollow” directives.
Google is providing recommendations for how website operators should handle these changes:
- Noindex in meta tags. Use `<meta name="robots" content="noindex">` on all pages that you previously listed in robots.txt with a noindex directive. The same signal can also be sent as an HTTP response header; see the example after this list.
- HTTP 404/410 status codes. Google will drop any page that returns one of these status codes, for instance when a page legitimately no longer exists.
- Password protection. Google will remove password-protected content from its index unless you use markup to indicate that it is legitimate subscription or paywalled content. In other words, putting content behind a password keeps it out of Google; see the paywall markup sketch after this list.
- Disallow directive. Search engines only index pages that they know about, so you can use the disallow directive in robots.txt to tell Google not to crawl that content; a sample robots.txt follows this list.
- Search Console Remove URL tool. The tool is a quick and easy method to remove a URL temporarily from Google’s search results.
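For files where you cannot add a meta tag (PDFs or images, for example), the same noindex directive can be delivered as an X-Robots-Tag HTTP response header. The snippet below is only a sketch, assuming an Apache server with mod_headers enabled; the PDF file pattern is a placeholder to adapt to your own site.

```apache
# Sketch for Apache (.htaccess) with mod_headers enabled.
# Sends a noindex header for all PDF files; adjust the pattern for your own site.
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```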
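For legitimately paywalled or subscriber-only pages, Google documents schema.org structured data that tells it the content sits behind a paywall rather than being cloaked. The snippet below is a minimal sketch: the NewsArticle type, the headline, and the .paywall CSS selector are placeholder values to replace with your own.

```html
<!-- Minimal paywall markup sketch: headline and .paywall selector are placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example subscriber-only article",
  "isAccessibleForFree": "False",
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": "False",
    "cssSelector": ".paywall"
  }
}
</script>
```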
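And for the disallow directive, a robots.txt that blocks crawling of a couple of sections might look like the example below. The /members/ and /checkout/ paths are hypothetical; list whichever directories you do not want crawled.

```
# Hypothetical example: block all crawlers from these directories.
User-agent: *
Disallow: /members/
Disallow: /checkout/
```

Remember that disallow only prevents crawling; if a page has already been indexed, combine it with one of the other methods above to get it out of search results.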
In conclusion, Google aims to make the internet a simpler and more open place by implementing these changes. In most cases, the end result will be a much simpler robots.txt file and an easier time for webmasters protecting their content.
As always, BCSE is here and ready to assist with your SEO. If you need assistance with your e-Commerce or other website, SEO, or hosting, please contact us today!