With Google aiming to make the robots exclusion protocol an internet standard, the company proposed some changes and submitted a draft the other day. Now Google has updated its own developer docs on the robots.txt specification to match. Here is the list of what changed; a short sketch of the resulting fetch behavior follows the list.
- Removed the "Requirements Language" section in this document because the language is Internet draft specific.
- Robots.txt now accepts all URI-based protocols.
- Google follows at least five redirect hops. Since there were no rules fetched yet, the redirects are followed for at least five hops and if no robots.txt is found, Google treats it as a 404 for the robots.txt. Handling of logical redirects for the robots.txt file based on HTML content that returns 2xx (frames, JavaScript, or meta refresh-type redirects) is discouraged and the content of the first page is used for finding applicable rules.
- For 5xx, if the robots.txt is unreachable for more than 30 days, the last cached copy of the robots.txt is used, or if unavailable, Google assumes that there are no crawl restrictions.
- Google treats unsuccessful requests or incomplete data as a server error.
- "Records" are now called "lines" or "rules", as appropriate.
- Google doesn't support the handling of <field> elements with simple errors or typos (for example, "useragent" instead of "user-agent").
- Google currently enforces a size limit of 500 kibibytes (KiB), and ignores content after that limit.
- Updated formal syntax to be valid Augmented Backus-Naur Form (ABNF) per RFC5234 and to cover for UTF-8 characters in the robots.txt.
- Updated the definition of "groups" to make it shorter and more to the point. Added an example for an empty group.
- Removed references to the deprecated Ajax Crawling Scheme.
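To make those fetch rules concrete, here is a minimal Python sketch of the decision logic. The function name and parameters (effective_robots_txt, days_unreachable, cached_copy, and so on) are illustrative, not Google's implementation, and the full-disallow fallback inside the 30-day window is an assumption based on Google's older documented 5xx handling:

```python
MAX_REDIRECT_HOPS = 5            # Google follows at least five redirect hops
SIZE_LIMIT = 500 * 1024          # 500 KiB; bytes past this limit are ignored
CACHE_WINDOW_DAYS = 30           # threshold for the unreachable-robots.txt rule

ALLOW_ALL = ""                               # empty ruleset: no crawl restrictions
DISALLOW_ALL = "User-agent: *\nDisallow: /"  # full disallow

def effective_robots_txt(status: int, body: bytes = b"",
                         redirect_hops: int = 0,
                         days_unreachable: int = 0,
                         cached_copy: str | None = None) -> str:
    """Decide which robots.txt rules apply, per the changes listed above."""
    if redirect_hops > MAX_REDIRECT_HOPS:
        # No robots.txt found after five hops: treated like a 404.
        return ALLOW_ALL
    if 200 <= status < 300:
        # Success: only the first 500 KiB of the file is considered.
        return body[:SIZE_LIMIT].decode("utf-8", errors="replace")
    if 400 <= status < 500:
        # 4xx (including 404): assume there are no crawl restrictions.
        return ALLOW_ALL
    # 5xx, network errors, and incomplete data all count as server errors.
    if days_unreachable > CACHE_WINDOW_DAYS:
        # Unreachable for more than 30 days: use the last cached copy
        # if there is one, otherwise assume no crawl restrictions.
        return cached_copy if cached_copy is not None else ALLOW_ALL
    # Within the 30-day window, this sketch conservatively treats a 5xx
    # as a temporary full disallow (an assumption, as noted above).
    return DISALLOW_ALL
```

For example, `effective_robots_txt(503, days_unreachable=45)` returns the empty allow-all ruleset, while the same 503 on day one is treated as a full disallow.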
The big changes are: (1) Googlebot follows five redirect hops (which we have known since 2014), (2) there are no crawl restrictions if the robots.txt is unavailable for more than 30 days, (3) unsuccessful requests count as server errors, (4) there is a 500 KiB size limit, and (5) robots.txt applies to all URI-based protocols.
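The stricter handling of rule names is also easy to picture: a parser that only accepts known field names simply drops a typo like "useragent" rather than silently repairing it. Here is a short hedged sketch; KNOWN_FIELDS and parse_line are made-up names for illustration:

```python
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}

def parse_line(line: str) -> tuple[str, str] | None:
    """Return (field, value) for a recognized rule line, else None."""
    line = line.split("#", 1)[0].strip()   # strip comments and whitespace
    if ":" not in line:
        return None
    field, value = line.split(":", 1)
    field = field.strip().lower()
    if field not in KNOWN_FIELDS:
        return None                        # "useragent" is ignored, not fixed
    return field, value.strip()

print(parse_line("User-Agent: Googlebot"))  # ('user-agent', 'Googlebot')
print(parse_line("useragent: Googlebot"))   # None: the typo is not repaired
```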
✒️Updated Google's Robots.txt spec to match REP draft✒️
— Lizzi Harvey (@LizziHarvey) July 1, 2019
🐰Follows 5 redirect hops
🕷️No crawl restrictions if unavailable >30 days
⚠️Unsuccessful requests=server error
🛑500 KiB size limit
💪Supports URI-based protocols
Full list of changes: https://t.co/GXd6FWt2D0 #robotstxt25
Here are some additional answers:
Correct. If there's none in the cache, then full allow is assumed
— Gary "鯨理" Illyes (@methode) July 1, 2019
That's correct
— Gary "鯨理" Illyes (@methode) July 1, 2019
Timer resets in each state change
— Gary "鯨理" Illyes (@methode) July 1, 2019
Forum discussion at Twitter.