Google has updated its crawler help documentation to add a new section for HTTP caching, which explains how Google's crawlers handle cache control headers. Google also posted a blog post begging us to let Google cache our pages.
Begging might be too much, but Gary Illyes wrote, "Allow us to cache, pretty please" as the first line of the blog post. He then noted that we allow Google to cache our content less today than we did 10 years ago. Gary wrote, "the number of requests that can be returned from local caches has decreased: 10 years ago about 0.026% of the total fetches were cacheable, which is already not that impressive; today that number is 0.017%."
Google added an HTTP Caching section to the help document to explain how Google handles cache control headers. Google's crawling infrastructure supports heuristic HTTP caching as defined by the HTTP caching standard, specifically through the ETag response header paired with the If-None-Match request header, and the Last-Modified response header paired with the If-Modified-Since request header.

If both the ETag and Last-Modified response header fields are present in the HTTP response, Google's crawlers use the ETag value, as required by the HTTP standard. For Google's crawlers specifically, Google recommends using ETag instead of the Last-Modified header to indicate caching preference, as ETag doesn't have date formatting issues. Other HTTP caching directives aren't supported, Google added.
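To make the mechanics concrete, here is a rough client-side sketch of that revalidation flow in Python using the requests library. The URL is a placeholder and this only illustrates the standard conditional-request headers, not Google's actual crawler code:

```python
import requests

url = "https://example.com/page.html"  # placeholder URL

# First fetch: the server may return ETag and/or Last-Modified validators.
first = requests.get(url)
etag = first.headers.get("ETag")
last_modified = first.headers.get("Last-Modified")

# Revalidation fetch: send a stored validator back. Per the HTTP caching
# standard (and Google's stated preference), ETag/If-None-Match wins when
# both validators are available.
conditional_headers = {}
if etag:
    conditional_headers["If-None-Match"] = etag
elif last_modified:
    conditional_headers["If-Modified-Since"] = last_modified

second = requests.get(url, headers=conditional_headers)
print(second.status_code)  # 304 means the cached copy is still valid
```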
I should add that Google and Bing both have supported ETag at least since 2018.
From Google: "Allow us to cache, pretty please. Caching is a critical piece of the large puzzle that is the internet. Caching allows pages to load lightning fast on revisits, it saves computing resources and thus also natural resources, and saves a tremendous amount of expensive… https://t.co/vQRmBpJvQd
— Glenn Gabe (@glenngabe) December 9, 2024
4/ What's the impact on pagespeed
— Siddhesh SEO a/cc (@siddhesh_asawa) December 9, 2024
Google's crawlers that support caching will send the ETag value returned for a previous crawl of that URL in the If-None-Match header. If the ETag value sent by the crawler matches the current value the server generated, your server should return…
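The expected server-side behavior, per standard HTTP conditional-request semantics, is to answer a matching If-None-Match with a 304 Not Modified and no body. Here is a minimal Flask sketch of that check; the route and page content are hypothetical:

```python
from flask import Flask, Response, request
import hashlib

app = Flask(__name__)

PAGE_BODY = "<html><body>Hello, crawler.</body></html>"  # placeholder content

@app.route("/page.html")
def page():
    # Derive a strong ETag from the current content so it changes when the page does.
    etag = '"' + hashlib.sha256(PAGE_BODY.encode("utf-8")).hexdigest()[:16] + '"'

    # If the crawler already holds this version, skip sending the body.
    if request.headers.get("If-None-Match") == etag:
        return Response(status=304, headers={"ETag": etag})

    return Response(PAGE_BODY, status=200, mimetype="text/html",
                    headers={"ETag": etag})
```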
Google added a bunch more detail to that section but also expanded this section of the page:
Google's crawlers and fetchers support HTTP/1.1 and HTTP/2. The crawlers will use the protocol version that provides the best crawling performance and may switch protocols between crawling sessions depending on previous crawling statistics. The default protocol version used by Google's crawlers is HTTP/1.1; crawling over HTTP/2 may save computing resources (for example, CPU, RAM) for your site and Googlebot, but otherwise there's no Google-product specific benefit to the site (for example, no ranking boost in Google Search).

To opt out from crawling over HTTP/2, instruct the server that's hosting your site to respond with a 421 HTTP status code when Google attempts to access your site over HTTP/2. If that's not feasible, you can send a message to the Crawling team (however this solution is temporary).

Google's crawler infrastructure also supports crawling through FTP (as defined by RFC959 and its updates) and FTPS (as defined by RFC4217 and its updates), however crawling through these protocols is rare.
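For the HTTP/2 opt-out mentioned in that passage, the 421 response is normally configured in the web server or load balancer rather than in application code. Purely as an illustration, here is a hedged WSGI-style sketch in Python that assumes the serving stack exposes the negotiated protocol via the standard SERVER_PROTOCOL variable:

```python
def application(environ, start_response):
    # SERVER_PROTOCOL is typically "HTTP/1.1" or "HTTP/2.0", depending on the server.
    if environ.get("SERVER_PROTOCOL", "").startswith("HTTP/2"):
        # 421 Misdirected Request signals that this site should not be crawled over HTTP/2.
        start_response("421 Misdirected Request", [("Content-Type", "text/plain")])
        return [b"HTTP/2 is not served here; please retry over HTTP/1.1\n"]

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello over HTTP/1.1\n"]
```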
Later, Gary Illyes from Google explicitly said there is no ranking boost. When asked on LinkedIn, "Is there also a ranking signal relative to a site having ETag caching and/or extended cache periods? Assuming at some point it may be a signal in Lighthouse reporting too," Gary responded, "no there isn't."
Forum discussion at X.