Google Crawler Documentation Adds HTTP Caching

Dec 10, 2024 - 7:51 am
Filed Under Google

Google Cache

Google has updated its crawler help documentation to add a new section for HTTP caching, which explains how Google's crawlers handle cache control headers. Google also posted a blog post begging us to let Google cache our pages.

Begging might be too much, but Gary Illyes wrote, "Allow us to cache, pretty please" as the first line of the blog post. He then noted that we allow Google to cache our content less today than we did 10 years ago. Gary wrote, "the number of requests that can be returned from local caches has decreased: 10 years ago about 0.026% of the total fetches were cacheable, which is already not that impressive; today that number is 0.017%."

Google added an HTTP Caching section to the help document to explain how Google handles cache control headers. Google's crawling infrastructure supports heuristic HTTP caching as defined by the HTTP caching standard, specifically through the ETag response header and If-None-Match request header, and the Last-Modified response header and If-Modified-Since request header.
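To make that exchange concrete, here is a minimal sketch of the revalidation flow from a crawler's side. This is not Google's actual code; the URL and the use of the `requests` library are illustrative assumptions:

```python
# Hypothetical sketch of conditional refetching with ETag / Last-Modified.
import requests

url = "https://example.com/page.html"  # hypothetical URL

# First fetch: the server returns the body plus its cache validators.
first = requests.get(url)
etag = first.headers.get("ETag")
last_modified = first.headers.get("Last-Modified")

# Later refetch: send the stored validators as conditional request headers.
conditional_headers = {}
if etag:
    conditional_headers["If-None-Match"] = etag
elif last_modified:  # ETag takes precedence when both validators exist
    conditional_headers["If-Modified-Since"] = last_modified

second = requests.get(url, headers=conditional_headers)
if second.status_code == 304:
    # 304 Not Modified: reuse the locally cached copy; no body was re-sent.
    body = first.content
else:
    body = second.content
```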

If both ETag and Last-Modified response header fields are present in the HTTP response, Google's crawlers use the ETag value, as required by the HTTP standard. Google wrote, "For Google's crawlers specifically, we recommend using ETag instead of the Last-Modified header to indicate caching preference as ETag doesn't have date formatting issues." Google added that other HTTP caching directives aren't supported.
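From the site's side, here is a minimal sketch of a server that emits both validators and honors conditional requests, checking ETag first in line with that precedence rule. The handler, values, and port are all hypothetical:

```python
# Hypothetical sketch of serving ETag / Last-Modified validators and
# answering conditional requests with 304 Not Modified.
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body>Hello, crawler</body></html>"
ETAG = '"v42"'                                   # hypothetical version tag
LAST_MODIFIED = "Mon, 09 Dec 2024 00:00:00 GMT"  # hypothetical timestamp

class CacheAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        inm = self.headers.get("If-None-Match")
        if inm is not None:
            # ETag takes precedence: when If-None-Match is present,
            # If-Modified-Since is ignored, per the HTTP caching standard.
            not_modified = inm == ETAG
        else:
            # Simplified exact-string date check; real servers parse and
            # compare the dates properly.
            not_modified = self.headers.get("If-Modified-Since") == LAST_MODIFIED

        if not_modified:
            self.send_response(304)  # Not Modified: crawler reuses its cached copy
            self.send_header("ETag", ETAG)
            self.end_headers()
            return

        self.send_response(200)
        self.send_header("ETag", ETAG)
        self.send_header("Last-Modified", LAST_MODIFIED)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

if __name__ == "__main__":
    HTTPServer(("", 8000), CacheAwareHandler).serve_forever()
```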

I should add that both Google and Bing have supported ETag since at least 2018.

Google added a bunch more detail to that section but also expanded this section of the page:

Google's crawlers and fetchers support HTTP/1.1 and HTTP/2. The crawlers will use the protocol version that provides the best crawling performance and may switch protocols between crawling sessions depending on previous crawling statistics. The default protocol version used by Google's crawlers is HTTP/1.1; crawling over HTTP/2 may save computing resources (for example, CPU, RAM) for your site and Googlebot, but otherwise there's no Google-product specific benefit to the site (for example, no ranking boost in Google Search).

To opt out from crawling over HTTP/2, instruct the server that's hosting your site to respond with a 421 HTTP status code when Google attempts to access your site over HTTP/2. If that's not feasible, you can send a message to the Crawling team (however this solution is temporary).

Google's crawler infrastructure also supports crawling through FTP (as defined by RFC959 and its updates) and FTPS (as defined by RFC4217 and its updates), however crawling through these protocols is rare.
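For that HTTP/2 opt-out, here is one way a site could return the 421: a minimal ASGI sketch that checks the negotiated protocol version on each request. This assumes an HTTP/2-capable ASGI server (such as Hypercorn) in front of the app; the app itself is hypothetical, not anything from Google's docs:

```python
# Hypothetical ASGI app: respond 421 Misdirected Request to any request
# that arrived over HTTP/2, signaling Googlebot to fall back to HTTP/1.1.
async def app(scope, receive, send):
    if scope["type"] != "http":
        return

    if scope.get("http_version") == "2":
        # Opt out of HTTP/2 crawling: 421 tells the client to retry the
        # request over a different connection (here, HTTP/1.1).
        await send({
            "type": "http.response.start",
            "status": 421,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({"type": "http.response.body", "body": b"Misdirected Request"})
        return

    # Normal response for HTTP/1.1 traffic.
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello over HTTP/1.1"})
```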

Also, Gary Illyes from Google later explicitly said there is no ranking boost. When asked on LinkedIn, "Is there also a ranking signal relative to a site having ETag caching and/or extended cache periods? Assuming at some point it may be a signal in Lighthouse reporting too," Gary responded, "no there isn't."

Forum discussion at X.
