Google Crawler Documentation Adds HTTP Caching

Dec 10, 2024 - 7:51 am
Filed Under Google


Google has updated its crawler help documentation to add a new section for HTTP caching, which explains how Google's crawlers handle cache control headers. Google also posted a blog post begging us to let Google cache our pages.

Begging might be too much, but Gary Illyes wrote, "Allow us to cache, pretty please" as the first line of the blog post. He then said we allow Google to cache our content less today than we did 10 years ago. Gary wrote, "the number of requests that can be returned from local caches has decreased: 10 years ago about 0.026% of the total fetches were cacheable, which is already not that impressive; today that number is 0.017%."

Google added an HTTP Caching section to the help document to explain how Google handles cache control headers. Google's crawling infrastructure supports heuristic HTTP caching as defined by the HTTP caching standard, specifically through the ETag response- and If-None-Match request header, and the Last-Modified response- and If-Modified-Since request header.
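To make that conditional-request flow concrete, here is a minimal Python sketch of what a cache-aware client does: echo the ETag back as If-None-Match (or Last-Modified as If-Modified-Since) on the next fetch. The URL is a placeholder and the requests library is an assumption on my part; this is an illustration of the HTTP mechanism, not Google's crawler internals.

```python
# Sketch of a conditional refetch using HTTP caching headers.
# Assumes the requests library is installed; the URL is a placeholder.
import requests

url = "https://example.com/page"
first = requests.get(url)

headers = {}
if "ETag" in first.headers:
    # Prefer ETag, mirroring Google's documented preference.
    headers["If-None-Match"] = first.headers["ETag"]
elif "Last-Modified" in first.headers:
    headers["If-Modified-Since"] = first.headers["Last-Modified"]

second = requests.get(url, headers=headers)
if second.status_code == 304:
    print("Not modified; the cached copy can be reused.")
else:
    print("Fresh content fetched:", len(second.content), "bytes")
```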

If both ETag and Last-Modified response header fields are present in the HTTP response, Google's crawlers use the ETag value as required by the HTTP standard. For Google's crawlers specifically, Google recommends using ETag rather than the Last-Modified header to indicate caching preference, as ETag doesn't have date formatting issues. Other HTTP caching directives aren't supported, Google added.
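On the server side, honoring If-None-Match looks roughly like the sketch below, using only Python's standard library. The page content, port, and the way the ETag is derived from a content hash are illustrative assumptions, not anything Google prescribes.

```python
# Sketch: serve a page with an ETag and answer matching If-None-Match
# requests with 304, so a crawler can reuse its cached copy.
import hashlib
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello, crawler</body></html>"
# Derive a stable ETag from the content; quoted per the HTTP standard.
ETAG = '"%s"' % hashlib.sha256(PAGE).hexdigest()[:16]

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("If-None-Match") == ETAG:
            # Content unchanged: 304 tells the client its cache is still valid.
            self.send_response(304)
            self.send_header("ETag", ETAG)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("ETag", ETAG)
        self.end_headers()
        self.wfile.write(PAGE)

if __name__ == "__main__":
    HTTPServer(("", 8000), Handler).serve_forever()
```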

I should add that Google and Bing both have supported ETag at least since 2018.

Google added a bunch more detail to that section but also expanded this section of the page:

Google's crawlers and fetchers support HTTP/1.1 and HTTP/2. The crawlers will use the protocol version that provides the best crawling performance and may switch protocols between crawling sessions depending on previous crawling statistics. The default protocol version used by Google's crawlers is HTTP/1.1; crawling over HTTP/2 may save computing resources (for example, CPU, RAM) for your site and Googlebot, but otherwise there's no Google-product specific benefit to the site (for example, no ranking boost in Google Search).

To opt out from crawling over HTTP/2, instruct the server that's hosting your site to respond with a 421 HTTP status code when Google attempts to access your site over HTTP/2. If that's not feasible, you can send a message to the Crawling team (however this solution is temporary).

Google's crawler infrastructure also supports crawling through FTP (as defined by RFC959 and its updates) and FTPS (as defined by RFC4217 and its updates), however crawling through these protocols is rare.
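For the 421 opt-out specifically, here is a hypothetical sketch of what that server behavior could look like as a minimal ASGI app (runnable with a server such as uvicorn). The app structure is my assumption for illustration; Google's documentation only says to return a 421 status to HTTP/2 requests.

```python
# Minimal ASGI sketch: answer requests arriving over HTTP/2 with 421,
# the status Google documents for opting out of HTTP/2 crawling.
# Run with an ASGI server, e.g.: uvicorn app:app

async def app(scope, receive, send):
    if scope["type"] != "http":
        return
    if scope.get("http_version") == "2":
        # 421 Misdirected Request signals the crawler to fall back to HTTP/1.1.
        await send({"type": "http.response.start", "status": 421, "headers": []})
        await send({"type": "http.response.body", "body": b""})
        return
    # Normal response for HTTP/1.1 traffic.
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Served over HTTP/1.1"})
```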

Also, Gary Illyes from Google later explicitly said there is no ranking boost. When asked on LinkedIn, "Is there also a ranking signal relative to a site having ETag caching and/or extended cache periods? Assuming at some point it may be a signal in Lighthouse reporting too," Gary responded, "no there isn't."

Forum discussion at X.

 
