Google has updated a few of its Google Search help documents over the past couple of days. The updated documents include the HTTP status codes, Googlebot, and job posting help documents. Note that the HTTP status code content is not new; it was just moved from one location to another.
Googlebot
In the Googlebot help document, Google added how many bytes of textual content, such as HTML, Googlebot will crawl. Here are the new lines of text:
Googlebot can crawl the first 15MB of content in an HTML file or supported text-based file. After the first 15MB of the file, Googlebot stops crawling and only considers the first 15MB of content for indexing.
So you know, that is a lot of megabytes...
It's specific to the HTML file itself, like it's written. Embedded resources / content pulled in with IMG tags is not a part of the HTML file.
— 🐝 johnmu.csv (personal) 🐝 (@JohnMu) June 24, 2022
Google posted more on this 15MB thing over here.
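To make the limit concrete, here is a minimal sketch of checking an HTML payload against it. The function names are made up for illustration, and reading "15MB" as 15 * 1024 * 1024 bytes is an assumption, since the quoted text does not spell out the exact byte count.

```python
from typing import Tuple

# Assumption: "15MB" is read as 15 * 1024 * 1024 bytes. The quoted
# documentation does not state the exact byte count.
CRAWL_LIMIT_BYTES = 15 * 1024 * 1024


def split_at_crawl_limit(html: bytes, limit: int = CRAWL_LIMIT_BYTES) -> Tuple[bytes, bytes]:
    """Split an HTML payload into the part within the limit and the rest."""
    return html[:limit], html[limit:]


def report(html: bytes, limit: int = CRAWL_LIMIT_BYTES) -> str:
    """Describe whether a payload fits within the limit."""
    considered, ignored = split_at_crawl_limit(html, limit)
    if not ignored:
        return f"OK: {len(considered)} bytes, within the {limit}-byte limit"
    return f"Over limit: {len(ignored)} bytes past the first {limit} would be ignored"
```

As the tweet above notes, only the HTML file itself counts, so you would pass in the bytes of the HTML document and not the images or other resources it references.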
Job postings
On the job postings, Google specified that when you use the jobLocation property, you must also include the addressCountry property.
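Here is a rough sketch of how you might check your own markup for this before publishing. The helper name and the sample posting values are hypothetical, and it assumes the usual schema.org nesting where jobLocation is a Place whose address holds addressCountry.

```python
def missing_address_country(job_posting: dict) -> list:
    """Return the indexes of jobLocation entries that lack addressCountry."""
    locations = job_posting.get("jobLocation", [])
    # jobLocation may be a single object or a list of objects.
    if isinstance(locations, dict):
        locations = [locations]
    missing = []
    for index, place in enumerate(locations):
        address = place.get("address", {}) if isinstance(place, dict) else {}
        if not isinstance(address, dict) or not address.get("addressCountry"):
            missing.append(index)
    return missing


# Hypothetical sample data, for illustration only.
posting = {
    "@context": "https://schema.org/",
    "@type": "JobPosting",
    "title": "Example Engineer",
    "jobLocation": {
        "@type": "Place",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "Example City",
            "addressCountry": "US",
        },
    },
}
```

An empty list from `missing_address_country(posting)` means every jobLocation entry carries an addressCountry value.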
HTTP Status Codes
The HTTP status codes document gained a large section on soft 404 errors that was not in the old version. Here is what has moved into this document:
FWIW the soft-404 docs were just moved, they're ... not new :-)
— 🐝 johnmu.csv (personal) 🐝 (@JohnMu) June 23, 2022
soft 404 errors
A soft 404 error is when a URL that returns a page telling the user that the page does not exist and also a 200 (success) status code. In some cases, it might be a page with no main content or empty page. Such pages may be generated for various reasons by your website's web server or content management system, or the user's browser. For example:
- A missing server-side include file.
- A broken connection to the database.
- An empty internal search result page.
- An unloaded or otherwise missing JavaScript file.
When Google's algorithms detect that the page is actually an error page based on its content, Search Console will show a soft 404 error in the site's Index Coverage report.
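To illustrate the idea, here is a simple heuristic you could run against your own pages. This is not Google's detection algorithm, which is not described in the documentation; the phrase list, the `min_chars` threshold, and the function names are all assumptions picked for the example.

```python
import re

# Hypothetical phrases that suggest an error page; adjust for your site.
ERROR_PHRASES = ("page not found", "does not exist", "no results found")


def visible_text(html: str) -> str:
    """Crudely strip tags and collapse whitespace (scripts and styles are not handled)."""
    without_tags = re.sub(r"<[^>]+>", " ", html)
    return " ".join(without_tags.split()).lower()


def looks_like_soft_404(status_code: int, html: str, min_chars: int = 50) -> bool:
    """Flag a 200 response whose body is nearly empty or reads like an error page."""
    if status_code != 200:
        return False  # a real 404 or 410 is not a *soft* 404
    text = visible_text(html)
    if len(text) < min_chars:
        return True  # empty or nearly empty page
    return any(phrase in text for phrase in ERROR_PHRASES)
```

A page that says "Page not found" but returns a 200 would be flagged here, while the same page returned with a 404 status would not.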
Fix soft 404 errors
Depending on the state of the page and the desired outcome, you can solve soft 404 errors in multiple ways:
- The page and content are no longer available.
- The page or content is now somewhere else.
- The page and content still exist.
The page and content are no longer available
If you removed the page and there's no replacement page on your site with similar content, return a 404 (not found) or 410 (gone) response (status) code for the page. These status codes indicate to search engines that the page doesn't exist and the content should not be indexed.
If you have access to your server's configuration files, you can make these error pages useful to users by customizing them. A good custom 404 page helps people find the information they're looking for, and also provides other helpful content that encourages people to explore your site further. Here are some tips for designing a useful custom 404 page:
- Tell visitors clearly that the page they're looking for can't be found. Use language that is friendly and inviting.
- Make sure your 404 page has the same look and feel (including navigation) as the rest of your site.
- Consider adding links to your most popular articles or posts, as well as a link to your site's home page.
- Think about providing a way for users to report a broken link.
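As a rough sketch of the advice above, here is a tiny server built on Python's standard `http.server` module that returns a real 404 or 410 status together with a friendly custom error page. The paths, page contents, and the `/report-broken-link` URL are hypothetical, and a production site would normally configure this in its web server or CMS instead.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site data, for illustration only.
PAGES = {"/": "<h1>Home</h1>"}
REMOVED = {"/old-promo"}  # removed for good, with no replacement page

ERROR_PAGE = (
    "<h1>Sorry, we can't find that page</h1>"
    '<p><a href="/">Back to the home page</a> or '
    '<a href="/report-broken-link">report a broken link</a>.</p>'
)


def resolve(path: str):
    """Map a request path to a (status, html) pair."""
    if path in PAGES:
        return HTTPStatus.OK, PAGES[path]
    if path in REMOVED:
        return HTTPStatus.GONE, ERROR_PAGE  # 410
    return HTTPStatus.NOT_FOUND, ERROR_PAGE  # 404


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Simplification: self.path is matched as-is, query string included.
        status, body = resolve(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Run the demo server locally until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` starts the demo locally. The key point is that the error page body is helpful to visitors while the status code stays 404 or 410, so it is not a soft 404.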
The page or content is now somewhere else
If your page has moved or has a clear replacement on your site, return a 301 (permanent redirect) to redirect the user. This will not interrupt their browsing experience and it's also a great way to tell search engines about the new location of the page. Use the URL Inspection tool to verify whether your URL is actually returning the correct code.
The page and content still exist
If an otherwise good page was flagged with a soft 404 error, it's likely it didn't load properly for Googlebot, it was missing critical resources, or it displayed a prominent error message during rendering. Use the URL Inspection tool to examine the rendered content and the returned HTTP code. If the rendered page is blank, nearly blank, or the content has an error message, it could be that your page references many resources that can't be loaded (images, scripts, and other non-textual elements), which can be interpreted as a soft 404. Reasons that resources can't be loaded include blocked resources (blocked by robots.txt), having too many resources on a page, various server errors, or slow loading or very large resources.
Hat tip on this from Kenichi Suzuki on Twitter.
Those are the changes spotted in the past couple days to Google's help documentation.
Forum discussion at Twitter.