Google is reportedly working on a bug that prevents it from properly honoring the noindex directive on some JavaScript-generated pages. The issue affects some React apps built as single-page applications (SPAs): the pages carry a noindex directive, but Google is not picking it up and is indexing pages that should not be indexed.
This issue was spotted by Mark Williams-Cook, Director at Candour and founder of AlsoAsked, a popular SEO tool. He posted about it on LinkedIn, saying, "Here's a screenshot of over 9,000 'noindex' pages being indexed. Adding 'noindex' via JS can be a solution, but it's absolutely not reliable."
Here is that screenshot from Google Search Console's indexing report:
He later shared that Google is now aware of the issue and is working on fixing it. "I spoke to Googlers about this and it is a bug they are working on fixing," he wrote.
Mark went on to explain that this is an example of a React app that adds a noindex meta tag via JavaScript, yet the pages are still getting indexed.
Martin Splitt from Google has spoken in the past about noindex meta tags on JavaScript pages, and Google does sometimes have a hard time processing them. These days, though, Google shouldn't be challenged by them. I guess in this case, Google was?
I suspect that in this case Google fetched and rendered the page before the JavaScript added the noindex meta tag, so Googlebot never saw the noindex directive. It is rare, but it can happen, and it obviously did happen here. Google has warned about this before, including about using JavaScript to generate structured data in some cases.
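To illustrate that timing problem, here is a minimal TypeScript sketch (not Google's actual logic) that checks whether a page carries a noindex directive before any JavaScript runs, looking only at the `X-Robots-Tag` response header and the HTML exactly as served. The `RawResponse` shape, the `hasNoindexBeforeJs` name and the regex-based check are assumptions made for illustration; a real crawler parses HTML properly and also renders the page.

```typescript
// Sketch: is a noindex directive visible BEFORE client-side JavaScript runs?
// Assumption: we only inspect the raw response headers and the served HTML,
// which is what a crawler sees if it evaluates the page before scripts add tags.

type RawResponse = {
  headers: Record<string, string>; // header names assumed to be lower-case
  html: string; // HTML as served by the origin, before any scripts execute
};

// Matches <meta name="robots|googlebot" content="...noindex..."> when `name`
// comes before `content`. Illustrative only; a real HTML parser is more robust.
const META_ROBOTS_NOINDEX =
  /<meta\s+[^>]*name=["'](?:robots|googlebot)["'][^>]*content=["'][^"']*\bnoindex\b[^"']*["']/i;

function hasNoindexBeforeJs(res: RawResponse): boolean {
  const header = res.headers["x-robots-tag"] ?? "";
  if (/\bnoindex\b/i.test(header)) {
    return true; // directive delivered in the HTTP response header
  }
  return META_ROBOTS_NOINDEX.test(res.html); // directive present in the served HTML
}

// A typical SPA shell: the noindex meta tag would only be added later by app.js.
const spaShell: RawResponse = {
  headers: { "content-type": "text/html" },
  html: '<html><head><title>App</title></head><body><div id="root"></div><script src="/app.js"></script></body></html>',
};

// The same shell, but with the directive sent as a response header.
const withHeader: RawResponse = {
  headers: { "content-type": "text/html", "x-robots-tag": "noindex" },
  html: spaShell.html,
};

console.log(hasNoindexBeforeJs(spaShell)); // false
console.log(hasNoindexBeforeJs(withHeader)); // true
```

If a crawler acts on the first result, the SPA shell looks indexable even though the app would add a noindex tag a moment later; the header version carries the directive from the very first byte.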
Mark's solution was to use Cloudflare to serve the noindex directive in the HTTP response header, so it arrives before any JavaScript executes, but he only turned to this after finding that Google was not picking up his first approach. Mark wrote, "With some Single Page Applications (SPAs), it can be difficult to have control over what is shown before the JS is executed. In this instance, I managed to get the pages to set a 'noindex' before JS was rendered by using Cloudflare Transform rules."
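Mark's exact Cloudflare configuration isn't shown here. One common way to get the same effect is to send `X-Robots-Tag: noindex` as an HTTP response header, which Google can read without running any JavaScript. As a rough sketch of that idea, the TypeScript below is written as a Cloudflare Worker rather than the Transform Rules Mark used; the `/app/` prefix and the `addNoindexHeader` helper are hypothetical, so adapt the matching to the pages you actually want kept out of the index.

```typescript
// Sketch of a Cloudflare Worker that adds `X-Robots-Tag: noindex` to HTML
// responses under a hypothetical path prefix, so the directive is present
// before any client-side JavaScript runs.
// Assumption: this is an alternative to the Transform Rules Mark used, not his setup.

const NOINDEX_PREFIX = "/app/"; // hypothetical section that should stay out of the index

function addNoindexHeader(requestUrl: string, upstream: Response): Response {
  const { pathname } = new URL(requestUrl);
  const isHtml = (upstream.headers.get("content-type") ?? "").includes("text/html");
  if (!pathname.startsWith(NOINDEX_PREFIX) || !isHtml) {
    return upstream; // leave all other responses untouched
  }
  // Re-create the response so its headers are mutable, then set the directive.
  const response = new Response(upstream.body, upstream);
  response.headers.set("X-Robots-Tag", "noindex");
  return response;
}

export default {
  async fetch(request: Request): Promise<Response> {
    const upstream = await fetch(request); // pass the request through to the origin
    return addNoindexHeader(request.url, upstream);
  },
};
```

Checking only HTML responses keeps the header off scripts and images, though `X-Robots-Tag` would also be valid on those if you wanted to exclude them.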
So if you have noticed this issue as well, hopefully Google will patch it; if not, look for an alternative solution like Mark did.
Forum discussion at LinkedIn.
Update October 2nd: Martin Splitt from Google said on LinkedIn:
We can render just fine, but it introduces variables that can increase complexity and together with the creativity of the people making websites that can invite trouble sometimes. That being said, most of the time Javascript is blamed for a problem it turns out not to be the troublemaker in the end. Sometimes we also have bugs in our code and recently one of these bugs actually did involve Javascript, so it's not impossible for Javascript to be involved in problems tho.
Update 3: Months later, Martin Splitt explained the bug in this video presentation, at about the 9-minute mark: