Google has announced a brand new robots tag it will obey going forward, named indexifembedded. It lets you control whether Google indexes a page with embedded content. The tag allows Google to index the content of a page if it's embedded in another page through iframes or similar HTML tags, in spite of a noindex directive. indexifembedded only has an effect if it's accompanied by noindex.
Gary Illyes and Weizi Wang from Google wrote, "we're introducing a new robots tag, indexifembedded, that brings you more control over when your content is indexed. With the indexifembedded tag, you can tell Google you'd still like your content indexed when it's embedded through iframes and similar HTML tags in other pages, even when the content page has the noindex tag."
In short, let's say I embed a piece of content using an iframe or some other type of embed code. That piece of content, especially if it is media, often carries a noindex directive. When I embed it on my page, which has more context around what I am embedding, that noindex might tell Google not to index the page I am embedding it on. Here, Google is giving you more control to say: index the page the embed is on, despite what the embed page says. It reminds me of Google's warning about embedding Instagram posts or other images and the SEO issues that can cause.
Google said that the indexifembedded tag "addresses a common issue that especially affects media publishers: while they may want their content indexed when it's embedded on third-party pages, they don't necessarily want their media pages indexed on their own. Because they don't want the media pages indexed, they currently use a noindex tag in such pages. However, the noindex tag also prevents embedding the content in other pages during indexing."
Google said the new robots tag, indexifembedded, "works in combination with the noindex tag only when the page with noindex is embedded into another page through an iframe or similar HTML tag, like object."
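To make that rule concrete, here is a minimal Python sketch of how I read it. This is only my simplified model of the behavior Google describes, not Google's actual implementation; the function names parse_directives and may_index_content are hypothetical and exist only for illustration.

```python
def parse_directives(content):
    """Split a robots directive string such as 'noindex, indexifembedded'
    into a set of lowercase directive names."""
    return {part.strip().lower() for part in content.split(",") if part.strip()}


def may_index_content(content, embedded):
    """Simplified model (an assumption, not Google's code) of the rule:
    - without noindex, indexifembedded has no effect and nothing is blocked;
    - with noindex alone, the content is never indexed;
    - with noindex plus indexifembedded, the content may be indexed only
      when it is embedded in another page (iframe, object, and so on)."""
    directives = parse_directives(content)
    if "noindex" not in directives:
        return True
    return embedded and "indexifembedded" in directives


if __name__ == "__main__":
    for content in ("noindex", "noindex, indexifembedded", "indexifembedded"):
        for embedded in (False, True):
            print(content, "| embedded:", embedded,
                  "| indexable:", may_index_content(content, embedded))
```

The only combination where the new tag changes anything is noindex plus indexifembedded on a page that is being embedded somewhere else; every other case behaves as it did before.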
The example Google gave was a podcast: if podcast.host.example/playpage?podcast=12345 has both the noindex and indexifembedded tags, Google can embed the content hosted on that page in recipe.site.example/my-recipes.html during indexing. I assume the same would apply to Instagram and other embeds.
So if Instagram implemented the new indexifembedded tag, it might solve that issue, maybe?
Here are the code examples both in meta robots and x-robots form:
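As a rough sketch of what the two forms look like, the Python snippet below builds a meta robots tag and an X-Robots-Tag header that combine noindex with indexifembedded. Targeting the googlebot user agent is my assumption, and the helper names meta_robots_tag and x_robots_header are hypothetical, so check Google's documentation before copying the output into production.

```python
# The two directives that must appear together for the new tag to matter.
DIRECTIVES = ["noindex", "indexifembedded"]


def meta_robots_tag(directives, user_agent="googlebot"):
    """Build an HTML meta robots tag for the given directives.
    The 'googlebot' default is an assumption for illustration."""
    content = ",".join(directives)
    return f'<meta name="{user_agent}" content="{content}" />'


def x_robots_header(directives, user_agent="googlebot"):
    """Build the equivalent HTTP response header as a (name, value) pair."""
    content = ",".join(directives)
    return ("X-Robots-Tag", f"{user_agent}:{content}")


if __name__ == "__main__":
    print(meta_robots_tag(DIRECTIVES))
    name, value = x_robots_header(DIRECTIVES)
    print(f"{name}: {value}")
```

Run as a script, this prints the meta tag <meta name="googlebot" content="noindex,indexifembedded" /> followed by the header line X-Robots-Tag: googlebot:noindex,indexifembedded. The meta tag goes in the head of the embeddable page, while the header form is useful for responses where you cannot edit the HTML.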
I did ask Google for a bit more clarification:
I read it as controlling indexing the embedded page, not the page embedding it on
— Pascal Birchler (@swissspidy) January 21, 2022
I just don't see why Instagram would care to implement this - why do they care how embedding their content on my site impacts if my pages are indexed? They want the traffic directly, they do not want me to rank above them.
A "common" (it's new, so there's nothing common yet :)) use-case would be widgets or embedded content, where you have a special URL for the embed that you don't want indexed, but you still want to allow the embedding page to use it for indexing. Eg, video embeds.
— 🐄 John 🐄 (@JohnMu) January 21, 2022
Forum discussion at Twitter.