Google announced last night that it is looking to develop a protocol to complement the nearly 30-year-old robots.txt protocol. The reason is the wave of new generative AI technologies that Google and other companies are releasing.
This announcement comes shortly after news that OpenAI's ChatGPT service was accessing paywalled content. But I know many of you are not surprised that Google and others are exploring alternatives to robots.txt with all this generative AI technology floating around the web.
Nothing is changing today. All Google announced is that in the "coming months" it will hold discussions with the "community" to come up with ideas for a new solution.
Google wrote, "Today, we’re kicking off a public discussion, inviting members of the web and AI communities to weigh in on approaches to complementary protocols. We’d like a broad range of voices from across web publishers, civil society, academia and more fields from around the world to join the discussion, and we will be convening those interested in participating over the coming months."
Google added that it believes "it's time for the web and AI communities to explore additional machine-readable means for web publisher choice and control for emerging AI and research use cases."
What this all means right now, I don't know. But here are some responses to my tweet about it:
How about allowing regular expressions in robots.txt? I bet that would solve 75% of the crawl directive challenges SEOs run into.
— Eric Heiken (@EricHeiken) July 6, 2023
I think it works OK, although maybe after 30y it should become robots.xml or something since lots of stuff has been added, and structured file might be more prone to accidental errors
— Miloš Mileusnić (@mileusna) July 6, 2023
“Now that we’ve already trained our LLMs on all your proprietary and copyrighted content, we will finally start thinking about giving you a way to opt out of any of your future content for being used to make us rich.” https://t.co/dda8hHQPfq
— Barry Adams 📰 (@badams) July 6, 2023
Gary Illyes from Google, who has worked on this protocol over the years, wrote on LinkedIn, "It's time. Nearly 30 years ago robots.txt was born and it served the internet well all this time. With the emerging AI technologies, we need to complement it with new instructions (rules) that were designed for AI applications specifically."
And John Mueller:
I'm excited to see this happening. https://t.co/UTdmeCVwhl
— John Mueller (official) · Not #30D (@JohnMu) July 6, 2023
Today, we’re kicking off a public discussion to explore a machine-readable means for web publisher choice and control for emerging AI & research use cases. Learn more on this effort, including how to join the discussion by signing up: https://t.co/iF9WNyhN3O
— Google SearchLiaison (@searchliaison) July 6, 2023
If you want to participate, fill out this form.
Do any of you have any ideas?
Forum discussion at Twitter.