Google uses its search engine data and signals to help train its Gemini AI models, we learned from documents and deposition testimony in the DOJ case. We kind of knew that already: when Google launched SGE, it told us it used Google Search's core ranking systems to help identify high-quality sites (although it didn't seem like it did...).
This was reported by The Information, which is paywalled, but Glenn Gabe quoted it and posted it on X. He said:
Deposition testimony from a Google engineer. Check the part about search signals and upweighting authoritative pages. Also, AIO model training -> Google Used Search Data To Train AI Models
"In a separate internal email relating to training Google’s Gemini model, a Google employee wrote that search “signals will be very helpful for us to upweight good authoritative pages and downweight the spammy untrustable ones.”
"The lawyer, Karl Herman, also showed deposition testimony from Google senior director of engineering Phiroze Parakh, who said that search data was used to pretrain the model that generates the AI Overviews feature in Google Search and that user feedback data was used to train the model that decides whether to trigger that feature in response to search queries."
Again, this lines up with what Google told us during the early days of its AI efforts, but it is nice to hear it in testimony.
I mean, it does make sense for Google to use this data to improve the quality and accuracy of its AI responses in Search's AI Overviews and AI Mode, as well as in Gemini.
Here is Glenn's post:
Deposition testimony from a Google engineer. Check the part about search signals and upweighting authoritative pages. Also, AIO model training -> Google Used Search Data To Train AI Models
"In a separate internal email relating to training Google’s Gemini model, a Google… pic.twitter.com/7tjOvxL5J0
— Glenn Gabe (@glenngabe) April 21, 2025
Forum discussion at X.