I mentioned this before, but did you know that Bing has been using BERT since April of this year? That is about six months before Google started using it. Also, Bing said in this blog post that BERT-like "models are now applied to every Bing search query globally." Google currently applies it only to English queries in the US, and even then only to about 10% of those queries, plus all featured snippets.
Here is what Bing said about its use of BERT models:
Starting from April of this year, we used large transformer models to deliver the largest quality improvements to our Bing customers in the past year. For example, in the query "what can aggravate a concussion", the word "aggravate" indicates the user wants to learn about actions to be taken after a concussion and not about causes or symptoms. Our search powered by these models can now understand the user intent and deliver a more useful result. More importantly, these models are now applied to every Bing search query globally making Bing results more relevant and intelligent.
The blog post then gets into more technical details around computer processing and machine learning that are above my head.
But here is the tweet from Frédéric Dubut from Bing:
Our journey to scale large language models (like BERT) on @Azure GPUs to improve the quality of @Bing search results globally. https://t.co/gCP0olIwmg
— Frédéric Dubut (@CoperniX) November 18, 2019
So yea, take that Google - Bing beat you by six months. :)
Forum discussion at Twitter.