There is speculation in certain SEO groups and forums that Google has launched a new algorithm named SMITH that is better than BERT and RankBrain. SMITH stands for Siamese Multi-depth Transformer-based Hierarchical Encoder. It is not live; it is currently just a research paper from Google. Danny Sullivan from Google confirmed this for us on Twitter, saying "No. We did not" launch SMITH in production.
Here is that tweet:
We publish a lot of papers about things not used in Search. I won't be making a habit of confirming each one someone might speculate about because it's time consuming & more important, we have tended to proactively talk about this stuff already. That said. No. We did not.
— Danny Sullivan (@dannysullivan) January 13, 2021
The speculation does not come from Roger Montti, who wrote about the research paper. He covered the recently published paper but did not say it is in production use. In fact, Roger wrote that it would be "purely speculative to say whether or not it is in use." The paper was first submitted on April 26, 2020, and version two was published on October 13, 2020.
I believe the speculation comes from some Black Hat World forum threads where members are seeing ranking changes and claiming they have to do with SMITH. Google has never said it launched SMITH in production search.
What is SMITH? The full abstract is below, but in short, SMITH improves on BERT at "long-form document matching," whereas BERT shines on "short text like a few sentences or one paragraph."
Many natural language processing and information retrieval problems can be formalized as the task of semantic matching. Existing work in this area has been largely focused on matching between short texts (e.g., question answering), or between a short and a long text (e.g., ad-hoc retrieval). Semantic matching between long-form documents, which has many important applications like news recommendation, related article recommendation and document clustering, is relatively less explored and needs more research effort. In recent years, self-attention based models like Transformers and BERT have achieved state-of-the-art performance in the task of text matching. These models, however, are still limited to short text like a few sentences or one paragraph due to the quadratic computational complexity of self-attention with respect to input text length. In this paper, we address the issue by proposing the Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder for long-form document matching. Our model contains several innovations to adapt self-attention models for longer text input. We propose a transformer based hierarchical encoder to capture the document structure information. In order to better capture sentence level semantic relations within a document, we pre-train the model with a novel masked sentence block language modeling task in addition to the masked word language modeling task used by BERT. Our experimental results on several benchmark datasets for long-form document matching show that our proposed SMITH model outperforms the previous state-of-the-art models including hierarchical attention, multi-depth attention-based hierarchical recurrent neural network, and BERT. Comparing to BERT based baselines, our model is able to increase maximum input text length from 512 to 2048. We will open source a Wikipedia based benchmark dataset, code and a pre-trained checkpoint to accelerate future research on long-form document matching.
Roger wrote an article on what he thinks it is: "SMITH is a new model for trying to understand entire documents. Models such as BERT are trained to understand words within the context of sentences. In a very simplified description, the SMITH model is trained to understand passages within the context of the entire document." In fact, the Google researchers said SMITH increases the maximum input text length from 512 to 2,048 tokens.
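To make the hierarchical idea concrete, here is a toy sketch in Python (PyTorch) of the two-level encoding the abstract describes: tokens are grouped into fixed-size blocks, each block is encoded by a small transformer, and a second transformer then attends across the per-block vectors. This is purely illustrative and is not Google's SMITH code; every dimension, block size, and layer count below is an invented toy value, and the real model's details (sentence splitting, pretraining objectives, pooling) differ.

```python
# Illustrative sketch of a two-level "hierarchical" encoder -- NOT Google's
# actual SMITH implementation. All sizes here are made-up toy values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyHierarchicalEncoder(nn.Module):
    def __init__(self, vocab_size=30522, dim=128, block_len=32, num_blocks=64):
        super().__init__()
        self.block_len = block_len    # tokens per sentence block
        self.num_blocks = num_blocks  # blocks per document (32 * 64 = 2,048 tokens)
        self.embed = nn.Embedding(vocab_size, dim)
        # Level 1: a small transformer encodes each token block independently.
        block_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.block_encoder = nn.TransformerEncoder(block_layer, num_layers=2)
        # Level 2: another transformer attends over the per-block vectors, so
        # no self-attention ever spans the full 2,048-token sequence.
        doc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.doc_encoder = nn.TransformerEncoder(doc_layer, num_layers=2)

    def forward(self, token_ids):
        # token_ids: (batch, num_blocks * block_len)
        b = token_ids.size(0)
        x = self.embed(token_ids)                                # (b, T, dim)
        x = x.view(b * self.num_blocks, self.block_len, -1)      # split into blocks
        x = self.block_encoder(x)                                # encode each block
        block_vecs = x.mean(dim=1).view(b, self.num_blocks, -1)  # pool each block
        doc = self.doc_encoder(block_vecs)                       # attend across blocks
        return doc.mean(dim=1)                                   # document embedding

# "Siamese" matching: the same encoder embeds both documents, and the match
# score is the cosine similarity of the two embeddings.
enc = ToyHierarchicalEncoder()
doc_a = torch.randint(0, 30522, (1, 64 * 32))
doc_b = torch.randint(0, 30522, (1, 64 * 32))
score = F.cosine_similarity(enc(doc_a), enc(doc_b))
print(score.item())
```

The payoff of this structure is the abstract's point about quadratic self-attention cost: attending across all 2,048 tokens at once means roughly 2,048² ≈ 4.2 million token pairs, while attending within 64 blocks of 32 tokens (64 × 32²) plus across the 64 block vectors (64²) is roughly 70 thousand pairs, about 60 times cheaper.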
Folks in the forums are saying things like "Bert Smith update gone by yesterday" when talking about ranking changes on their sites. Another said "Google’s new SMITH algorithm understands long form content better than BERT. Maybe this one is affecting to some site."
So no, there is no evidence that Google launched SMITH in production. And Google has confirmed that it did not launch SMITH in search.
And an old reminder: just because Google has a patent or research paper does not mean it is using it, has used it, or ever will.
Yes, Danny Sullivan of Google said as much in 2021, in the tweet quoted above.
Forum discussion at Black Hat World.