Microsoft has published a new patent application titled Web Content Reliability Classification (US 20230350956 A1). The patent appears to describe how to calculate a reliability score for a website, or for a portion of the content on a website, for use in Bing Search.
Of course, keep in mind that just because a search company has a patent, it does not mean the patent is in use today or ever will be.
I am no patent writer like the late, great Bill Slawski, so I won't pretend to be one. But here is the abstract:
Technology described herein assigns a reliability score to web content, such as a web site or portion of a website. In one aspect, an output of the technology is a high reliability score and a low reliability score for a web content. The high reliability score represents conformance to high reliability sites, while the low reliability score represents conformance to low reliability sites. The high reliability score may be generated by first identifying high reliability online content within a compressed web graph. In a first iteration, the high reliability score of the seeds is used to score online content that is linked to the seed sites. At a high level, the more links that originate from high reliability sources, the higher the reliability score for the linked content. The low reliability score is similar, but uses outgoing links to low reliability sites instead of incoming links from high reliability sites.
Glenn Gabe spotted this and posted it on X:
Interesting patent -> Microsoft wants to confirm your sources
"Microsoft wants to patent a system for “web content reliability classification” which “assigns a reliability score” to certain websites (or sections of a website)." https://t.co/tR49EGgvRt pic.twitter.com/sWLZxNAaz0
Glenn Gabe (@glenngabe), November 6, 2023
What can this reliability score do? "The reliability score can be used to block content, rank content, provide a content warning, and select a source to answer a question, along with other uses."
How does it determine if something is reliable? Here are some quotes:
- Traffic data can indicate whether a source is popular, but popular is not the same thing as reliable.
- Natural language processing can be used to determine whether online content is grammatical, but grammatical is also not the same thing as reliable.
- The present technology identifies reliable content by leveraging expert scoring for a small amount of web content by iteratively extending these scores to other content based on how web content is linked.
- User interactions may also be leveraged.
This patent also talks about "seed sites" used to help determine what is reliable. "The high reliability score is generated by first identifying high reliability online content within a web graph. These initially scored sites may be described as seed sites. Ratings for the seed sites may be taken from authoritative lists of known reliable content providers," it says.
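To make the seed idea more concrete, here is a rough Python sketch of how scores could spread from seed sites across links. This is only my illustration of the general idea in the abstract, not the method claimed in the patent: the `propagate` function, the `damping` factor, the cap at 1.0, the number of iterations and the example domains are all assumptions I made up for the sketch.

```python
from collections import defaultdict


def propagate(links, seeds, iterations=2, damping=0.5):
    """Spread seed scores along links (hypothetical sketch, not the patent's formula).

    links: iterable of (source, target) pairs, meaning source links to target.
    seeds: dict mapping a site to its expert-assigned score between 0 and 1.
    """
    incoming = defaultdict(set)
    for source, target in links:
        incoming[target].add(source)

    scores = dict(seeds)
    for _ in range(iterations):
        updated = dict(scores)
        for site, sources in incoming.items():
            if site in seeds:
                continue  # seed ratings stay fixed
            # More links from already-scored sources means a higher score.
            total = sum(scores.get(s, 0.0) for s in sources)
            if total > 0:
                # Assumed damping and cap so scores stay between 0 and 1.
                updated[site] = min(1.0, damping * total)
        scores = updated
    return scores


# Made-up example graph.
links = [
    ("seed.example", "a.example"),
    ("seed.example", "b.example"),
    ("a.example", "b.example"),
    ("c.example", "spam.example"),
]

# High reliability: follow incoming links from trusted seed sites.
high = propagate(links, {"seed.example": 1.0})

# Low reliability: follow outgoing links to known low reliability sites,
# which is the same walk over the reversed links.
reversed_links = [(target, source) for source, target in links]
low = propagate(reversed_links, {"spam.example": 1.0})

print(high)
print(low)
```

In this toy graph, b.example ends up with a higher high reliability score than a.example because it is linked from two scored sources instead of one, while c.example picks up a low reliability score only because it links out to the low reliability seed.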
That is just a taste of this patent; I hope you enjoy reading through it.
Forum discussion at X.