Matt Rogerson, the director of public policy at Guardian Media Group, posted a Twitter thread dissecting why he believes the claims from The Sun and Mail Online that Google Search is mistreating them are false. He then went through why Google may not rank them well, and it might have a lot to do with author bylines.
Matt referenced this Press Gazette story, titled The Sun and Mail Online believe they aren't getting fair share of Google search traffic. He then gave at least two reasons why The Sun and Mail Online don't rank as well in Google Search as The Guardian and other publications. Those reasons are:
(1) Speed issues (2) Issues with original, in-depth, and investigative reporting
The speed issues are the easy part, and I personally doubt speed is the main reason any news organization would fail to rank well in Google Search. But here is his tweet on that:
6/ I ran them again in Sept ‘21 adding the @Telegraph.
— Matt Rogerson (@MattRogerson) November 8, 2021
Green is good, red is bad.
We know that Google cites these factors as significant in how sites rank in search, so these data are objective reasons why other one publisher might fare better in search than another. pic.twitter.com/a6dRshYuh5
It is the original, in-depth, and investigative reporting issues that I found most revealing. In short, it seems like the majority of the news stories The Daily Mail posts carry no named author byline. Matt cites the search quality raters guidelines:
8/ @Google may not be as successful in this regard as publishers would like (See the Sun’s concerns in the article in 1/), but one outcome is likely to be that sites that rely heavily on news agency content will see their search performance suffer.
— Matt Rogerson (@MattRogerson) November 8, 2021
Then he digs into categories of stories and who wrote those stories. Here is how he did it:
10/ I’ve labelled the data where there was a ‘named byline’, ‘generic’ (i.e. DM reporter), ‘Agency’ copy (eg. Reuters, AFP), and Unknown (Total articles minus articles where no author info is available).
— Matt Rogerson (@MattRogerson) November 8, 2021
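To make that labelling concrete, here is a rough sketch of how author strings could be bucketed into his four categories. This is not Matt's actual method. The function names, the agency list, and the set of generic bylines are my own assumptions for illustration; only "DM reporter", "MailOnline Reporter", Reuters, and AFP come from the thread and the statement below.

```python
from collections import Counter

# Hypothetical lookup lists, for illustration only (not Matt's real criteria).
AGENCIES = ("reuters", "afp")
GENERIC_BYLINES = {"dm reporter", "mailonline reporter"}


def label_byline(byline):
    """Bucket one author string as agency, generic, named byline, or unknown."""
    if byline is None or not byline.strip():
        return "unknown"  # no author info available
    text = byline.strip().lower()
    if any(agency in text for agency in AGENCIES):
        return "agency"
    if text in GENERIC_BYLINES:
        return "generic"
    return "named byline"


def share_by_label(bylines):
    """Return the percentage of articles that fall into each bucket."""
    counts = Counter(label_byline(b) for b in bylines)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}
```

Run over a list of author strings pulled from a site search, `share_by_label` would give the kind of percentage split shown in his pie charts, assuming the buckets are mutually exclusive.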
Here is what he came up with:
12/ On the “Covid” search term, nearly 60% of content is credited to an agency, with just over 3% by a named byline.https://t.co/HUC90vrC5e pic.twitter.com/8UZNDx1UNt
— Matt Rogerson (@MattRogerson) November 8, 2021
14/ In our paper to @Ofcom, I noted areas where the DM excels in search, “Kim Kardashian” being a case in point.
— Matt Rogerson (@MattRogerson) November 8, 2021
Agency copy accounts for just 3% here, with generic or named bylines comprising 20%.
A big difference to those first 3 terms.https://t.co/Xpugc4LjuU pic.twitter.com/PI5IgWXmjN
15/ If DM’s own search function is correct, + taking into account a % of authors aren’t categorised, the data suggests that of the c4.8 million articles in the DM archive, more than 50% come directly from news agencies. pic.twitter.com/EvNndKIsSE
— Matt Rogerson (@MattRogerson) November 8, 2021
Look at how small the "named byline" slice of the pie is. I am not sure how it compares to The Guardian or other publications, but honestly, I am shocked to see this number so low.
His conclusion:
17/ … Objective data suggests that factors other than subjectivity may play in part in how @Google ranks the DM compared to other publishers.
— Matt Rogerson (@MattRogerson) November 8, 2021
But I’d love thoughts from more learned experts than me. I don’t claim to be an expert!!
What do you all think? Is this the power of bylines, or is the lack of bylines just one signal that the content was not "original, in-depth, and investigative reporting"?
Forum discussion at Twitter.
Update: Here is a statement from a MailOnline spokesperson:
Unfortunately the research in Matt Rogerson's tweets contains a fatal flaw.
He used MailOnline's onsite search function and found 4.8m URLs, of which 50% were wire stories. He assumes this has a detrimental effect on MailOnline’s Google search performance.
What Matt does not know, because he did not approach us for comment, is that MailOnline blocks Google from indexing agency stories. It’s done automatically in the pages’ code which Google reads. The official term is “apply the noindex value for the meta robots tag to avoid Google indexing this piece of content”.
This means Google does not consider wire stories on MailOnline for ranking. Therefore those stories have zero effect on Google’s ranking of MailOnline original bylined content, which of course is not blocked.
The same applies to the category Matt describes as ‘generic’, by which we believe he means picture galleries, which carry a generic ‘MailOnline Reporter’ byline. The majority of these are blocked from indexing by Google and therefore have zero effect on search ranking.
Another category which may not carry a byline is video. This is indexed by Google, but performs well in search, so does not support Matt’s case.
Finally there is a large category which Matt describes as ‘unknown’. He describes this as ‘total articles minus articles where no author info is available’. We presume Matt means MailOnline author info, because in fact some wire stories also carry bylines. If you subtract the number of articles for which no MailOnline author info is available from the total of all articles, you are left with MailOnline bylined articles. But Matt has already listed articles with a MailOnline named byline separately. If you take all articles then subtract those for which MailOnline author info IS available, you are left with wire stories, galleries and some video – but Matt has also listed them separately so can’t count them again. Whatever Matt means by ‘unknown’ it would appear he has been double counting.
For all these reasons we are afraid Matt’s research has no validity and provides no explanation for why the Guardian performs so much better than other news sites in Google’s search rankings.
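If you want to check the noindex claim on any given page yourself, here is a minimal sketch that looks for a robots meta tag in a page's HTML. The class and function names are my own, and it assumes you already have the HTML as a string. It only inspects meta tags, so it would miss a noindex sent through the `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # The content attribute is a comma-separated list of directives.
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindexed(html):
    """Return True if the HTML carries a noindex (or equivalent 'none') directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives or "none" in parser.directives
```

Feeding it the source of an agency story and of a bylined story from the same site would show whether the two are treated differently, at least at the meta tag level.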