Britney Muller spotted someone crawling a site with Apache Nutch under a Googlebot user-agent. Google has confirmed that Googlebot does not use Nutch in its user-agent. Nutch is a "highly extensible and scalable open source web crawler software project."
Here is what Britney posted:
Is Googlebot/Nutch-1.7 an official Googlebot crawler?
— Britney Muller (@BritneyMuller) October 1, 2020
Seeing some mixed info online (possibly a rarer large crawler for big websites)?🤷
IP DNS Lookups confirm that these are not from googlebot[dot]com or google[dot]com properties? cc: @JohnMu
Thanks!
John Mueller from Google confirmed that Google does not use Nutch at all:
I just double-checked to be sure :). We don't use "nutch" at all in any of the Googlebot user-agents we use for search or for the other uses of the shared infrastructure.
— 🍌 John 🍌 (@JohnMu) October 2, 2020
He said, "We don't use 'nutch' at all in any of the Googlebot user-agents we use for search or for the other uses of the shared infrastructure."
So if you see a Googlebot user-agent that includes Nutch, it is not the real Googlebot, and you can block it if it is causing you issues.
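For site owners who want to check this themselves, here is a minimal Python sketch of the two checks discussed above: flagging a user-agent that claims to be Googlebot but also mentions Nutch, and confirming an IP with a reverse DNS lookup followed by a forward DNS lookup, the same kind of DNS check Britney describes. The function names are hypothetical, and the sketch assumes the `googlebot.com` and `google.com` domains mentioned in her tweet are the ones to accept. It only reports a verdict; actually blocking the crawler is left to your server or firewall configuration.

```python
import socket

# Domains a genuine Googlebot reverse-DNS hostname should end with
# (assumption: googlebot.com and google.com, as referenced in the tweet).
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def claims_googlebot_with_nutch(user_agent: str) -> bool:
    """Hypothetical helper: True if the UA claims Googlebot but also mentions Nutch.

    Google says none of its Googlebot user-agents contain "nutch", so a string
    like "Googlebot/Nutch-1.7" points to an impostor.
    """
    ua = user_agent.lower()
    return "googlebot" in ua and "nutch" in ua


def is_google_hostname(hostname: str) -> bool:
    """Hypothetical helper: True if a reverse-DNS hostname is under a Google crawler domain."""
    host = hostname.lower().rstrip(".")  # drop a trailing root dot, if any
    return host.endswith(GOOGLE_CRAWLER_DOMAINS)


def verify_googlebot_ip(ip: str) -> bool:
    """Hypothetical helper: reverse lookup, domain check, then forward lookup back to the IP.

    Uses gethostbyname_ex for the forward step, so this sketch handles IPv4 only.
    """
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)  # reverse DNS
    except OSError:  # covers socket.herror / socket.gaierror
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)  # forward DNS
    except OSError:
        return False
    # The forward lookup must resolve back to the original IP, or the PTR record may be spoofed.
    return ip in forward_ips
```

Checking the user-agent alone is cheap, but any crawler can send any user-agent string. The forward lookup matters because whoever controls an IP range can set its reverse DNS record to an arbitrary name, such as one ending in googlebot.com.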
Forum discussion at Twitter.