Google held its first Google Webmaster Conference Product Summit at the GooglePlex in Mountain View, California. Unlike most of the previous Google Webmaster Conferences, it was a full-day event, and it was filled with awesomeness. I posted some of my favorite photos, and I will be posting at least three videos from the event that you won't want to miss on my YouTube channel, but here are my takeaways - the things that stood out to me.
(Note: The image above is a photo I took of Vanessa Fox, who created Webmaster Tools at Google and now runs Keylime Toolbox. It was great catching up with her yesterday.)
First off - I have to say that this event was awesome not only for education but for dialog between the Google Search teams and the publisher, developer, and SEO communities. I've said this before, but these types of events show Google cares.
Also, Jackie Chu posted some outstanding notes from the event on her blog, so check those out as well.
Deduplication is complicated, and you can help Google figure it out. Here are some tweets covering it:
Deduplication is surprisingly complicated and cool. It was neat to have one of the engineers talk about it.
— John (@JohnMu) November 4, 2019
Web deduplication. Interesting. Identify and cluster dup pages. Pick representative. urls, index unique pages. Forward signals to representative pages. #gwcps
— Glenn Gabe (@glenngabe) November 4, 2019
Canonicalization: Final tips: Use redirects, send the right http results, check rel canonical, use hreflang for localization, report hijacking, keep canonical signals unambiguous. #gwcps
— Glenn Gabe (@glenngabe) November 4, 2019
Canonicalization: Google must pick a canonical. Main thing is to avoid hijacking. Escalations via WTA forums. Second concern is user experience. Third, webmaster signals: redirects, rel canonical, sitemaps. #gwcps
— Glenn Gabe (@glenngabe) November 4, 2019
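The pipeline described in these tweets - identify and cluster duplicate pages, pick a representative, forward signals to it - can be sketched roughly as follows. This is my simplified illustration only, not Google's actual implementation: the exact fingerprinting and URL-picking heuristics here are invented stand-ins, and as noted later in the day, Google's real canonicalization weights are trained with machine learning.

```python
import hashlib
from collections import defaultdict

def content_fingerprint(html_body: str) -> str:
    # Crude stand-in for real similarity hashing: normalize
    # whitespace and case, then hash. Production systems use fuzzy
    # hashes so near-duplicates also land in the same cluster.
    normalized = " ".join(html_body.split()).lower()
    return hashlib.sha256(normalized.encode()).hexdigest()

def cluster_duplicates(pages: dict[str, str]) -> dict[str, list[str]]:
    # Group URLs whose content fingerprints match.
    clusters = defaultdict(list)
    for url, body in pages.items():
        clusters[content_fingerprint(body)].append(url)
    return clusters

def pick_canonical(urls: list[str]) -> str:
    # Toy heuristic: prefer HTTPS, then shorter URLs. The summit
    # noted the real weights for this step are machine-learned.
    return min(urls, key=lambda u: (not u.startswith("https://"), len(u)))

pages = {
    "https://example.com/widget": "<p>Widget   specs</p>",
    "http://example.com/widget?ref=nav": "<p>Widget specs</p>",
    "https://example.com/about": "<p>About us</p>",
}
for fingerprint, urls in cluster_duplicates(pages).items():
    canonical = pick_canonical(urls)
    # Signals (links, etc.) from every URL in the cluster would be
    # forwarded to the canonical; only the canonical gets indexed.
    print(canonical, "<-", urls)
```

Keeping your own canonical signals (redirects, rel canonical, sitemaps) consistent is what makes the "pick a representative" step unambiguous for Google.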
The fetching and crawling data were interesting:
Rendering: Fetching Problems - Limited access (robots.txt). Limited crawl volume (don't want to overload your server). e.g. 50-60 fetches per page. 60-70% cache hit rate. About 20 fetches per page, or 20X #gwcps
— Glenn Gabe (@glenngabe) November 4, 2019
Don't rely on caching for fetching:
Fetching. Corners will be cut. Don't rely on clever caching tricks. #gwcps pic.twitter.com/3sNuH3tZSp
— Jennifer Slegg (@jenstar) November 4, 2019
Rendering: Did you know that one reason Google has had issues rendering JavaScript is cryptocurrency miners? The scripts they used would overload Google's rendering engine and impact indexing, I believe Google said.
FYI Crypto miners will break your rendering #gcwps pic.twitter.com/I3iNXaTPjx
— MyCool King (@iPullRank) November 4, 2019
If robots.txt is unreachable, Google won't crawl the site:
If the robots.txt is "unreachable" they won't crawl the whole site #gwcps
— MyCool King (@iPullRank) November 4, 2019
And Google fails to reach robots.txt surprisingly often; look at this data:
#gwcps pic.twitter.com/S1bB6obXEz
— MyCool King (@iPullRank) November 4, 2019
Seriously? One out of four times googlebot cannot reach a site's robots.txt? 🤯 then they won't crawl the entire site!! #gwcps pic.twitter.com/wC49yC40zI
— Raffaele Asquer (@raffasquer) November 4, 2019
Googlebot and Crawling: For robots.txt, Google sees 200 code 69% of time, 5% transient (ok), and 26% unreachable. Wow, that's high. #gwcps
— Glenn Gabe (@glenngabe) November 4, 2019
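The behavior described above - treating an unreachable robots.txt differently from a missing one - can be sketched as a simple decision function. This is my simplified interpretation of what was said at the summit (and of Google's documented robots.txt handling), not actual crawler code:

```python
from typing import Optional

def crawl_decision(status: Optional[int]) -> str:
    # status is the HTTP status code of the robots.txt fetch, or
    # None if the fetch failed outright (DNS error, timeout).
    if status is None or status >= 500:
        # Unreachable: the rules can't be known, so the crawler
        # skips the whole site (the ~26% case in the data above).
        return "do not crawl"
    if status == 404:
        # No robots.txt at all is treated as "everything allowed".
        return "crawl everything"
    if 200 <= status < 300:
        # The ~69% case: parse the file and crawl accordingly.
        return "crawl per robots.txt rules"
    # Other conditions (part of the ~5% transient bucket) are
    # retried later rather than treated as a permanent answer.
    return "retry later"
```

The practical takeaway: a robots.txt that times out or returns a 5xx is far worse than having no robots.txt at all, because it can block crawling of the entire site.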
HTTPS vs HTTP traffic for Google:
HTTPS has really grown significantly in a few years, but there's still a bunch of room. Use secure protocols, folks! #gwcps pic.twitter.com/DaZDAltuZF
— John (@JohnMu) November 4, 2019
75% of Googlebot crawling is HTTPS. #gwcps
— Jennifer Slegg (@jenstar) November 4, 2019
Emojis are a big deal: they get over one million searches per day, and adding support for indexing and searching emojis took Google a year:
Google sees over one million searches on emojis per day 🤯 and it took Google a year to add the ability to search for emojis in search. #gwcps
— Barry Schwartz (@rustybrick) November 4, 2019
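Part of why emoji support takes real work is that a search system first has to recognize emoji during tokenization rather than discarding them as punctuation. Here is a toy sketch of spotting emoji characters in a query by Unicode code-point range - purely illustrative, and nothing here reflects Google's implementation:

```python
def is_emoji(ch: str) -> bool:
    # Rough check against the main emoji code-point blocks. Real
    # tokenizers use the full Unicode emoji data files, which also
    # cover skin-tone modifiers and multi-codepoint sequences.
    cp = ord(ch)
    return (
        0x1F300 <= cp <= 0x1FAFF   # symbols, pictographs, etc.
        or 0x2600 <= cp <= 0x27BF  # misc symbols and dingbats
    )

def extract_emoji(query: str) -> list[str]:
    # Keep just the emoji characters from a search query.
    return [ch for ch in query if is_emoji(ch)]

print(extract_emoji("pizza 🍕 near me"))  # ['🍕']
```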
Google search understanding synonyms:
I love this slide from @haahr on how context is so important in Google Search figuring out the *right* synonyms. #GWCPS pic.twitter.com/oKuMNUNjS7
— Danny Sullivan (@dannysullivan) November 4, 2019
Another case study: Does a query for [new york hotels] return good results for [york hotels]? Google built something into its IR system to ignore "new" if there's a search for only york hotels. #gwcps
— Glenn Gabe (@glenngabe) November 4, 2019
Google does remember:
Wow -> Does google hang onto history of a URL? ie. page originally had crap content. Google tries to judge thing as they are, but reputation of a site and page is often based on historical behavior. #gwcps
— Jennifer Slegg (@jenstar) November 4, 2019
A googler said historical data matters. IDK, but I think that's super important to keep in mind. #gwcps
— Jason Dodge (@dodgejd) November 4, 2019
Regarding if Google looks at the history of urls, Google: We try to look at the site/page in its current form. But the reputation of site and pages is based on historical data. E.g. old links, people talking about sites, etc. #gwcps
— Glenn Gabe (@glenngabe) November 4, 2019
With every Google search change there are wins and losses (in terms of the quality pros and cons Google measures):
Google... every change has wins and losses. #gwcps
— Jennifer Slegg (@jenstar) November 4, 2019
Google also said it will use BERT more, though what exactly that means could cover a ton of things and was not clarified.
Machine learning is also used for canonicalization:
Whoa.... weights for canonicalization are trained with machine learning. #gwcps
— Jennifer Slegg (@jenstar) November 4, 2019
Google dislikes pages that jump around as they load. Will there be a penalty, or does this factor into Google's core updates? I don't know, but see these tweets:
Things around content jumping around, tons of popups, interstitial... these are not great content experiences. #gwcps
— Jennifer Slegg (@jenstar) November 5, 2019
I'm getting the vibe that David Besbris has recently had some terrible experiences with pages jumping around on him.. #gwcps
— Jake Bohall (@jakebohall) November 5, 2019
That is. When web pages jump up and down.
— Barry Schwartz (@rustybrick) November 5, 2019
Those were the points that stood out to me from the day, but you can see all the tweets at the #GWCPS hashtag. I am sure I missed some tweets and perspectives; it was a busy day with a lot going on.
Also, I have video interviews coming out that you will not want to miss. I've got to get to editing them, which takes a lot of time.
Forum discussion at the tweets above.