For as long as anyone can remember, Google has said to treat its crawler, Googlebot, as you would any searcher or user accessing your site from the United States. That is now changing: Google may crawl your site from IP addresses outside of the US and may also crawl your site with language settings other than English (US).
Google announced that it now supports locale-aware crawling by Googlebot - this is a huge change for internationalized sites.
What does this mean? In short, some sites that offer internationalized content do so without sending the user to a special URL. Google has always preferred that you set up specific URLs or ccTLDs for content tailored to different countries or languages, but many sites simply serve content dynamically on their .com based on the visitor's IP address or browser language settings.
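To make that concrete, here is a minimal sketch of the language side of that pattern: one URL whose response is chosen from the browser's Accept-Language header. The supported locales, fallback and parsing below are illustrative assumptions, not any particular site's implementation.

```python
# Minimal sketch of "dynamic serving": one URL, content picked per request
# from the Accept-Language header. Locale list and fallback are made up.

SUPPORTED_LOCALES = ["en-US", "fr-FR", "de-DE"]
DEFAULT_LOCALE = "en-US"

def pick_locale(accept_language: str) -> str:
    """Return the best supported locale for an Accept-Language header value."""
    # Parse entries like "fr-FR,fr;q=0.9,en;q=0.8" into (language, quality) pairs.
    prefs = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        lang, _, q = piece.partition(";q=")
        try:
            quality = float(q) if q else 1.0
        except ValueError:
            quality = 1.0
        prefs.append((lang.strip(), quality))

    # Walk preferences in quality order, return the first supported match.
    for lang, _ in sorted(prefs, key=lambda p: p[1], reverse=True):
        for locale in SUPPORTED_LOCALES:
            if locale.lower().startswith(lang.lower().split("-")[0]):
                return locale
    return DEFAULT_LOCALE

# A US-only Googlebot fetch with no Accept-Language header would only ever
# have seen the default English content from a site built this way:
print(pick_locale(""))                          # -> en-US
print(pick_locale("fr-FR,fr;q=0.9,en;q=0.8"))   # -> fr-FR
```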
Google is now going to support sites that dynamically serve internationalized content based on IP or language. It will do this using two methods:
- Geo-distributed crawling, where Googlebot starts to use IP addresses that appear to come from outside the USA, in addition to the US-based IP addresses it uses today (the IP-based serving this targets is sketched after this list).
- Language-dependent crawling, where Googlebot starts to crawl with an Accept-Language HTTP header in the request.
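And here is the IP-based half that geo-distributed crawling is meant to exercise. The GEO_TABLE, country_for_ip helper and locale mapping are hypothetical stand-ins for whatever GeoIP lookup a real site uses; only the overall pattern (pick content by the requesting IP's country) comes from the announcement.

```python
# Sketch of geo-based dynamic serving. The tiny hard-coded "GeoIP" table and
# the country-to-locale mapping are assumptions for illustration only.

from ipaddress import ip_address, ip_network

GEO_TABLE = {
    "203.0.113.0/24": "FR",   # documentation range, pretend it is French
    "198.51.100.0/24": "US",  # documentation range, pretend it is American
}

COUNTRY_TO_LOCALE = {"FR": "fr-FR", "US": "en-US"}

def country_for_ip(ip: str) -> str:
    """Hypothetical GeoIP lookup: map a requesting IP to a country code."""
    addr = ip_address(ip)
    for cidr, country in GEO_TABLE.items():
        if addr in ip_network(cidr):
            return country
    return "US"  # fall back to the US, Googlebot's traditional origin

def locale_for_request(ip: str) -> str:
    return COUNTRY_TO_LOCALE.get(country_for_ip(ip), "en-US")

# Crawls from US-only IPs would only ever have surfaced the en-US content;
# a crawl from a non-US address can now reach the localized version:
print(locale_for_request("198.51.100.7"))  # -> en-US
print(locale_for_request("203.0.113.9"))   # -> fr-FR
```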
Pierre Far, one of the Googlers assigned to this project, explained a few points on Google+ (the separate-URL setup he recommends first is sketched after the list):
- We still (very strongly) recommend having separate URLs for different locales and using rel-alternate-hreflang annotation for them. Separate URLs are better for users, and that's what really counts. Locale-aware crawling is for the few edge cases where it's not possible for you to have separate URLs.
- It's early days and the countries that Googlebot will appear to come from and the Accept-Language headers it may try do not cover all combinations of countries and languages around the world. Also, we will continue to tweak things as we build out this feature. This is another reason to have separate URLs.
- Locale-aware crawling gets enabled algorithmically if we detect your site may benefit from it. You don't need to do anything.
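As a reference for that first recommendation, here is a small helper that prints the rel-alternate-hreflang link annotations for a page that does have separate locale URLs. The example URLs are made up; the tag format and the x-default fallback follow Google's documented markup.

```python
# Print the <link rel="alternate" hreflang="..."> annotations for a page with
# separate URLs per locale. URLs below are placeholders for illustration.

def hreflang_links(urls_by_locale: dict[str, str], default_url: str) -> str:
    """Build the <head> annotations tying locale-specific URLs together."""
    lines = [
        f'<link rel="alternate" hreflang="{locale}" href="{url}" />'
        for locale, url in urls_by_locale.items()
    ]
    # x-default marks the page to show when no listed locale matches.
    lines.append(
        f'<link rel="alternate" hreflang="x-default" href="{default_url}" />'
    )
    return "\n".join(lines)

print(hreflang_links(
    {"en-us": "https://example.com/en-us/widgets",
     "fr-fr": "https://example.com/fr-fr/widgets"},
    "https://example.com/widgets",
))
```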
There is detailed documentation on this over here, but expect things to change slightly as webmasters discover how Google handles it and report issues, confusion and problems in the forums.
Forum discussion at Google+.