Google has updated its URL structure guidelines to specify what characters Google Search supports in URLs.
The introduction paragraph now leads off by saying, "Google supports URLs as defined by RFC 3986. Characters defined by the standard as reserved must be percent encoded. Unreserved ASCII characters may be left in the non-encoded form. Additionally, characters in the non-ASCII range should be UTF-8 encoded."
Previously Google wrote, "A site's URL structure should be as simple as possible. Consider organizing your content so that URLs are constructed logically and in a manner that is most intelligible to humans."
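To make the encoding rule in the new introduction concrete, here is a minimal sketch in Python using the standard library's `urllib.parse.quote`. The helper name `encode_path` and the sample path are made up for illustration and do not come from Google's documentation. The sketch assumes you are encoding a URL path in which `/` is the only character acting as a delimiter.

```python
from urllib.parse import quote

def encode_path(path: str) -> str:
    # Hypothetical helper for illustration only.
    # quote() leaves unreserved ASCII characters (letters, digits, "-", ".", "_", "~")
    # untouched and percent-encodes everything else from its UTF-8 bytes.
    # safe="/" keeps the path separators as they are.
    return quote(path, safe="/", encoding="utf-8")

print(encode_path("/gemüse/rote beete"))  # -> /gem%C3%BCse/rote%20beete
```

The non-ASCII "ü" becomes the two UTF-8 bytes `%C3%BC`, and the space becomes `%20`, while the plain ASCII letters stay readable.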
Other changes in the document include two additions to the "Resolve this problem" section:
- Create a simple URL structure. Consider organizing your content so that URLs are constructed logically and in a manner that is most intelligible to humans.
- If upper and lower case text in a URL is treated the same by the web server, convert all text to the same case so it is easier for Google to determine that URLs reference the same page.
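The second addition, about letter case, could look something like this in practice. This is a rough sketch in Python, not anything Google published. The function name `lowercase_url` and the example URL are hypothetical, and it assumes your web server really does treat upper and lower case paths as the same page. It leaves the query string alone, since parameter values are often case sensitive; that is a cautious assumption on my part.

```python
from urllib.parse import urlsplit, urlunsplit

def lowercase_url(url: str) -> str:
    # Hypothetical helper: only safe if the server is case-insensitive for paths.
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),   # host names are case-insensitive
        parts.path.lower(),     # assumption: the server ignores case in paths
        parts.query,            # left untouched; values may be case-sensitive
        parts.fragment,
    ))

print(lowercase_url("https://Example.com/Products/Blue-Widget?ref=ABC"))
# -> https://example.com/products/blue-widget?ref=ABC
```

Using one consistent form in internal links, sitemaps and redirects means Google sees a single URL per page instead of several case variants.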
Those are the main changes I spotted in this document update.
Forum discussion at Twitter.