URL Normalization: Is a Trailing Slash the Same Page

Dec 28, 2004 - 3:00 pm 1 by

There is a very interesting thread brewing at Search Engine Watch Forums named Is A Trailing / On A Directory Seen As A Differnet File By Google?. In this thread a member lists an example of the same page, different URLs due to the trailing slash, have different PageRank values. His example is:

http://www.avismauritius.com/en/locations/ PR=3 http://www.avismauritius.com/en/locations PR=0

In the thread, Orion, the resident search technology guru at SEW forums, discusses how search engines normalize the URLs in order to give each URL a unique identifier. I hope that I explain this correctly. It is my understanding that the unique identifier is a hash string, possibly a 64 or 128 bit hash string. In order to assign a unique identifier, the URL needs to be stripped down and normalized. The process is a bit like Orion stated:

Removal of the protocol prefix (http://) if present Removal of a :80 port number specification if present (However, non-standard port number specifications are retained) Conversion of the server name to lower case Removal of all trailing slashes ("/")

However, this does not really explain if Google does all or some or none of this. Moderator Chris_D referenced an old WebmasterWorld thread where GoogleGuy sheds some more light on this topic. He talks a lot about http responses and URL requests, but the important line to get out of the thread is "I would always recommend the trailing slash. If you know the exact right url, it's often best to give it directly and save everyone that extra redirect." You also might want to check out msg # 6 in that thread.

PageOneResults from the SEO Consultants Directory explains that this is more of a matter of "content negotiation". He goes on to explains;

The W3C and other large website structures are now utilizing content negotiation. That means that this...

www.example.com/sub

...could be different than this...

www.example.com/sub/

With the use of content negotiation, there are no file extensions. Basically you are cleaning the URI of all underlying identifying technologies.

Bottom line, the same URL with and without a trailing slash can and is considered different to most search engines. Most are weeded out through the use of duplicate content filters, and most sites do not have this problem because of the built in way the server handles these URL requests.

 

Popular Categories

The Pulse of the search community

Follow

Search Video Recaps

 
- YouTube
Video Details More Videos Subscribe to Videos

Most Recent Articles

Google

Google AI Overview Index Serving Delayed Compared To Web Results

Nov 21, 2024 - 7:51 am
Google

Page Annotation In Google iOS App Browser

Nov 21, 2024 - 7:41 am
Google News

DOJ: Force Google To Sell Chrome, Restrict Android & Ban Default Deals

Nov 21, 2024 - 7:31 am
Google

Google Sitelinks With Icons & Labels

Nov 21, 2024 - 7:21 am
Bing Search

Bing Search Tests Circle Shaped Favicons In Search Results

Nov 21, 2024 - 7:11 am
Search Forum Recap

Daily Search Forum Recap: November 20, 2024

Nov 20, 2024 - 10:00 am
Previous Story: OR Factor: Originality Factor