Earlier this week, we showcased an example of an SEO issue that Moz was having. Now John Mueller of Google has come into the Google Webmaster Help thread to offer his advice.
The interesting part is that Google appears to be trying to be smart here and is not honoring the 301 redirects Moz has set in place.
Googlebot learned that content under the /blog/ URL structure can also be fetched under the /ugc/ structure. Despite the 301 redirects Moz has in place, Google seems to be indexing just the /ugc/ URLs. John said this is very rare to see.
John explained:
In short, our systems learned that any content on /ugc/ can also be fetched as /blog/, so it would be resourceful to just pick one and use that. For example, taking the last posts on each side, all of these URLs lead to the content:
https://moz.com/ugc/10-tips-for-optimizing-your-images-for-search
https://moz.com/blog/10-tips-for-optimizing-your-images-for-search
https://moz.com/blog/roadmap-creating-share-worthy-content-massive-distribution
https://moz.com/ugc/roadmap-creating-share-worthy-content-massive-distribution
(they redirect, but both end up showing the same content)
Looking at 3892 unique URLs fetched from /ugc/ from the last 10 days or so, about half were discovered as /blog/ versions as well (so it's not just that they can exist, but also that they're referred to like that). Some of that is bound to be for historical reasons, I can see that. Regardless, this also reinforces the conclusion that these are essentially equivalent URL patterns.
What I'd recommend here is to clearly separate the two parts (if they're meant to be unique parts) and return 404 for invalid URLs.
In other words, return 404s for the URLs that currently 301. I am not sure that would work for Moz, because it may often promote /ugc/ content to /blog/ content, and it then needs the 301 to send users to the right URL.
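One possible way to reconcile John's advice with that concern is sketched below. It is only an illustration, not how Moz actually routes requests: the slug sets, the `PROMOTED` set, and the `resolve` function are all hypothetical. The idea is that each post is valid in exactly one section, a 301 is kept only for posts that really were promoted from /ugc/ to /blog/, and every other combination returns a 404.

```python
from http import HTTPStatus

# Hypothetical data: which slugs currently live in which section.
BLOG_SLUGS = {"staff-post", "promoted-post"}
UGC_SLUGS = {"community-post"}
# Hypothetical record of posts that moved from /ugc/ to /blog/.
PROMOTED = {"promoted-post"}


def resolve(path):
    """Return (status, location) for a request path such as '/ugc/some-slug'."""
    parts = path.strip("/").split("/")
    if len(parts) != 2:
        return HTTPStatus.NOT_FOUND, None
    section, slug = parts

    if section == "blog" and slug in BLOG_SLUGS:
        return HTTPStatus.OK, f"/blog/{slug}"
    if section == "ugc":
        if slug in UGC_SLUGS:
            return HTTPStatus.OK, f"/ugc/{slug}"
        if slug in PROMOTED:
            # Only genuinely promoted posts keep a redirect.
            return HTTPStatus.MOVED_PERMANENTLY, f"/blog/{slug}"

    # Anything else, including a /blog/ post requested under /ugc/
    # (or the reverse), is treated as an invalid URL.
    return HTTPStatus.NOT_FOUND, None
```

With this kind of setup, `resolve("/ugc/staff-post")` is a 404 rather than a redirect, so the two sections stop looking like interchangeable URL patterns, while `resolve("/ugc/promoted-post")` still sends visitors to the /blog/ version.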
John then explained that this is a rare case and that most webmasters do not have to worry about it:
For sake of context, I have seen this kind of issue come up on sites that are very similar or have very similar sections, but it's really rare (maybe a handful of cases over the years?). For the most part, many sites (I trust this one doesn't fit into this group!) are just sloppily programmed and have broad URL rewriting patterns that make this logic on our side extremely helpful for crawling. This is also a part of the reason why we recommend sticking with parameters in the URL unless you really know what you're doing with URL rewriting, it's easier to deal with URL parameters than it is to deal with confusing or redundant path / file names.
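To make John's point about broad rewriting patterns concrete, here is a small sketch. The rewrite rule, the `post` parameter name, and the example.com URLs are assumptions made up for illustration; they are not taken from Moz or Google. It contrasts a rule that accepts any first path segment with a lookup driven by an explicit URL parameter.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical over-broad rewrite rule: any first path segment is accepted,
# so many different URLs end up serving the same post.
BROAD_RULE = re.compile(r"^/[^/]+/(?P<slug>[^/]+)/?$")


def slug_from_broad_rewrite(url):
    """Extract the post slug the way an over-broad rewrite rule would."""
    match = BROAD_RULE.match(urlsplit(url).path)
    return match.group("slug") if match else None


def slug_from_parameter(url):
    """Extract the post slug from an explicit 'post' query parameter."""
    values = parse_qs(urlsplit(url).query).get("post")
    return values[0] if values else None


# Under the broad rule, all three paths resolve to the same slug.
for candidate in (
    "https://example.com/blog/my-post",
    "https://example.com/ugc/my-post",
    "https://example.com/typo/my-post",
):
    print(candidate, slug_from_broad_rewrite(candidate))

# With a parameter, the identifier is explicit and the path stays fixed.
print(slug_from_parameter("https://example.com/article?post=my-post"))
```

A crawler that sees the first three URLs return identical content has good reason to treat the path prefixes as equivalent, which is the behavior described in this case.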
This is one of those interesting cases that are awesome to document.
Forum discussion at Google Webmaster Help.