Unraveling URLs & Demystifying Domains

Feb 27, 2008 - 5:53 pm
Filed Under SMX West 2008

Can you find the same page on your site using different URLs? That might cause duplicate content issues. Does your content management system put out parameters that block crawling? Do you own multiple domains pointing at the same site? Are you 301 redirecting them or leaving canonicalization to chance? Confused about even how to pronounce canonicalization, in addition to now being worried about it?

Moderator: Detlev Johnson, CEO, Search Return
Q&A Moderator: John Marshall, CTO and Founder, Market Motive

Speakers:
Brian Combs, Founder and Senior Vice President, Apogee Search
Cindy Krum, Senior SEO Analyst, Blue Moon Works, Inc.
Stephan Spencer, President, Netconcepts
Navneet Virk, Director of Search Marketing, Roundarch

Brian is up first. He is going first because he feels he is the least technical person of the group, and he is going to speak about the business aspects of domains. He starts off with the credentials of his company. He doesn't code anymore, but has been in this business for several years.

* Webmaster versus Search Engine Optimizer

Both have moved from sole operators to extreme specialization.

Web Positions: Designer, Programmer, Copywriter, HTML Coder, Project Manager
SEO Positions: SEO Copywriter, On-Page Optimizer, Link Builder, Link Baiter, Programmer, HTML Coder

In general, you may need a specialist to deal just with issues about your domain.

* Do it Right the First Time

- Correcting errors is difficult and expensive.
- Don't wait until late in a design or redesign to address SEO issues.
- 301s don't necessarily pass all link juice. If you can avoid needing to use a 301, it may be better.
- A "static" URL may be easier to maintain. Dynamic URLs are easier to index these days, but making them look static can still help.
- Refer to directories, not files, to avoid file extension changes if you change from .net to .asp, for example.
- Avoid frames and Flash-based sites. Locational Flash is fine, but frames are a tool of the devil.
* URL Structures

Use a single URL structure. Pick one and stick with it, both for internal and external links.
- Focus link juice of internal links
- Consistent external linking

Keywords in URLs
- Helpful, but not critical
- Can look spammy if you overdo them

Limit the Parameters
- Longer URLs can be hard to index
- Just say no to session IDs! Use cookies if you need something that critical

* Domain Issues

Using Keywords
- Links are more likely to use keywords in anchor text
- Can have negative branding impact

Domain Variants are an Opportunity
- "Multiple Shelf Space" strategy. Allows you to try to get two domains in the SERPs.
- Reputation management. Can use it to push a bad site down in the SERPs.
- Must have unique content

Navneet Virk is next.

SEO Components: many things. Content, Code, and Technical Architecture/CMS/Backend. These are responsible for different things. This session is going to talk more about the URL structure and the technical architecture of the site. When we say accessibility here, we mean accessibility for search engines.

Optimized URL

Properties of an optimized URL:
- accessible by search crawlers
- targeted towards the keywords
- readable by search engines and users
- unique for content (and vice versa)
- relevant in terms of domain (if possible) and sub-domains as well as directory nomenclature

Big issue: URLs and content management systems.

Challenge: Complex URL structure as compared to a non-CMS site.
Resolution: Configure the URL components to follow navigational elements (either directly from the URL or from breadcrumb elements).

Challenge: The CMS inserts multiple parameters and dynamic components into the URL.
Resolution: Set up rules around allowing dynamic and special characters, e.g. replace all special characters, spaces, and space encoding with a hyphen (avoid concatenating multiple-phrase keywords). Try to embed navigation in the URL if possible; put breadcrumbs in the URL.

A CMS can have multiple URLs for the same page.
This can also make it hard to look at statistics. Every time someone changed the title of a page (added a comma, etc.), the CMS made a different URL.

Challenge: Content duplication. The channel structure of a CMS following the navigation breadcrumb can present the same content on multiple URLs based on user path.
Resolution: Implement URL redirects to a single preferred URL.

Challenge: Multiple URLs displaying the same content but differing in the order of parameters or lack of nonessential parameters.
Resolution: Implement a robots meta tag to allow content managers to exclude content from search engine indices, or use robots.txt.

Challenge: Record name approach. Irrelevant parameter names and values do not add any relevant information or value to the URL.
Resolution: Implement a database field so that CMS-generated URL parameters can be replaced by a user-directed page or record name. Leverage the record name for the URL (make the URL relevant: a name instead of a number in the URL).

Vanity URLs

A vanity URL is a domain name, typically created by a company to point to a specific product, advertising campaign microsite, or section. Often leveraged by sites with an inherently complex URL structure to provide simpler, memorable, and relevant URLs.
- used mostly for offline ads (print/TV)
- increasingly being used for SEO

Vanity URLs, however, also present the risk of content duplication penalties.
- Exposing both the vanity and system versions of a URL to the search engines can present the same content on multiple URLs.
- Implement permanent redirects from system URLs to the vanity URLs.

Leverage vanity URLs to target and serve "Keyword Personas". The recommendation is to pick the vanity URL as the default, since it is easier to remember and more friendly.
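As a rough illustration of two of the CMS fixes described above (hyphenated, readable URL segments and a single preferred URL regardless of parameter order), here is a minimal Python sketch. It is not from the presentation: the function names, the NONESSENTIAL_PARAMS list, and the example URLs are all hypothetical, and a real CMS would have its own rules for which parameters matter.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of parameters that do not change what the page shows.
NONESSENTIAL_PARAMS = {"sessionid", "ref", "sort"}


def slugify(title: str) -> str:
    """Turn a page title into a hyphen-separated URL segment.

    Every run of characters other than a-z and 0-9 (spaces, commas,
    apostrophes, and so on) becomes a single hyphen.
    """
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def canonical_url(url: str) -> str:
    """Return one preferred form of a URL.

    Drops the nonessential parameters, sorts the remaining ones so that
    parameter order no longer produces distinct URLs, lowercases the
    host name, and discards any fragment.
    """
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NONESSENTIAL_PARAMS
    ]
    params.sort()
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(params), "")
    )


if __name__ == "__main__":
    print(slugify("Men's Running Shoes, Size 10"))
    print(canonical_url("http://www.example.com/product?id=42&color=red&ref=home"))
```

A function like canonical_url could decide the target of the permanent redirect: if the requested URL differs from its canonical form, redirect to the canonical form.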
Tracking parameters in URL

Issues:
- Duplicate content: search engines index URL versions with the tracking parameter.
- Inbound link dilution: inbound links to the tracked version do not help search engine visibility.
- Skewed web metrics: if the URL with the tracking code is returned for some queries on the search engines and clicked by users, it can lead to skewed reporting.

Resolution:
- Conditional meta tag. On-page scripting parses the URL of the page and adds a noindex meta tag on all versions that include extra unwanted parameters.
- Robots.txt. Use wildcards in the robots.txt file to block spiders from crawling pages appended with a designated tracking code.

Secure or non-secure: https vs. http

Search engines do not have any issues with indexing and presenting secure pages in the SERPs. You do, however, want to make sure that you eliminate the risk of duplicate content due to the potential of the same content being served… (missed the rest of what he said).

Absolute vs. relative links

Factors to consider:
- Load time. Load time may not affect search engine rankings, but it does affect user experience and conversions.
- Secure or non-secure sections of the site. Always use absolute links on the secure pages, or there are chances that both the https and http versions may get indexed.
- Absolute links can mitigate the risk of content scraping, and may be better for RSS.
- Also good for PDF.
- (Missed the last point.)

URL redirect

Challenge: Page URLs are unique IDs for the page. URL changes may lead to:
- loss of ranking and history in search engine indices
- loss of PR
- search engines directing traffic to dead pages
- loss of link popularity

Resolution:
- create and implement a traffic retention plan
- right technical approach for the redirects
- testing
- log analysis
- old domain removals from search engines

Redirection tips: A 301 should be used, not a 302 or JavaScript redirect. Implement page or directory level redirects instead of site level. (Missed the rest of what was said.)

If running Apache, place “rules” within .htaccess or your Apache config file (e.g.
httpd.conf, sites_conf/…). Make a backup first!

This example was given, and Stephan explained what each part of the rewrite rule meant. A big thank you to Stephan for putting the presentation online; there would have been no way for me to write this all down correctly!

RewriteEngine on
RewriteBase /
RewriteRule ^products/([0-9]+)/?$ /get_product.php?id=$1 [L]
RewriteRule ^([^/]+)/([^/]+)\.htm$ /webapp/wcs/stores/servlet/ProductDisplay?storeId=10001&catalogId=10001&langId=-1&categoryID=$1&productID=$2 [QSA,P,L]

The magic of regular expressions / pattern matching:
- * means 0 or more of the immediately preceding character (this can be a problem, because it might match on nothing; if you want it to match at least one number, use the + sign)
- + means 1 or more of the immediately preceding character
- ? means 0 or 1 occurrence of the immediately preceding character
- ^ means the beginning of the string, $ means the end of it
- . means any character (i.e. wildcard)
- \ “escapes” the character that follows, e.g. \. means dot (this says that you really do mean a period, not that you're using it to match any character)
- [ ] is for character ranges, e.g. [A-Za-z]
- ^ inside [ ] brackets means “not”, e.g. [^/]
- ( ) puts whatever is wrapped within it into memory
- Access what’s in memory with $1 (what’s in the first set of parens), $2 (what’s in the second set of parens), and so on

Regular expression gotchas to beware of:
- “Greedy” expressions. Use [^ instead of .*
- .* can match on nothing. Use .+ instead
- Unintentional substring matches because ^ or $ wasn’t specified

Proxy a page using the [P] flag:

RewriteRule /blah\.html$ http://www.google.com/ [P]

The [QSA] flag is for when you don’t want query string params dropped (like when you want a tracking param preserved). The [L] flag saves on server processing.

Got a huge pile of rewrites? Use RewriteMap and have a lookup table as a text file.

If you're on Microsoft IIS Server: ISAPI_Rewrite is helpful for those people that have to fight with IIS, and it is not too different from Apache.
My condolences though if you have to use IIS. In httpd.ini:

[ISAPI_Rewrite]
RewriteRule ^/category/([0-9]+)\.htm$ /index.asp?PageAction=VIEWCATS&Category=$1 [L]

This lets a URL like http://www.example.com/index.asp?PageAction=VIEWCATS&Category=207 be served at something like http://www.example.com/category/207.htm (requests for the friendly URL are mapped internally to the dynamic one).

301 Redirects

In .htaccess (or httpd.conf), you can redirect individual URLs, the contents of directories, entire domains…

Redirect 301 /old_url.htm http://www.example.com/new_url.htm
Redirect 301 /old_dir/ http://www.example.com/new_dir/
Redirect 301 / http://www.example.com/

Pattern matching can be done with RedirectMatch 301:

RedirectMatch 301 ^/(.+)/index\.html$ http://www.example.com/$1/

Or use a rewrite rule with the [R=301] flag:

RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,QSA,R=301]

The [NC] flag makes the rewrite condition case-insensitive.

Conditional Redirects

Selectively redirect bots that request URLs with session IDs to the URL sans session ID:

RewriteCond %{QUERY_STRING} PHPSESSID
RewriteCond %{HTTP_USER_AGENT} Googlebot.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot.* [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
RewriteCond %{HTTP_USER_AGENT} Ask\ Jeeves
RewriteRule ^/(.*)$ /$1? [R=301,L]

The trailing ? on the target drops the query string, session ID included, so the bot is not redirected back to the same URL. Utilize browscap.ini instead of having to keep up with each spider’s name and version changes.

URLs that lead to error pages
- The traditional approach is to serve up a 404, which drops that obsolete or wrong URL out of the search indexes. This squanders the link juice to that page.
- But what if you return a 200 status code instead, so that the spiders follow the links? Then include a meta robots noindex so the error page itself doesn’t get indexed.
- Or do a 301 redirect to something valuable (e.g. your home page) and dynamically include a small error notice?

Next speaker: looking at international issues with Cindy Krum of Blue Moon.

International Site Architecture: Are we speaking the same language?
Challenges:
- multiple languages, currencies, measurements, seasonality
- different search engines
- different e-commerce laws
- difference in marketing aesthetics

Site architecture can help address these issues.

Site Architecture Options: one site, server side translation, and multiple sites.

Things to consider:
- Design, development, and maintenance cost
- Server configuration and location
- Needs to work well with CMS and order fulfillment
- Email, direct marketing, affiliates, and PPC
- Traditional advertising
- SEO

Looking at SEO for all of these architecture options.

One Site Approach

Using subdomains and subdirectories. If you divide by language, write the name of the language in the language you are targeting. Example: www.Deutsch.yoursite.com

Pros:
- easy to set up
- links and traffic all point to one domain
- more pages in the index
- flexibility with messaging (seasonality is one example)
- grouping by language prevents dupe content
- country specific hosting option (subdomains)

Cons:
- homepage in the wrong language can be confusing
- home page only ranks in one language, but multiple countries
- grouping by country risks duplicate content (grouping by language does not). Take two countries that speak English: one site for each country results in duplicate content, because both are in English.

Tips:
- Specify the target country for each site in Google Webmaster Tools.
- Redirect country specific domains to the appropriate subdomain or subdirectory. For example, buy the .de domain, then redirect it to the German section of the site.
- Internal and external link issues. You want links from sites with the appropriate language, and using appropriate anchor text.

Server Side Translation Option

The server determines location via IP, then shows the right page.
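A minimal Python sketch of that server-side choice follows. It is not from the presentation and assumes the visitor's country code has already been looked up from the IP address by some geolocation service (not shown). The COUNTRY_TO_LANGUAGE table, the default language, and the pick_language function are hypothetical; the sketch lets a language the user picked earlier (for example, one stored in a cookie) take precedence over the IP-based guess.

```python
from typing import Optional

# Hypothetical mapping from IP-derived country codes to site languages.
COUNTRY_TO_LANGUAGE = {"DE": "de", "AT": "de", "FR": "fr", "US": "en", "GB": "en"}
DEFAULT_LANGUAGE = "en"


def pick_language(ip_country: Optional[str],
                  cookie_language: Optional[str] = None) -> str:
    """Choose which language version of a page to serve.

    Order of precedence:
    1. a supported language the user chose earlier (e.g. from a cookie),
    2. the language mapped to the country derived from the IP address,
    3. the site default when the country is unknown or unmapped.
    """
    supported = set(COUNTRY_TO_LANGUAGE.values())
    if cookie_language in supported:
        return cookie_language
    if ip_country:
        return COUNTRY_TO_LANGUAGE.get(ip_country.upper(), DEFAULT_LANGUAGE)
    return DEFAULT_LANGUAGE


if __name__ == "__main__":
    print(pick_language("DE"))        # IP lookup says Germany
    print(pick_language("DE", "fr"))  # user previously chose French
    print(pick_language(None))        # IP lookup failed
```

Keeping the decision in one server-side function keeps on-page scripting to a minimum and gives a predictable fallback when the IP lookup is wrong or missing.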
Pros:
- Best user experience
- It can work with legacy CMS, meta data, content, feeds
- Robot translation software is more scalable

Cons:
- Harder to set up
- Natural inbound links could be to a section with the wrong language
- Translation software must be checked; it may not always target the right version of a keyword

More tips:
- Don't set a location in Google Webmaster Tools
- Redirect country specific domains to the appropriate translations
- Allow users to change language/location and set a language cookie, in case IP sniffing doesn't work
- Minimize on-page JavaScript, or have everything happen on the server
- Language meta tag, HTML language

Multiple Site Option

Pros:
- incrementally low startup costs
- can add sites one at a time
- rank well in multiple country-specific search engines
- country specific hosting

Cons:
- more sites means more to update (inventory, news, specials, etc. need to be updated on all sites)
- multiple sites means multiple SEO efforts
- harder to rank in .com
- forced to target countries instead of languages
- no stemming algorithm

Tips:
- Target country in Google Webmaster Tools
- External links:
-- want appropriate language sites with appropriate anchor text
-- country specific domains
-- local address

Blended approach

Send all traffic to the .com domain, and people choose where they want to go. This is the most realistic option for a worldwide presence. You've already got ranking in .com, then you can build out country specific sites as needed. Buy domains as quickly as you can, even if you don't need them right away. It can be costly to create. Specify countries for individual sites, but not on the .com site. Link your multiple country sites carefully and logically. Don't link all country-specific sites to each other if you don't need to.

External links:
- international links to the international site
- country specific links to the country specific site
- let users know you're taking them to another site
- use java translation/IP sniffing on the homepage

Contributed by Keri Morgret.

 
