Fun with Dynamic Websites is a session presented by Mikkel deMib Svendsen, Laura Thieme, and Jake Baillie.
Mikkel deMib Svendsen (demib.com) presents first and says that search engines like to index as much valuable content as they can possibly find. If search engines don't index you, it's not because they don't like you - dynamic websites can simply be difficult to index.
It comes down to IRTA:
- Indexing: getting your pages indexed. This is where the problems are; dynamic sites typically run into a bunch of problems getting crawled.
- Ranking: ranking for relevant keywords. Dynamic websites can potentially outrank static websites.
- Traffic: getting people to click through to your website. The game is the same, static or dynamic.
- Actions: getting users to take the desired actions. Technology only plays a limited role here, as long as it works.
Dynamic architecture - a user requests a page, and the web server might query a database, run server-side includes, or pull in other variables. The problem with this picture is that the complexity at the back-end level is usually returned directly to the users and the spiders, and that is not always good. If your engineer comes back with a solution that relies on JavaScript, requires parameters, or needs cookies, that can cause problems. How do you improve this? You can simplify the technology. If that is not possible (off-the-shelf CMS systems, untweakable systems), you might want to think of a virtual layer between the back-end architecture and the front-end architecture - a bridge. One of the most common ways to do this is to use a URL rewrite engine: the complexity of the URLs can stay in the back end, and the users and spiders never have to see it. Another option is a static replication of the content.
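To make the bridge-layer idea concrete, here is a minimal sketch of a PHP front controller, assuming the web server hands every clean URL to one script; the route table, the product_template.php file name, and the parameter values are invented for illustration.

```php
<?php
// Hypothetical "bridge layer" front controller (index.php).
// Assumes the web server sends every clean URL (e.g. /products/red-widgets/)
// to this script, so the messy parameters stay hidden on the back end.

// Map of public paths to internal template parameters.
// On a real site this lookup would live in the database.
$routes = [
    '/products/red-widgets/'  => ['id' => 12, 'cart' => 0, 'sort_order' => 44],
    '/products/blue-widgets/' => ['id' => 13, 'cart' => 0, 'sort_order' => 44],
];

$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

if (isset($routes[$path])) {
    // Hand the internal parameters to the existing dynamic template
    // (product_template.php is a made-up file name).
    $_GET = $routes[$path];
    require 'product_template.php';
} else {
    header('HTTP/1.1 404 Not Found');
    echo 'Page not found';
}
```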
What is not a problem? It is not a problem to store content in a database; search engines just need a safe way to navigate to the content, since they won't fill out forms or query the database themselves. A question mark in the URL is also not a problem - it's just an easy way to identify a template-based dynamic web page, showing that one file serves different content depending on the parameters passed to it. SSIs (server-side includes) are not a problem either. What matters is what is returned to the users and spiders.
Extension names are also not a problem. Use .asp, .jsp, .cfm, .html, whatever you want.
Search engines don't care what processes run on your web server, as long as what is returned to them is valid HTML.
What are some indexing barriers?
- Long and ugly URLs: one of the most common issues. He shows an example URL that will not get indexed in any engine.
- Duplicate content: session IDs, click IDs, time-stamped URLs. The engines do not want to index the same content on different URLs, and you don't want these many-to-one problems. He shows an example with over 200,000 visits from Yahoo and says that sooner or later this will hurt you.
- Server downtime and slow response times. If you don't know how fast Google can spider your site, sign up for Google Webmaster Tools.
- Spider traps: infinite loops of dynamically created links and pages.
Other indirectly related issues (you can find these on static sites as well) include required support for cookies, JavaScript, Flash, etc.; geotargeting and personalization; and form-based (POST method) navigation.
Issues that are not related at all: robots.txt and meta robots exclusion, frames, and password-protected sites.
Solutions that work: there are many solutions available, and there's always more than one way to solve a problem. Don't just pick the first one that comes along. Get an overview of the ways to deal with a dynamic website and pick the one that works for you. Work from the bottom up:
- Fix your system, or
- Add a "bridge layer", or
- Replicate your content.
His favorite fix: the one-parameter website. Take a URL like index.php?id=12&cart=23&sort_order=44, store the full parameter set in a database, and call the page with a single parameter instead, something like index.php?R=35.
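A minimal sketch of how that one-parameter lookup might be wired up in PHP; the connection details, the url_params table, and its columns are assumptions made for illustration.

```php
<?php
// Sketch of the one-parameter lookup. The full parameter set is stored once,
// and the public URL only carries index.php?R=35.
// Connection details, table name (url_params) and columns are invented.

$pdo = new PDO('mysql:host=localhost;dbname=shop', 'dbuser', 'dbpass');

$r = isset($_GET['R']) ? (int) $_GET['R'] : 0;

$stmt = $pdo->prepare('SELECT id, cart, sort_order FROM url_params WHERE r = ?');
$stmt->execute([$r]);
$params = $stmt->fetch(PDO::FETCH_ASSOC);

if ($params === false) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

// The template works exactly as before, just with the parameters
// pulled from the lookup table instead of a long query string.
$id         = (int) $params['id'];
$cart       = (int) $params['cart'];
$sort_order = (int) $params['sort_order'];
```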
Identify spiders on a global level and don't serve session IDs to them. This has nothing to do with cloaking, so don't be afraid to use this technique. Static pages may not be as bad as they sound: use dynamic objects on hard-coded static pages (banner scripts, server timestamps, rotating news flashes, RSS feeds, etc.). Create a sitemap to guide search engines to the most important parts of your site.
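A rough sketch of what a global spider check could look like in PHP, so crawlers never receive a session ID; the user-agent substrings are examples only, and a production check would need a more complete list.

```php
<?php
// Sketch of a global spider check so crawlers never receive a session ID.
// The user-agent substrings are examples only, not a complete list.

function is_spider()
{
    $agent = strtolower(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
    foreach (['googlebot', 'slurp', 'msnbot', 'teoma'] as $bot) {
        if (strpos($agent, $bot) !== false) {
            return true;
        }
    }
    return false;
}

if (!is_spider()) {
    // Only human visitors get a session (and therefore a session ID).
    session_start();
}
```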
You can also pay to play: pay for inclusion - directories, Yahoo! search, etc.
A dynamic website can be better optimized than a static site!
Next up is Laura Thieme (bizresearch.com). She mentions that she's been in the business for 10 years and focuses on a few key topics: are you indexed? Are the crucial pages indexed? Other important topics include optimization tactics (how long do they take?), external factors (CMS, etc.), and making sure you don't lose rankings.
The first thing is to begin your research project. Look at the URL structure: how many variables are in your URLs? Look at the search engine indices (doable for Google/Yahoo, but what about MSN, whose site:xxx.com query is broken?), current rankings, spider activity (NetTracker, ClickTracks Pro, Log Analyzer), and your target terms. Once you are here, however, you have to overcome technology, resource, and/or political challenges. Index, optimize, and monitor improvements.
With dynamic versus static pages: in most cases you do not have to embed keywords in your URLs, and you do not have to create static-only pages. Be prepared that making these changes can cost you rankings.
Home page titles can really matter. Example: Pier1.com - we had a few select phrases that we wanted to focus on, with "Dining Room Tables" as the target term. Google did not want the dynamic URL, but MSN took it; even though the page was relevant, it was not indexed in Google. So the term went into the home page title, and within six months "Dining Room Tables" showed up at #3 in the Google SERPs. Category page titles matter too, and they offer further optimization opportunities because you can add target terms to them. Interestingly, people search for "candle holders" rather than "candleholders." Make sure your tools let you override the subcategory titles. Places to optimize: titles, headings, navigational text, metadata, anchor text, and incoming and outgoing links.
Example: champion.com. The page was indexed but wasn't optimized. Sure enough, after a title update for Women's Fitted Tees, they ranked in the top 10.
Example: Pear's Gourmet. A revised page title brought better rankings.
Example: Levenger ballpoint pens. Adding "Ballpoint Pens" to each product title increased its rankings.
Example: Gabriel Logan - replaced images with text, optimized page title.
External factors: so what if you've done all this optimization but you can't get the search engines to read your URLs, and your CMS (content management system) is keeping you from getting indexed? What can you do? Watch out for vendors whose systems prevent spidering, fail to redirect properly, or lack the administrative tools you need to focus on optimization. Try talking to a CMS customer service rep, and research before you buy.
Search engines may choke on some dynamic URLs generated by your site search engine. You may just need to upgrade your CMS.
You might see MSN picking you up faster than Google. You may see improvements for select keyword phrases in Google, but it takes longer to achieve top positions there. Consider optimizing a Yahoo data feed.
Which one wins, dynamic or static? Many times we work with clients whose agencies are determined to create static sites, but on a scalable model, static pages are harder to keep up to date. Other technology challenges: your site search engine (check the version and the way it's getting indexed), your robots.txt, canonical issues, and 404s.
Keyword-embedded URLs work, but don't forget to put 301s in place, and be willing to temporarily lose rank. If you redesign your pages, focus on 301s.
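As an illustration of putting 301s in place at the application level, here is a hedged PHP sketch; the retired URLs, their keyword-embedded replacements, and the hostname are invented examples.

```php
<?php
// Sketch of application-level 301s from retired dynamic URLs to their new
// keyword-embedded addresses. The URL pairs and hostname are invented.

$redirects = [
    '/index.php?id=12' => '/ballpoint-pens/',
    '/index.php?id=13' => '/fountain-pens/',
];

$requested = $_SERVER['REQUEST_URI'];

if (isset($redirects[$requested])) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.com' . $redirects[$requested]);
    exit;
}
```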
In summary: get indexed, optimize based on the way people search, submit a data feed, monitor improvements, and be persistent and patient. If you're still not ranked, you might have a duplicate content penalty or an over-optimization problem.
The final speaker is Jake Baillie, managing director of STN labs.
He focuses on duplicate content - why it happens on websites and how you can control it.
Dr. Phil's duplicate content primer: "To take control of your problems, you have to understand why they happen." The biggest cause of duplicate content on dynamic websites is circular navigation: brand/category/item, category/brand/item, and so on. When you build e-commerce sites, for example, there are multiple ways to get to the same result. You need to be consistent about how content is accessed: if you have a page on red widgets, make sure the URL to it is always the same.
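One way to keep the URL always the same is to build every internal link through a single helper. A minimal PHP sketch, with invented function and field names:

```php
<?php
// Sketch: every internal link is built by one helper, so a product only
// ever lives at a single URL no matter which navigation path reached it.
// The function name and array keys are invented.

function product_url(array $product)
{
    // Always brand first, then item - never the reverse order.
    return '/' . $product['brand_slug'] . '/' . $product['item_slug'] . '/';
}

// The "red widgets" page is always /acme/red-widgets/, whether the visitor
// came in via brand/category or category/brand navigation.
echo '<a href="' . product_url(['brand_slug' => 'acme', 'item_slug' => 'red-widgets']) . '">Red widgets</a>';
```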
Print-friendly pages are, by definition, duplicate pieces of content. Use CSS for a print stylesheet and JavaScript to flip between views; in a pinch, block the printer-friendly versions from search engines.
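If a printer-friendly version does have to live at its own URL, here is a small PHP sketch of keeping that duplicate out of the index; the print parameter name is an assumption, and the talk's preferred fix remains CSS/JavaScript on a single URL.

```php
<?php
// Sketch: if a printer-friendly view exists at its own URL
// (here article.php?print=1 - the parameter name is invented),
// ask the engines not to index that duplicate.
$isPrintView = isset($_GET['print']);
?>
<html>
<head>
<?php if ($isPrintView): ?>
    <meta name="robots" content="noindex, follow">
<?php endif; ?>
    <title>Article</title>
</head>
<body>
    <!-- article content here, styled for print when $isPrintView is set -->
</body>
</html>
```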
Designers have a mental block here - to slash or not to slash, that is the question. Pick a directory index format and be consistent; inconsistency here causes 60% of duplicate content issues.
Looks don't count - you need content on your pages. Lack of content stops indexing, and a stoppage of indexing means a lack of rankings. E-commerce sites are particularly prone to this problem.
You might have issues with your registrar's DNS redirection service; 301 redirects are the best way to handle different URLs that point to the same page. Don't use cloaking scripts you don't understand.
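A sketch of replacing a registrar-style redirect with a single 301 to the preferred hostname, written in PHP for illustration; the hostname is an example, and many sites would do this in the web server configuration instead.

```php
<?php
// Sketch: answer every hostname variant with a single 301 to the preferred
// host instead of relying on a registrar's redirection service.
// The preferred hostname is an example.

$preferred = 'www.example.com';
$host = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '';

if (strcasecmp($host, $preferred) !== 0) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://' . $preferred . $_SERVER['REQUEST_URI']);
    exit;
}
```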
Content to URL should always be a one-to-one relationship - that is the golden rule for avoiding duplicate content issues.
Having fun with image serving: if you are selling a product and someone hotlinks your picture on eBay, you can use mod_rewrite to serve them a different image. He also shows a price-swap regex example. What Jake is trying to say is that with modern web servers and dynamic sites, if you can think of a condition to test for, you can act on it: competitors, time of day, type of browser, length of visit, number of visitors, your mood, etc. If someone requests 200 pages in 30 seconds, it's probably not the kind of visitor you want - but you would want exceptions made for spiders.
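The talk describes this with mod_rewrite; purely as an illustration of the condition-testing idea, here is the same hotlink check sketched as a PHP image handler, with invented file names.

```php
<?php
// The talk frames this with mod_rewrite; here the same "test a condition,
// act on it" idea is sketched as a PHP image handler. File names and the
// eBay referer check are invented examples.

$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

if (stripos($referer, 'ebay.') !== false) {
    // Hotlinked from an auction listing: serve an alternate image.
    $file = 'images/buy-direct-from-us.jpg';
} else {
    $file = 'images/product.jpg';
}

header('Content-Type: image/jpeg');
readfile($file);
```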
A few years ago, Jake focused on shortening URLs for search engines, but there are a lot more ways to use mod_rewrite to get creative with your results.
Other ideas for harassing your competitors: a 403 (access forbidden), a 404 (page not found), different pages, sounds/lights/pictures/movies, annoying JavaScript, their own website served back to them, or just tracking them. This is good for testing too.