Meet the Crawlers

Apr 26, 2006 - 1:54 pm
Filed Under SES Toronto 2006

Moderator: Chris Sherman, Executive Editor, SearchEngineWatch.com

Google Sitemaps Launches New Features: Shiva Shivakumar from Google talked about the new Google Sitemaps launch and gave a live demo of the new features. He logged into his account and showed the "my sites" section. The pages look new; there is a "diagnostic" tab that shows summary data, including an indexing summary, potential indexing problems, and so on. The navigation on the left gives you more detailed information; it looks like they moved those links from a sub-tab at the top to the left-hand side. For constitution.org, Google now shows the message "no pages from your site is in the Google index." He went to the site and showed hidden text at the bottom of the page, which is the reason Google shows that message. This is pretty big stuff. He then moved to another site, opened the "statistics" main tab and showed "query stats," "crawl stats," "page analysis," and "index stats." He then clicked on the "sitemaps" main tab and pulled up google.com/sitemap.xml to show the XML document. Finally, he clicked on "robots.txt analysis" under the "tools" section on the left-hand side, which lets you see whether you will be crawled or not.
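For readers who want to try the same kind of check outside the Google Sitemaps interface, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not how Google's "robots.txt analysis" tool works internally; the robots.txt content, the `is_crawlable` helper and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/private/page.html",
    ):
        print(url, is_crawlable(ROBOTS_TXT, "Googlebot", url))
```

Different crawlers may interpret edge cases in robots.txt differently, so treat the result as a rough check rather than a guarantee.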

Stephen Evans from MSN Canada was next. New products: Windows Desktop Search, a refreshed user interface, MSN Local Search beta, Windows Live Search beta, crawling of images and news, and more. As much as possible, MSN Search will attempt to crawl and index pages that help the user find what they are looking for. The basics: build a site map; use robots.txt; be conscious of URL length, query parameters and session variables; beware of text in images; have unique content; get links to your site or submit your URL; nothing can replace high-quality content. Also pay attention to descriptive titles, redirects (he said HTML redirects are best, and 301 or 302 are hard), JavaScript, page weight (150KB) and canonical domain. Things to avoid: keyword stuffing, duplicate copies, cloaked content, hidden text and link farms.
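Two of the items above, descriptive titles and page weight, are easy to check with a script. The sketch below is only an illustration: the `check_page` helper is hypothetical, and reading "150KB" as 150 * 1024 bytes of HTML is an assumption, since the talk did not say exactly how page weight is measured.

```python
from html.parser import HTMLParser
from typing import List

# Assumption: "150KB" is read as 150 * 1024 bytes of HTML source.
MAX_PAGE_BYTES = 150 * 1024


class _TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_page(html: str) -> List[str]:
    """Return a list of problems found in the page (empty if none)."""
    problems = []
    if len(html.encode("utf-8")) > MAX_PAGE_BYTES:
        problems.append("page weight over 150KB")
    parser = _TitleParser()
    parser.feed(html)
    if not parser.title.strip():
        problems.append("missing or empty title")
    return problems
```

Calling `check_page` on a page with no `<title>` element reports the missing title; a page with a title and a small body returns an empty list.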

Andy Renieris from Yahoo! Canada Search went over the vision: find, use, share and expand. How to get into the index: link new URLs from an existing page in the index, make sure all URLs have an inbound link, get good authoritative links, don't make site depth too extreme, or use the free add URL. Index-friendly pages have unique content and avoid spam. For French sites, use French meta tags and meta descriptions. He put up the classic "how Yahoo handles redirects" slide. URL rewriting is important: parameters are often changed to pseudo-paths, remove session IDs, and limit the depth of the URL. He showed the Yahoo crawlers: web, shop, audio, news, etc. Recent Yahoo additions: Site Explorer (not so new), RSS and Atom feed submission support, a ping interface via API, an added internal link filter, and more things coming to Site Explorer soon. They also have My Web 2.0 and the "save to My Web" button. They just did an index update on April 21st.
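The advice to remove session IDs and limit URL depth can be sketched in a few lines of Python. This is an illustration, not anything Yahoo showed: the parameter names in `SESSION_PARAMS`, both helper functions and the example.com URLs are assumptions, and a real site would use whatever names its own platform generates.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; adjust to match your own site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_params(url: str) -> str:
    """Return url with any session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def path_depth(url: str) -> int:
    """Count the non-empty path segments in url."""
    return len([segment for segment in urlsplit(url).path.split("/") if segment])


if __name__ == "__main__":
    print(strip_session_params("http://www.example.com/products?id=7&PHPSESSID=abc123"))
    print(path_depth("http://www.example.com/a/b/c.html"))
```

Stripping the parameter in links is only half the job; the site also has to keep working for visitors (and crawlers) who arrive without a session ID in the URL.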

Kaushal Kurapati from Ask.com went over the stats: #6 US web property, 28.5% reach, 48.8 million domestic unique users, 5.9% share of US searches, and a division of IAC. Crawler goals: follow robots.txt standards, politeness (crawl delay, noarchive, noindex, nofollow), efficiency (compression, avoid duplicates), freshness and multiple file types (HTML, PDF, Flash, MS Office, etc.). (Barry notes: when did they add "nofollow"?) Date-stamping content helps, so put a "last modified" stamp on your pages. Simplify site organization and navigation, ensure crawlers can reach all parts of the site, and use site maps. Watch out for infinite pages, calendars (year 3001) and session IDs. Crawler challenges: JavaScript, dynamic pages, images with URLs.
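One common way to put a "last modified" stamp on a page is the HTTP `Last-Modified` response header, which also lets a crawler skip unchanged pages through `If-Modified-Since`. The talk did not specify which mechanism Ask.com reads, so the sketch below is an assumption about one reasonable approach; both helper functions and the example timestamp are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(modified: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP Last-Modified value."""
    return format_datetime(modified.astimezone(timezone.utc), usegmt=True)


def not_modified(modified: datetime, if_modified_since: str) -> bool:
    """True if the page has not changed since the date the client sent.

    Assumes if_modified_since is a valid HTTP date ending in "GMT".
    """
    since = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so drop microseconds.
    return modified.replace(microsecond=0) <= since


if __name__ == "__main__":
    # Hypothetical modification time, for illustration only.
    stamp = datetime(2006, 4, 26, 12, 0, 0, tzinfo=timezone.utc)
    print(last_modified_header(stamp))
```

When `not_modified` returns True, a server would typically answer with a 304 status and no body, which saves bandwidth for both the site and the crawler.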

 
