A WebmasterWorld thread asks why the number of results returned by the site: command in Google does not match the number of "indexed" URLs reported in Google Webmaster Tools. It is a very valid question, so let me show you.
A simple site command in Google for site:www.seroundtable.com returns 17,500 results. That suggests Google has indexed approximately 17,500 pages from the www subdomain of this site.
Now, if I log in and check my Sitemap data for this site (yes, I finally created a Sitemap file), it shows only about half that many indexed URLs. It reports that Google has indexed 8,813 of the 9,086 URLs I submitted.
For me, the answer is simple. I seem to be submitting only the URLs of the individual blog posts. So although I have 9,000+ blog posts on this domain, the site has about twice as many pages in total, due to the category pages, date archives, tag landing pages and so on. Those pages are not included in my Sitemap file, so Google seems to show only the indexed URLs from the set I submitted. Of course, it is hard for me to validate that just by looking at the numbers.
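One quick way to confirm how many URLs a Sitemap actually submits is to count its <loc> entries and compare that with the indexed figure Webmaster Tools reports. The sketch below is a minimal Python illustration, assuming a single uncompressed sitemap file in the standard sitemaps.org 0.9 format (not a sitemap index file); the function names and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def count_sitemap_urls(xml_text: str) -> int:
    """Count the <loc> elements under each <url> in a sitemap XML string."""
    root = ET.fromstring(xml_text)  # root element is <urlset>
    return len(root.findall("sm:url/sm:loc", SITEMAP_NS))

def coverage(indexed: int, submitted: int) -> float:
    """Fraction of submitted URLs that are reported as indexed."""
    return indexed / submitted if submitted else 0.0

# Hypothetical two-URL sitemap, just to exercise the parser.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/post-1.html</loc></url>"
    "<url><loc>https://www.example.com/post-2.html</loc></url>"
    "</urlset>"
)
print(count_sitemap_urls(sample))         # 2
print(round(coverage(8813, 9086), 2))     # 0.97, using the figures above
```

With the numbers from this post, roughly 97% of the submitted post URLs are reported as indexed. That ratio says nothing about the category, archive and tag pages, because they were never in the Sitemap to begin with.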
What I found interesting is that when I went to Yahoo's Site Explorer, Yahoo told me it has indexed 16,498 of my pages but crawled only 15,022 of them. I guess that, via linkage data, Yahoo can index more of my pages than it actually crawls?
In fact, Yahoo's numbers for an inurl:seroundtable.com command closely match the numbers it reports in Site Explorer, which is nice.
As for what is going on with Google, I am not sure whether the results are accurate. Tedster at WebmasterWorld said:
I'm never surprised when Webmaster Tools information seems peculiar in some way - it happens a lot. Also note that site:example.com results are getting weirder and weirder, often omitting urls that definitely are in the index - sometimes with a simple site:example.com/directory/ query.
Forum discussion at WebmasterWorld.