Here is one more oddity to add to the list for MSNBot, Microsoft Bing's web crawler. Webmasters are reporting that MSNBot crawls the same page twice: once for the gzip-compressed version and then again for the uncompressed version. Technically, it should only need to crawl each page once, and it should opt for the compressed version, don't you think?
We have two threads complaining about this, one oldish one at WebmasterWorld and another at Bing Forums. Let me quote the Bing thread:
I've notice that bing is crawling each page of my website twice, first making an HTTP 1.1 request and getting a compressed response then immediately issuing an HTTP 1.0 request to receive the same page without gzip compression
The following lines from my log show the issue (there are thousands more similar occurrences):
65.55.207.74 - - [13/Dec/2009:14:58:42 +0000] "GET /specimen/235698/ HTTP/1.1" 200 1742 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.207.74 - - [13/Dec/2009:14:59:06 +0000] "GET /specimen/235698/ HTTP/1.0" 200 4259 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.209 - - [13/Dec/2009:15:03:08 +0000] "GET /specimen/250262/ HTTP/1.1" 200 1733 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.209 - - [13/Dec/2009:15:03:14 +0000] "GET /specimen/250262/ HTTP/1.0" 200 4164 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
This seems a waste of bandwidth and completely defeats the point of supporting http compression.
Indeed, it is a waste of bandwidth, and yes, it defeats the point of supporting HTTP compression.
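If you want to check whether your own site is seeing the same pattern, here is a minimal Python sketch that scans access-log lines for it. It assumes your log uses the same format as the lines quoted above; the function name find_double_fetches and the regular expression are my own illustration, not anything provided by Bing. Note that the log does not record whether a response was actually compressed, so the sketch only pairs an HTTP/1.1 request with a later HTTP/1.0 request from the same IP for the same path and reports the two response sizes.

```python
import re

# Pattern for the combined-log-style lines quoted above (an assumption
# about the log format; adjust it if your server logs differently).
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) HTTP/(?P<ver>[\d.]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) "[^"]*" "(?P<agent>[^"]*)"$'
)


def find_double_fetches(lines, agent_substring="msnbot"):
    """Return (path, http11_bytes, http10_bytes) for every path that the
    same IP fetched over HTTP/1.1 and then again over HTTP/1.0."""
    pending = {}  # (ip, path) -> response size of the HTTP/1.1 request
    pairs = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m or agent_substring not in m["agent"].lower():
            continue  # unparseable line or a different user agent
        key = (m["ip"], m["path"])
        size = int(m["size"])
        if m["ver"] == "1.1":
            pending[key] = size
        elif m["ver"] == "1.0" and key in pending:
            pairs.append((m["path"], pending.pop(key), size))
    return pairs


# The four log lines from the Bing Forums post.
SAMPLE = [
    '65.55.207.74 - - [13/Dec/2009:14:58:42 +0000] "GET /specimen/235698/ HTTP/1.1" 200 1742 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.207.74 - - [13/Dec/2009:14:59:06 +0000] "GET /specimen/235698/ HTTP/1.0" 200 4259 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.106.209 - - [13/Dec/2009:15:03:08 +0000] "GET /specimen/250262/ HTTP/1.1" 200 1733 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.106.209 - - [13/Dec/2009:15:03:14 +0000] "GET /specimen/250262/ HTTP/1.0" 200 4164 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
]

pairs = find_double_fetches(SAMPLE)
for path, small, large in pairs:
    print(f"{path}: {small} bytes over HTTP/1.1, then {large} bytes over HTTP/1.0")

# Bytes served by the repeat HTTP/1.0 requests alone.
wasted = sum(large for _, _, large in pairs)
print(f"Bytes spent on repeat requests: {wasted}")
```

On the quoted sample, the two repeat requests account for 8,423 bytes between them, each more than twice the size of the first response for the same page. To run it against a real log, pass an open file object instead of SAMPLE.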
A Bing representative, Brett Yount, replied:
could you please mail this information to [email protected] and I will get our crawling team to check it out?
But we have no confirmation from Bing on why this issue is occurring or when it will be fixed. Like I said, just one more oddity to add to MSNBot's crawl behavior.
Forum discussion at WebmasterWorld and Bing Forums.