In October, a WebmasterWorld member who was monitoring his website noticed that Google was beginning to index pages that did not appear to be linked from anywhere. He suspected one of three causes: Google was using Toolbar data, someone was linking to these pages deliberately, or he had somehow caused the pages to be linked himself (though he was fairly sure that was not the case).
Others noticed similar "creative" spidering, and as Tedster put it, "googlebot [is] trying to eat almost anything that might be edible in even the least way."
After further investigation, the member who discovered the issue described what appears to be happening:
- Googlebot is spidering GET forms by getting the form variables and either leaving them blank or assigning values to them (sometimes taken from options in the form itself)
- Google has a list of words present on the site
- This list of words is being used to populate the form variables, and the URL requested via GET
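To make the described behavior concrete, here is a minimal Python sketch of how a crawler could build URLs from a GET form in the way the member describes. It is only an illustration of the reported pattern, not Google's actual method, which is not public. The names `GetFormParser`, `candidate_urls`, and `site_words` are hypothetical, and the sketch assumes each field is tried blank, with each of the form's own option values, and with each word from the site's word list.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode, urljoin


class GetFormParser(HTMLParser):
    """Collects GET forms: the action plus candidate values for each field."""

    def __init__(self):
        super().__init__()
        self.forms = []      # list of {"action": str, "fields": {name: [values]}}
        self._form = None    # GET form currently being parsed, if any
        self._select = None  # name of the <select> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # A form with no method attribute submits via GET by default.
            if (attrs.get("method") or "get").lower() == "get":
                self._form = {"action": attrs.get("action") or "", "fields": {}}
        elif self._form is None:
            return  # ignore fields outside a GET form
        elif tag == "input" and attrs.get("name"):
            self._form["fields"].setdefault(attrs["name"], [])
        elif tag == "select" and attrs.get("name"):
            self._select = attrs["name"]
            self._form["fields"].setdefault(self._select, [])
        elif tag == "option" and self._select and attrs.get("value"):
            self._form["fields"][self._select].append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
        elif tag == "form" and self._form is not None:
            self.forms.append(self._form)
            self._form = None


def candidate_urls(page_url, html, site_words):
    """Hypothetical helper: every URL the assumed strategy would request."""
    parser = GetFormParser()
    parser.feed(html)
    urls = []
    for form in parser.forms:
        base = urljoin(page_url, form["action"])
        names = list(form["fields"])
        # Assumption: a field is left blank, takes one of the form's own
        # option values, or takes a word found elsewhere on the site.
        choices = [[""] + form["fields"][n] + list(site_words) for n in names]
        for combo in product(*choices):
            urls.append(base + "?" + urlencode(dict(zip(names, combo))))
    return urls
```

Even a small form multiplies quickly under this assumption: a form with a text field `q` and a two-option select `cat`, combined with a single site word, already yields eight distinct URLs, which would help explain how a site could accumulate a very large number of spider requests.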
It is now February, and the same member has found similar strange activity from Google. His site has received over 519,000 spider requests from Google since the end of October. He believes that Googlebot is adding the GET data by itself, either by accident or in an attempt to discover new content. What do you think it is?
Forum discussion continues at WebmasterWorld.