It seems that a lot of well-known companies have webmasters (or legal departments) who just don't have a clue how to implement a robots.txt file. According to a DigitalPoint Forums thread, the United Kingdom-based Daily Telegraph is looking to sue Google and Yahoo for accessing its content.
Their statement, as quoted in Guardian Unlimited, is that they are concerned these search engines are accessing their content for free without giving them proper credit.
Our ability to protect content is under consistent attack from those such as Google and Yahoo who wish to access it for free. These companies are seeking to build a business model on the back of our own investment without recognition. All media companies need to be on guard for this. Success in the digital age, as we have seen in our own company, is going to require massive investment... [this needs] effective legal protection for our content, in such a way that allows us to invest for the future.
Apparently, they're clueless about implementing a robots.txt file that will prevent search engines from accessing content "for free." As of this writing, this is the Telegraph's current robots.txt file:
# Robots.txt file
# All robots will spider the domain
User-agent: *
Disallow: */ixale/
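For what it's worth, if the Telegraph actually wanted to keep those two search engines away from its content, a robots.txt along these lines would do it (Googlebot is Google's crawler and Slurp is Yahoo's; blocking everything with "/" is just an illustration, since which paths to block is their call):

User-agent: Googlebot
Disallow: /

User-agent: Slurp
Disallow: /

Both crawlers honor the robots exclusion standard, so those few lines alone would stop their pages from being fetched "for free."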
Not only that, but they also have the ability to remove their content from both Google's and Yahoo's SERPs.
It is a bit disturbing how many people are concerned about search engines (which ultimately give them more visibility!). The claim that search engines don't respect their rules goes both ways: Daily Telegraph, I imagine you have rules you want Google and Yahoo to respect. Well, the search engines have rules too. Follow them and you'll be fine.
Feel free to add your two cents on the DigitalPoint Forums thread.