Which spiders/bots do you disallow in robots.txt?

#77290 (In Topic #15904)

Community saint

Just curious to see what spiders others are disallowing.

I recently denied access to the former Soviet Bloc countries and China in .htaccess, and also disallowed Yandex and Baiduspider in robots.txt. Yandex was well-behaved, but Baiduspider kept crawling my site until I blocked it in .htaccess. I'm still getting hit by quite a few bots with user agents like "bot*" and "spider", and am wondering what other disallows people are using.
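For anyone comparing notes, here is a rough sketch of what disallow rules for those two crawlers might look like, wrapped in a small Python check using the standard library's `urllib.robotparser`. The rule text is an assumption for illustration, not a copy of my actual file, and `is_allowed` is just a helper name made up for this example. Bear in mind that robots.txt is only advisory, so a check like this shows what a polite crawler should do, not what a given bot will actually do.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (assumed, not the poster's real file):
# block Yandex and Baiduspider everywhere, allow everything else.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""

# Parse the rules from the string instead of fetching them over HTTP.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(agent, path="/"):
    """Return True if a crawler honouring these rules may fetch `path`."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("Yandex", "Baiduspider", "Googlebot"):
        print(agent, "allowed" if is_allowed(agent) else "disallowed")
```

The check takes the short crawler token (for example `Baiduspider`), not a full browser-style user agent string, since that is what the parser matches against the `User-agent` lines.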

I suspect I am being scraped by a bot with no user agent. I understand that this is a tricky situation, as MSN/Bing use "undercover" bots to check for SEO trickery, so an outright ban on anything without a user agent might be unwise.
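One way to confirm the suspicion before banning anything is to count requests with a blank user agent per IP address in the access log. The sketch below assumes the Apache "combined" log format; the regular expression, the `count_blank_agents` helper, and the sample lines (which use documentation-only IP addresses) are all made up for illustration.

```python
import re
from collections import Counter

# Assumed Apache "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_blank_agents(lines):
    """Count requests per IP whose user agent is empty or logged as '-'."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("agent").strip() in ("", "-"):
            hits[match.group("ip")] += 1
    return hits


# Hypothetical sample lines, not taken from a real log.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /index.php HTTP/1.1" 200 2326 "-" "-"',
    '203.0.113.5 - - [10/Oct/2013:13:55:37 +0000] "GET /forum HTTP/1.1" 200 512 "-" ""',
    '198.51.100.7 - - [10/Oct/2013:13:55:38 +0000] "GET / HTTP/1.1" 200 100 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for ip, count in count_blank_agents(SAMPLE).most_common():
        print(ip, count)
```

An IP that shows up with a high count and a crawl-like request pattern can then be blocked individually, which avoids a blanket ban on every request that lacks a user agent.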

What have you included in your robots.txt and which bots are ignoring the robots.txt file?
