Which spiders/bots do you disallow in robots.txt?

#77290 (In Topic #15904)
Community saint

Just curious to see what spiders others are disallowing.

I recently denied access to the former Soviet Bloc countries and China in .htaccess and also disallowed Yandex and Baiduspider in robots.txt. Yandex was well-behaved, but Baiduspider kept crawling my site until I blocked it in .htaccess. I'm still getting hit by quite a few bots with user agents like "bot*" and "spider", and am wondering what other disallows people are using.
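
For anyone who wants to sanity-check their own rules, here is a rough sketch using Python's standard urllib.robotparser. The robots.txt content below is only my guess at what rules like the ones described above might look like, and the is_allowed helper is a made-up name for illustration. It only shows what a compliant crawler should do; it cannot tell you whether a given bot actually honours the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: block Baiduspider and Yandex, allow everyone else.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: *
Disallow:
"""

def is_allowed(user_agent, path="/", robots_txt=ROBOTS_TXT):
    """Return True if a compliant crawler with this user agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

if __name__ == "__main__":
    for agent in ("Baiduspider", "YandexBot", "Googlebot"):
        print(agent, is_allowed(agent))
```

A bot that keeps crawling after being disallowed here is ignoring robots.txt, which is when a server-level block becomes the only option.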

I suspect I am being scraped by a bot with no user agent. I understand that this is a tricky situation, as MSN/Bing use "undercover" bots to check for SEO trickery, so an outright ban on anything without a user agent might be unwise.
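
One way to check that suspicion before banning anything is to tally user agents from the access log. This is only a sketch: it assumes the log is in the common "combined" format, where the user agent is the last quoted field, and the sample lines and the count_user_agents name are invented for illustration.

```python
import re
from collections import Counter

# Matches the last two quoted fields (referrer, user agent) of a combined-format line.
TAIL = re.compile(r'"([^"]*)" "([^"]*)"\s*$')

def count_user_agents(lines):
    """Count requests per user agent; empty or "-" agents are grouped as "(none)"."""
    counts = Counter()
    for line in lines:
        match = TAIL.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group(2).strip()
        counts[agent if agent not in ("", "-") else "(none)"] += 1
    return counts

# Made-up sample lines using documentation-only IP addresses.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2015:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "-"',
    '203.0.113.5 - - [01/Jan/2015:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" ""',
    '198.51.100.7 - - [01/Jan/2015:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "Baiduspider"',
]

if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE).most_common():
        print(hits, agent)
```

If the "(none)" bucket turns out to be large and comes from a handful of addresses, blocking those addresses is less risky than a blanket rule against empty user agents.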

What have you included in your robots.txt and which bots are ignoring the robots.txt file?

Bob