
Moving forward with Composr

ocPortal has been relaunched as Composr CMS, which is now in beta. ocPortal 9 will be superseded by Composr 10.

Head over to compo.sr for our new site, and to our migration roadmap. Existing ocPortal member accounts have been mirrored.


Google Crawl Bots Out of Control

#104528 (In Topic #20413)

Honoured member

Ok, this may be an odd one, but I have a situation where two Google crawl bots are camped out on one of our forum's topics. They are just sitting there registering thousands of page views. I want accurate stats, and I do want them to crawl (periodically), but this is getting a little ridiculous...

Anyone else ever had this occur, and if so, how did you get them to resume "normal" behavior?

#104529

Honoured member

Just tried, but…

Banning the IP address does not work; the session still stays intact.
#104537

Hi,

I've just put in a quick patch for you so that view counts won't go up for bots accessing forum topics, and applied to your site.

Hopefully that'll resolve it :).

(Other users will get in v10)
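
The idea behind the patch described above can be sketched in a few lines. This is only an illustration in Python, not ocPortal's actual code: the function name `record_topic_view` and the in-memory `Counter` are assumptions made for the example, and the `is_bot` flag is assumed to come from whatever bot detection the site already has.

```python
from collections import Counter


def record_topic_view(view_counts: Counter, topic_id: int, is_bot: bool) -> int:
    """Increment a topic's view count unless the visitor is a recognised bot.

    Returns the topic's view count after the call.
    """
    if not is_bot:
        view_counts[topic_id] += 1
    return view_counts[topic_id]


# Hypothetical usage: one human visit and one crawler visit to the same topic.
view_counts = Counter()
record_topic_view(view_counts, 20413, is_bot=False)  # counted
record_topic_view(view_counts, 20413, is_bot=True)   # ignored
print(view_counts[20413])  # prints 1
```

Note that this only helps if the crawler is actually recognised as a bot in the first place.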


Become a fan of ocPortal on Facebook or add me as a friend. Add me on Twitter.
Was I helpful?
  • If not, please let us know how we can do better (please try and propose any bigger ideas in such a way that they are fundable and scalable).
  • If so, please let others know about ocPortal whenever you see the opportunity.
  • If my reply is too Vulcan or expressed too much in business-strategy terms, and not particularly personal, I apologise. As a company & project maintainer, time is very limited to me, so usually when I write a reply I try and make it generic advice to all readers. I'm also naturally a joined-up thinker, so I always express my thoughts in combined business and technical terms. I recognise not everyone likes that, don't let my Vulcan-thinking stop you enjoying ocPortal on fun personal projects.
  • If my response can inspire a community tutorial, that's a great way of giving back to the project as a user.
#104538

Honoured member

Chris Graham said

Hi,

I've just put in a quick patch for you so that view counts won't go up for bots accessing forum topics, and applied to your site.

Hopefully that'll resolve it :).

(Other users will get in v10)

Ok, thanks!

It was up to 217,000 in just over a month, so I knew something was wrong… lol
#104566

Honoured member

They're still registering page views…….

 O_o
#104568

Page views, or topic views? Because I only changed topic views. And, only for defined bots -- do you know the exact user-agent of the bot? It might not be Google, could be some malicious bot, as those tend to be the worst, and would not be recognised by ocPortal as a bot.


#104569

Honoured member

I looked it up on an IP address finder, and it is registered to Google:

IP Address: 66.249.69.46
Hostname: crawl-66-249-69-46.googlebot.com
IP Country: United States
IP Country Code: USA
IP Continent: North America
IP Region: California
IP City: Mountain View
IP Latitude: 37.386
IP Longitude: -122.0838
Organization: Googlebot
ISP Provider: Googlebot

It's camped out on a forum topic, and it's just sitting there; it has been for weeks…
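
A lookup like the one above can also be done programmatically. A common way to check that an address really belongs to Google's crawler is a reverse DNS lookup followed by a forward lookup that must return the same address. The Python sketch below shows that check under some assumptions: the function name `is_verified_googlebot` and the suffix list are chosen for this example, and the resolvers are injectable so the usage lines run without network access. It compares against a single forward result, which is a simplification.

```python
import socket
from typing import Callable

# Hostname suffixes assumed for this example.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], str] = lambda addr: socket.gethostbyaddr(addr)[0],
    forward_lookup: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if reverse DNS gives a Google crawler hostname that resolves back to ip."""
    try:
        hostname = reverse_lookup(ip)
        if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
            return False
        # Forward-confirm: the hostname must map back to the original address.
        return forward_lookup(hostname) == ip
    except OSError:
        # Covers socket.herror and socket.gaierror (failed lookups).
        return False


# Offline usage with stand-in resolvers built from the lookup result quoted above.
fake_ptr = {"66.249.69.46": "crawl-66-249-69-46.googlebot.com"}
fake_a = {"crawl-66-249-69-46.googlebot.com": "66.249.69.46"}
print(is_verified_googlebot("66.249.69.46", fake_ptr.__getitem__, fake_a.__getitem__))  # prints True
```

Calling `is_verified_googlebot("66.249.69.46")` with the default resolvers would perform real DNS queries instead.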
#104576

I can certainly see what you're saying in the logs. I am running some tests to see if I can guide Google better, and check to see if my view-count fix worked.


#104578

Okay, I am tuning a few things here to reduce bot intensity, but the main cause is that Googlebot here is masquerading as an iPhone while simultaneously identifying itself as Googlebot. As an optimisation, ocPortal assumes that known real browsers aren't bots, so I'm going to change that slightly.
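
To illustrate the problem, here is a rough Python sketch of the two orderings. It is not ocPortal's code: the token lists and function names are made up for the example, and the user-agent string is only an illustration of the kind of header described above, where iPhone and Googlebot tokens appear together.

```python
# Hypothetical, heavily abbreviated token lists for illustration only.
BOT_TOKENS = ("googlebot", "bingbot", "slurp")
BROWSER_TOKENS = ("iphone", "firefox", "chrome", "msie")


def is_bot_browser_first(user_agent: str) -> bool:
    """Shortcut ordering: anything that looks like a known browser is assumed human."""
    ua = user_agent.lower()
    if any(token in ua for token in BROWSER_TOKENS):
        return False  # the bot check is never reached
    return any(token in ua for token in BOT_TOKENS)


def is_bot_bot_first(user_agent: str) -> bool:
    """Adjusted ordering: a bot token wins even if browser tokens are also present."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)


# Illustrative user-agent that claims to be both an iPhone and Googlebot.
MIXED_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 "
    "(KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

print(is_bot_browser_first(MIXED_UA))  # prints False
print(is_bot_bot_first(MIXED_UA))      # prints True
```

The shortcut saves work for ordinary visitors, which is presumably why it existed, but it misses any crawler that also presents itself as a browser.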


#104579

Ok done.

3 changes in total:
  1. Bots are now detected even if they simultaneously claim to be normal browsers.
  2. The links to RSS/Atom no longer include the session ID when a bot is crawling, so the bot doesn't treat each session ID it is ever given as a different RSS feed (this probably was not an issue, but I'm being cautious).
  3. The 'jump to post' links in topics now have a nofollow, so Google doesn't try to index them individually (there's no unique content on them).
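
Changes 2 and 3 can be sketched roughly as follows. This is a Python illustration, not the actual ocPortal change: the parameter name `keep_session`, the example URLs, and both function names are assumptions made for the example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "keep_session"  # hypothetical name for the session ID parameter


def feed_url(url: str, is_bot: bool) -> str:
    """Return the feed URL, dropping the session parameter when a bot is crawling."""
    if not is_bot:
        return url
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != SESSION_PARAM
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def jump_to_post_link(url: str, label: str) -> str:
    """Render a 'jump to post' link that asks crawlers not to follow it."""
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(label)}</a>'


print(feed_url("https://example.com/backend.php?type=rss&keep_session=abc123", is_bot=True))
# prints https://example.com/backend.php?type=rss
print(jump_to_post_link("/forum/topic?id=20413#post_104528", "#104528"))
# prints <a href="/forum/topic?id=20413#post_104528" rel="nofollow">#104528</a>
```

With the session parameter removed, every crawl sees the same feed URL, and the nofollow hint discourages crawling of links that lead to no new content.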


#104623

Honoured member

Ahh…

Thanks Chris!

Isn't the practice they are using with these bots a little shady?

Acting like you are something that you are not??

#104624

Not shady, just something they didn't use to do. They're trying to learn more about your site so they know better what to index, can index it better, and can link through to it better from search results.


#104625

... and they do that by simulating something closer to reality.


#104626

Honoured member

Oh, ok will shut up now.

 :shutup: