


HTTP/1.1 401 Unauthorized errors in Google Webmaster Tools

#99666 (In Topic #19584)

Well-settled

I'm getting a growing number of "HTTP/1.1 401 Unauthorized" errors in Google Webmaster Tools for an ocPortal v9.0.8-powered site. The errors started to appear a month ago, after upgrading to v9.0.8. Nothing else was changed on the site, and its content can be accessed in browsers with the guest account without any problems. However, the "Fetch as Google" feature in Webmaster Tools returns the following error:

"Googlebot couldn't crawl your URL because your server either requires login to access the page, or is blocking Googlebot from accessing your site."

with the header:

"HTTP/1.1 401 Unauthorized
Date: Sat, 17 Aug 2013 20:03:25 GMT
Server: Apache
Expires: Mon, 20 Dec 1998 01:00:00 GMT
Pragma: no-cache
X-Powered-By: ocPortal 9.0.8 (PHP 5.2.17)
Set-Cookie: has_cookies=1; expires=Sun, 15-Dec-2013 20:03:25 GMT; path=/
Set-Cookie: ocp_session=884932504; path=/
Last-Modified: Sat, 17 Aug 2013 20:03:25 GMT
Keep-Alive: timeout=1, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked"

Content-Type: text/html; charset=utf-8

Does anybody know what could be the reason?

FeminaPortal - Female Internet Portal (powered by ocPortal)
INFORBIRO - Information Technology Agency
BlicKlik - Internet Marketing and Advertising

#99667

Do you have a Google IP blocked?


Become a fan of ocPortal on Facebook or add me as a friend. Add me on Twitter.
Was I helpful?
  • If not, please let us know how we can do better (please try and propose any bigger ideas in such a way that they are fundable and scalable).
  • If so, please let others know about ocPortal whenever you see the opportunity.
  • If my reply is too Vulcan or expressed too much in business-strategy terms, and not particularly personal, I apologise. As a company & project maintainer, my time is very limited, so when I write a reply I usually try to make it generic advice for all readers. I'm also naturally a joined-up thinker, so I always express my thoughts in combined business and technical terms. I recognise not everyone likes that; don't let my Vulcan thinking stop you enjoying ocPortal on fun personal projects.
  • If my response can inspire a community tutorial, that's a great way of giving back to the project as a user.

#99668

Well-settled

As far as I know, no Google IP addresses are blocked. Here are the IP addresses from the .htaccess file:

order allow,deny
# IP bans go here (leave this comment here! If this file is writeable, ocPortal will write in IP bans below, in sync with it's own DB-based banning - this makes DOS/hack attack prevention stronger)
# deny from xxx.xx.x.x (leave this comment here!)
deny from 198.143.130.90
deny from 173.192.34.95
deny from 173.193.219.168
deny from 173.193.219.168
deny from 173.193.219.168
deny from 173.193.219.168
deny from 173.193.219.168
deny from 72.252.162.208
deny from 72.252.162.208
deny from 103.31.186.82
deny from 213.129.110.144
deny from 37.58.35.118
deny from 213.129.113.94
deny from 213.129.104.37
deny from 79.140.111.143
deny from 37.58.39.118
deny from 37.58.34.92
deny from 37.58.32.10
deny from 37.58.32.127
deny from 37.58.39.93
deny from 213.129.110.49
deny from 37.58.37.52
deny from 79.140.111.103
deny from 37.58.35.121
deny from 79.140.111.40
deny from 213.129.109.231
deny from 37.58.32.67
deny from 37.58.36.93
allow from all

And the following are listed at /adminzone/pg/admin_ipban/misc:

37.58.32.67 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
37.58.36.93 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
213.129.109.231 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
79.140.111.40 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
37.58.35.121 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
79.140.111.103 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
37.58.37.52 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
213.129.110.49 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
37.58.39.93 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
37.58.32.127 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
37.58.32.10 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
37.58.34.92 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
37.58.39.118 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
79.140.111.143 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
213.129.104.37 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
213.129.113.94 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
37.58.35.118 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
213.129.110.144 Gave incorrect login details 30 times over a 15 minute period (brute-force attack)
103.31.186.82 A suspicious GET parameter was given (type as misc" and (9=9 xor 5=13)-- a)
72.252.162.208 Tried to get something to eval() which was probably malicious
173.193.219.168 A suspicious GET parameter was given (page as >Guide<)
173.192.34.95 A suspicious GET parameter was given (type as >Front page<)
198.143.130.90 A suspicious GET parameter was given
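The question asked earlier (is a Google IP blocked?) can be checked mechanically against the deny list above. Below is a minimal Python sketch, assuming the .htaccess only uses plain `deny from <address>` lines like the ones quoted; the function names `denied_networks` and `is_denied` are hypothetical, and the sketch ignores hostnames, `allow` rules and `Order` semantics. Candidate addresses would come from your access log entries for requests claiming to be Googlebot.

```python
import ipaddress

def denied_networks(htaccess_text):
    """Collect the networks named on 'deny from' lines, skipping comments."""
    networks = []
    for raw in htaccess_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment (e.g. ocPortal's marker comments)
        parts = line.split()
        if len(parts) >= 3 and parts[0].lower() == "deny" and parts[1].lower() == "from":
            for token in parts[2:]:
                try:
                    # strict=False accepts a bare address as a /32 (or /128) network
                    networks.append(ipaddress.ip_network(token, strict=False))
                except ValueError:
                    pass  # hostnames, 'all', etc. are not handled in this sketch
    return networks

def is_denied(ip, networks):
    """True if the address falls inside any denied network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

if __name__ == "__main__":
    sample = """order allow,deny
# deny from xxx.xx.x.x (leave this comment here!)
deny from 37.58.32.67
deny from 173.193.219.168
allow from all
"""
    nets = denied_networks(sample)
    print(is_denied("37.58.32.67", nets))  # True
    print(is_denied("192.0.2.1", nets))    # False (documentation-range address)
```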


#99670

Can you tell me the site URL?


#99671

Well-settled

For example, the Fetch as Google tool can't access this URL: http://www.feminaportal.com/forum/pg/forumview/misc/marriage-weddings/index.php?start=180

However, the first page of the same forum can be fetched: http://www.feminaportal.com/forum/pg/forumview/misc/marriage-weddings/index.php

#99672

Ah, got it.

I thought it was a problem hitting the front page.

For deep pagination, we have intentionally stopped Google from getting there. This is because MySQL has major performance problems returning rows from deep within result sets, and spiders can dig very deep, very often and very persistently.

Perhaps we could also put a nofollow on those links, to stop Google from even trying.
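A rough illustration of the nofollow idea: a pagination-link helper that adds the standard `rel="nofollow"` link type to links pointing past a depth limit. This is a Python sketch, not ocPortal's template code; the helper name, the `?start=` URL pattern (taken from the URL above) and the 5-page limit are assumptions.

```python
from html import escape

def pagination_link(base_url, page_index, per_page, max_followed_pages=5):
    """Render one pagination link; page_index is 0-based (hypothetical helper).

    Links to pages at or beyond max_followed_pages get rel="nofollow",
    hinting to crawlers that they should not follow them.
    """
    start = page_index * per_page
    url = base_url if start == 0 else f"{base_url}?start={start}"
    rel = ' rel="nofollow"' if page_index >= max_followed_pages else ""
    return f'<a href="{escape(url)}"{rel}>{page_index + 1}</a>'

if __name__ == "__main__":
    base = "http://www.example.com/forum/index.php"  # placeholder URL
    for i in (0, 4, 6):
        print(pagination_link(base, i, per_page=30))
```

Note that nofollow is only a hint to crawlers about that particular link; a bot that already knows a deep URL can still request it, so a server-side limit would still be needed.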


#99673

Well-settled

Hm… the question is, how will bots know that the content exists if they can't access the pages that link to it?

#99674

They will have already indexed it in the past, found it in the XML sitemap, or found other links to it online. This only affects very old screens deep in the pagination, not the things linked from them (i.e. the old topics themselves are still accessible).
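For reference, an XML sitemap lists content URLs directly, so crawlers can discover old topics without walking the forum pagination at all. Below is a minimal Python sketch that produces a file in the sitemaps.org 0.9 format; the topic URLs are hypothetical placeholders, and this only illustrates the format, not how ocPortal builds its own sitemap.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemaps.org <urlset> document listing each URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in dict.fromkeys(urls):  # de-duplicate while keeping order
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = u  # ElementTree escapes &, < and >
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

if __name__ == "__main__":
    print(build_sitemap([
        "http://www.example.com/forum/topic-1.htm",  # hypothetical topic URLs
        "http://www.example.com/forum/topic-2.htm",
    ]))
```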


#99675

Well-settled

That makes sense. One more question: what do we actually consider "deep pagination" in ocPortal? Is it more than 10 pages, more than 20, or something else? It makes a difference whether a user chooses to view 30 or 300 topics per page.

#99676

It's more than 5 pages deep, counted using the default number of items per page, as Google isn't going to change that setting. I'm sure we could raise the limit a lot, to 30 pages say, but I doubt there is a real risk of Google flushing out the URLs it has already learned, especially as the sitemap has them too, so I'd rather we err on the side of performance.
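A minimal Python sketch of the rule as described. It is not ocPortal's code: the function name, the default of 30 items per page and how bots are detected (`is_bot`) are all assumptions, and "more than 5 pages deep" is read as "the sixth page onwards". Only the 5-page limit and the 401 status come from this thread.

```python
DEFAULT_PER_PAGE = 30  # hypothetical default; check your own ocPortal settings
MAX_BOT_PAGES = 5      # "more than 5 pages deep", per the reply above

def status_for_request(start, is_bot,
                       default_per_page=DEFAULT_PER_PAGE,
                       max_pages=MAX_BOT_PAGES):
    """Return the HTTP status a deep-pagination guard might send.

    Depth is measured in default-sized pages, so a member's own
    per-page preference does not move the cut-off for bots.
    """
    page_index = start // default_per_page  # 0-based page number
    if is_bot and page_index >= max_pages:  # sixth page onwards
        return 401                           # the status Googlebot reported here
    return 200

if __name__ == "__main__":
    print(status_for_request(180, is_bot=True))   # 401: start=180 is the 7th page
    print(status_for_request(180, is_bot=False))  # 200: humans are unaffected
```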

For anyone who's interested, I'll explain the issue…

MySQL cannot jump into a result set efficiently. So if you get a result set of topics and want to start, say, 300 records in, it has to actually retrieve the first 300 records and skip past each one. A naive programmer won't know it does this, but if you do performance analysis, you'll see it does. While it does this, it holds a read lock, which stalls any request for any ocPortal topic that tries to update the read count (because MyISAM has table-level locks, not row-level locks). So the performance implication is pretty serious if bots are allowed to keep digging too deep.
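A toy Python model of the cost being described: it simulates an OFFSET-style scan over an in-memory sequence and counts how many rows are touched. It does not model MySQL's actual executor, indexes or locking; it only shows why the work grows with the offset rather than with the page size.

```python
def fetch_page(rows, offset, limit):
    """Toy model of LIMIT/OFFSET without a usable index.

    Every row before `offset` is still read and thrown away, so the
    work grows with the offset, not with the page size.
    Returns (page, rows_examined).
    """
    examined = 0
    page = []
    if limit <= 0:
        return page, examined
    for row in rows:
        examined += 1
        if examined > offset:  # past the skipped prefix
            page.append(row)
            if len(page) == limit:
                break
    return page, examined

if __name__ == "__main__":
    topics = range(10_000)
    for offset in (0, 300, 3000):
        page, examined = fetch_page(topics, offset, 30)
        print(offset, examined)  # rows examined: 30, then 330, then 3030
```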

In simple terms: when a deep pagination page is requested, it stalls viewing of every topic, maybe for a second, maybe for a few, depending on how deep it is.
It's worse than it sounds, though, as this kind of 'spike' can knock a server out of equilibrium by queuing partially-completed requests in RAM; it might not even recover if you're unlucky or have an aggressive bot.
Hence our recent conservatism :).
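To see why a short stall can snowball, here is a deliberately simple fluid-queue model in Python. It is an illustration, not a measurement of ocPortal, and all rates and durations are made-up parameters. Requests that pile up during a stall only drain if the server's throughput exceeds the arrival rate; if throughput falls to the arrival rate or below (for example under memory pressure), the backlog never clears.

```python
def backlog_over_time(arrival_rate, service_rate, stall_seconds, total_seconds):
    """Queue length at the end of each second in a simple fluid model.

    Requests keep arriving at arrival_rate per second. For the first
    stall_seconds nothing is served (the lock is held); afterwards the
    server clears up to service_rate requests per second.
    """
    queue = 0.0
    history = []
    for second in range(total_seconds):
        queue += arrival_rate
        if second >= stall_seconds:
            queue = max(0.0, queue - service_rate)
        history.append(queue)
    return history

if __name__ == "__main__":
    print(backlog_over_time(10, 12, 3, 10))  # peaks at 30, then drains slowly
    print(backlog_over_time(10, 10, 3, 10))  # never drains: stuck at 30
```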

