


Sitemap issue

#74245 (In Topic #15456)

Community saint

The sitemap created using the force-generation feature includes at least one URL that is unavailable to a guest:

Code

http://www.domain.com/groups/view/super-members.htm

The sitemap should contain entries only for items available to a guest, since that is how the search engines access the site.

Bob
#74248

Community saint

Other examples of entries in the sitemap.xml which should not be there:

Code

http://www.domain.com/warnings/ad.htm
http://www.domain.com/bookmarks/misc.htm
http://www.domain.com/forum/vforums/unread.htm
http://www.domain.com/forum/vforums/misc.htm
http://www.domain.com/forum/forumview/pt.htm
http://www.domain.com/personalzone/privacy.htm

All of these require that a user be logged in, which the search engines are not.
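One way to find such entries is to read the sitemap and request each URL anonymously. The following is only a rough sketch under stated assumptions: the function names (`sitemap_urls`, `guest_status`, `guest_blocked`) are made up for illustration and are not part of ocPortal, and it assumes the sitemap follows the standard sitemaps.org format.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def guest_status(url, timeout=10):
    """Fetch url without cookies or credentials and return the HTTP status.

    urlopen follows redirects, so a 301/302 shows up as the final status.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 401 or 403 for a login-only page


def guest_blocked(xml_text, fetch=guest_status):
    """Return the sitemap URLs that do not answer 200 to an anonymous request."""
    return [url for url in sitemap_urls(xml_text) if fetch(url) != 200]
```

Calling `guest_blocked(open("sitemap.xml").read())` would then list the entries a search engine cannot reach. The `fetch` parameter is there so the check can be tried with a stand-in function instead of real network requests.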

Bob
#74253

Google will not index a page given an access denied HTTP header, which ocPortal will give.


Become a fan of ocPortal on Facebook or add me as a friend. Add me on Twitter.
Was I helpful?
  • If not, please let us know how we can do better (please try and propose any bigger ideas in such a way that they are fundable and scalable).
  • If so, please let others know about ocPortal whenever you see the opportunity.
  • If my reply is too Vulcan or expressed too much in business-strategy terms, and not particularly personal, I apologise. As a company & project maintainer, my time is very limited, so usually when I write a reply I try to make it generic advice to all readers. I'm also naturally a joined-up thinker, so I always express my thoughts in combined business and technical terms. I recognise not everyone likes that; don't let my Vulcan-thinking stop you enjoying ocPortal on fun personal projects.
  • If my response can inspire a community tutorial, that's a great way of giving back to the project as a user.
#74263

Community saint

Thanks, Chris, I knew that they would not be indexed. What I am curious about is whether those access-denied errors cause Google to punish you in PageRank.

Bob
#74270

Oh, of course not :). Remember, anyone can link to anything online, so Google finds a tonne of pages that it then cannot actually access.

If you are really concerned (you shouldn't be, but if you are :P), you can set page permissions on those pages to deny guests access at the page level.


#74272

Community saint

Nah…you reassured me that all is well with Google. No need to make more work for myself. I've been trying to dig out from under a pile of loose ends for the last couple of days but I'm most concerned that the site gets crawled and indexed by the search engines.

I just get worried when I get more clicks from Bing than I do Google. That's never happened to me before.

Bob
#74280

Community saint

Well, I thought I was reassured and then I noticed this in Google Webmaster Tools:

[Screenshot from Google Webmaster Tools; image not preserved]

I understand that the link will not be indexed, so there is no risk that others might click it, but I do wonder whether Google penalizes you for having such a link in your sitemap. They certainly seem to be strongly suggesting that you review those entries in your sitemap.

Bob
#74284

Community saint

I wouldn't read too much into that report; it's just informational.

They have no way of knowing what those links are and are just letting you know what they discovered.

It's no different to your browser reporting a truckload of CSS errors. Informative: yes. Critical to fix every single one of them: no.

#74285

Community saint

Okay….two experts saying the same thing. I am just worrying too much.

Bob
#74304

Community saint

It's my understanding that a site with a lot of static content will hurt you much more than 404s and the like.

The internet is dynamic: pages get moved around and content gets changed all the time. Fresh content (along with links to/from popular/similar sites) is what search engines are looking for, and missing pages are expected. If you don't let your content stagnate you will be OK.

Steve
#74306

Community saint

I've got plenty of fresh content here: 30 pictures of paintings and 5 binders' worth of documentary information that I have barely broached.

I've had 8 registrations in 3 days without the benefit of any Google listings. On the other hand, these people had previously expressed an interest in the artist.

I got a couple of emails complaining about the required birthdate, which caused them to not sign up; that issue is now fixed. I also got a few emails asking if people could use their Facebook or Google accounts. Facebook gave me too many problems and I have not yet investigated Google's OpenID implementation. I don't need additional headaches until I know the site is running smoothly.

I just noticed that I went from something like 8 URLs in the index to 107, so I should be making headway there soon too.

I have several family members and friends who are anxious to add content as well, but we were all hoping for an earlier launch. Seems many of them are caught up with back-to-school stuff right now.

I've had an interesting array of guest visitors scattered across a much wider swath than I had expected, including a fellow who signed up from Costa Rica, where the artist spent 6 months painting and partying with the locals.

All in all, a reasonably successful launch - especially considering the unavailability of the site after I sent a large batch of invitations. I wonder how many got struck by that. And, the site still requires invitations to join so it is somewhat self-limiting at this point.

Bob



#74309

Community saint

I browsed through your site a few days ago and am looking forward to seeing more of it when you have it fully "stocked".  :thumbs:

Steve
#74325

Community saint

Me too!!!!!!

That's why I am so anxious to work out the last of my issues so I can turn my attention to scanning more of the images. There is some real eye candy in the next batch, although some women might think that it objectifies women.

Bob
#75499

Community saint

Okay, it's time to revisit this issue.

I noticed the other day that Bing had not crawled my site for 10 days when Google visits all the time. I updated the sitemap and submitted it and it is now labelled as "pending".

But doing some research, I found the following in the Bing Guidelines:
For search engines, we seek a certain type of file, usually referred to as a sitemap.xml. This file resides on your server and is a location where you would showcase every URL from your website. Even if that number stretches into the thousands or more, a sitemap.xml file can hold them and search engine crawlers can easily read the information, uncovering all of your content.

You can maintain these files manually or dynamically, though you should take care in either instance to ensure your sitemaps are clean. Being clean to Bing means less than 1% of the listed URLs return an error when we call on the URL. When we ping a URL, we want to see a 200 OK header response come back. If we see a 301, 302 or 404 header response code returned to our request, those are seen as errors. If we see more than 1% of the URLs in a given sitemap returning errors, we begin to distrust the sitemap and stop visiting it. It is important to keep your sitemaps clean so we will visit them and find your latest content.

So it seems, at least for Bing, that those 401s can keep your site from being indexed. This is why it is important that the sitemap be generated as if by a guest and thus include no URLs that a guest cannot access. My ocP-generated sitemap currently has 289 entries and Google is showing 11 401 errors in the sitemap (almost 4 percent, well beyond Bing's 1 percent guideline). I suspect that Google also applies some unspoken penalty for these disallowed URLs in a sitemap.
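The arithmetic behind that comparison can be sketched as follows. This is only an illustration: the function names are hypothetical, and counting every non-200 response as an error is an assumption, since the quoted guideline names only 301, 302 and 404 explicitly.

```python
# "Clean" per the quoted Bing guideline: fewer than 1% of listed URLs return an error.
BING_CLEAN_THRESHOLD = 0.01


def sitemap_error_rate(statuses):
    """Fraction of sitemap URLs whose HTTP status is anything other than 200.

    Assumption: every non-200 status counts as an error.
    """
    if not statuses:
        return 0.0
    errors = sum(1 for status in statuses if status != 200)
    return errors / len(statuses)


def is_clean(statuses, threshold=BING_CLEAN_THRESHOLD):
    """True when the error rate is below the threshold."""
    return sitemap_error_rate(statuses) < threshold


# The figures from the post: 289 entries, 11 of them returning 401.
statuses = [401] * 11 + [200] * 278
print(f"{sitemap_error_rate(statuses):.1%}")  # 3.8%
print(is_clean(statuses))                     # False
```

With 289 entries, even 3 failing URLs (about 1.04 percent) would already be over the limit, so the sitemap would need at most 2 errors to count as clean by this measure.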

Someone made the point that anyone can link to anything in your site and that is true. But the issue, as Bing is trying to point out, is that as the site owner and sitemap creator, you are held to a higher standard and your sitemap links should be clean.

I am surprised that they hold 301s against the site. That seems wrong to me.

Anyway, I'd be interested in a response based on the information I provided from the Bing webmaster guidelines vis-a-vis the ocP-generated sitemap.

Bob


Last edit: by BobS
#75500

You can manually remove guest page permissions from those pages that guests would be implicitly denied access to anyway.


#75507

Community saint

Chris-

I am not sure how this fixes things. Are you saying that the sitemap generator runs as a non-logged-in user, so removing the permissions will ensure that those URLs are not included in the index?

Has anyone else checked Google and Bing to see what their status is relative to errors from sitemap URLs? In Google Webmaster Tools, there is a section in the Crawl errors for sitemap errors. For Bing, you will need to look at your sitemap feed and see when it was last crawled.

I'd be interested to see if others are having this issue.

Bob
#75508

Yes


#75513

Community saint

Chris-

I hate to be a pest but something is not adding up. You are saying that the sitemap runs as a non-logged-in user and thus would not include links to pages that are inaccessible.

This is what Google Webmaster Tools shows:

[Screenshot from Google Webmaster Tools; image not preserved]

I realize the dates are old but I have not changed any permissions.

Google is saying that there are 11 URLs in the sitemap which returned 401 errors, and you are saying that those links would not be included in the sitemap. I don't believe there is any way you can both be right.

Bob

#75515

This is why I said to change page permissions. The sitemap is not generated by a crawler; it simply enumerates pages and categories, filtering by page/zone/category permissions. If a particular screen applies its own internal logic regarding how a privilege might be applied, then that would not be known in this process.
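To illustrate the difference, here is a toy model of that process. It is not ocPortal's actual code: the `Page` class, its field names and both functions are invented for this sketch, and returning 401 for any denied request is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    guest_page_permission: bool               # declared page/zone/category permission
    requires_login_internally: bool = False   # check made inside the screen itself


def build_sitemap(pages):
    """Enumerate pages, filtering only on the declared guest permission.

    The screen's internal logic is deliberately ignored, mirroring the
    limitation described above.
    """
    return [page.url for page in pages if page.guest_page_permission]


def guest_http_status(page):
    """Status a guest (or a search engine) would get when requesting the page."""
    if not page.guest_page_permission or page.requires_login_internally:
        return 401
    return 200


unread = Page(
    "http://www.domain.com/forum/vforums/unread.htm",
    guest_page_permission=True,
    requires_login_internally=True,
)

# Listed in the sitemap, yet a guest is denied access when visiting it.
print(build_sitemap([unread]), guest_http_status(unread))

# Removing the guest page permission drops it from the sitemap.
unread.guest_page_permission = False
print(build_sitemap([unread]))  # []
```

In this model the only way to keep such a URL out of the sitemap is to make the declared permission match what the screen enforces internally, which is what removing guest access at the page level does.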


#75516

Community saint

Okay, I think I finally understand what you are saying. I removed Guest access to forum Module:forum and those links look like they are no longer included in the sitemap. I just need to hunt down the other permissions for the other errant entries.

Thanks for your help.

Bob