Google "URL parameters"

#74629 (In Topic #15533)

I've known about Google Webmaster Tools' ability to set URL parameters for query strings that you want excluded, but I just stumbled onto the fact that it is now creating its own URL parameters. The following is a list of parameters Google has decided need to be there:


You will note that there is no information regarding how Google is handling these parameters. Would any of these parameters result in content not being displayed in Google's index (assuming that is how they are treating these parameters)?

I ask because according to Google they have indexed 178 of the 237 entries in my sitemap. I know that there is often a delay but it's been nearly a week and not a single link shows up in Google search - not even for the homepage. I don't remember ever waiting more than a couple of days for the homepage to be listed in search results.

Of course, the fact that Google Analytics seems not to want to report any data on my traffic concerns me further (although I see several posts in their GA forums from people having the same problem, including people for whom it previously worked).

How long did it take for Google to put your home page in the index? Should I be concerned that there is some problem? I have never had Bing list a site before Google but that is now the case.

Thanks for any input.

Bob

#74960

My issue of being indexed in Google has been resolved.

I am just curious what others are doing about the URL parameters that Google is deciding need to be addressed. Do you leave them at the default "Let Google decide" or are you changing them?

My interest in this is renewed because I noticed yesterday that Google added probe_id and probe_type. They are both used in the URLs for gallery items displayed in flow-mode. I have no idea whether Google is including or excluding the URL parameters.

Bob

#74961

Pretty innocent stuff. I think those are all relating to the search module, and search engines are limited in what they are allowed to do there anyway.


Become a fan of ocPortal on Facebook or add me as a friend. Add me on Twitter.
Was I helpful?
  • If not, please let us know how we can do better (please try and propose any bigger ideas in such a way that they are fundable and scalable).
  • If so, please let others know about ocPortal whenever you see the opportunity.
  • If my reply is too Vulcan or expressed too much in business-strategy terms, and not particularly personal, I apologise. As a company & project maintainer, time is very limited to me, so usually when I write a reply I try and make it generic advice to all readers. I'm also naturally a joined-up thinker, so I always express my thoughts in combined business and technical terms. I recognise not everyone likes that, don't let my Vulcan-thinking stop you enjoying ocPortal on fun personal projects.
  • If my response can inspire a community tutorial, that's a great way of giving back to the project as a user.

#74962

My interest in this is renewed because I noticed yesterday that Google added probe_id and probe_type. They are both used in the URLs for gallery items displayed in flow-mode. I have no idea whether Google is including or excluding the URL parameters.

Bit of a toss up. Flow-mode gallery is like this, each image is kind of in the same viewer, so whether you want Google to index them separately or just index the gallery as a single unit, it's kind of up to you.


#74964

Chris Graham said

Bit of a toss up. Flow-mode gallery is like this, each image is kind of in the same viewer, so whether you want Google to index them separately or just index the gallery as a single unit, it's kind of up to you.
Hadn't thought of it that way but that does make sense. I guess I just have this thing about Google now attempting to be sentient and not being clear about what action they are taking for any given parameter.

The list has about doubled from the one I originally posted, and I think most of the parameters are harmless, but I am just not sure.

Bob

#75185

Chris Graham said

Pretty innocent stuff. I think those are all relating to the search module, and search engines are limited in what they are allowed to do there anyway.
Chris-

Here is the current list. You can see that I have excluded wide_print which is something you will be addressing in the next update along with the "Recommend to a friend" action.


It's really difficult to know what to do since Google doesn't state its treatment for any parameter. Would it be safe to exclude URLs with "only_search_meta", "all_defaults" and "keep_has_js"? Also, it seems I should be able to exclude anything with search.

I am clearly over-indexed due to multiple URLs pointing to the same content. I know that Google is not going to ding me for multiple-content but it is spreading PR over multiple URLs. I'd prefer to have a lower number of high-ranking URLs than to have an enormous number of low-ranking URLs in the index.

Bob


#75211

So I take it from the lack of response that everyone just trusts Google to do the right thing with these URLs.

Has no one else excluded any URL parameters which point to the same content? Which parameters did (would) you exclude?

Thanks for any input.

Bob

#75289

Yeah, we trust Google. They even say themselves not to worry about dead/old links, as Google's algorithms are clever enough to drop the offending pages out of its index over time.


#75295

Robbie-

I'm not concerned with dead/old links. The problem is that Google is indexing things through multiple paths, thus causing PR dilution. One example that Chris said he is going to fix is pages being linked not only through their page URL but also through the same URL with the "wide_print" parameter.

Ideally, you want a single path to a page indexed in Google. I figure I should have between 80 and 90 pages indexed but I have 342 URLs listed (which is down from 381 yesterday because I have manually set that Google should ignore URLs with the "wide_print" parameter). I'm just wondering what other parameters are safe to set so that Google doesn't index them.

Here's an example:

Code

Showgirls - JulianRitterCentral
/catalogues/category/image-gallery/showgirls.htm
/catalogues/category/image-gallery/showgirls.htm?catalogue=image_gallery

It appears to me that the URL with the "catalogue" parameter should be no-follow, no-index.

EDIT:
Here's another example showing the "wide_print" parameter:

Code

Why "JulianRitterCentral"? - JulianRitterCentral
/news/view/bobss-blog/why-julianrittercentral_2.htm
/news/view/bobss-blog/why-julianrittercentral_2.htm?filter=1
/news/view/bobss-blog/why-julianrittercentral_2.htm?wide_print=1&max=1000

I am not sure why the URL with "wide_print" is still in the index, as I have set this parameter so that those URLs are not indexed. It is probably just going to take some time for Google to catch up.

Bob


#75360

It's very normal for signals to be used in URL parameters, they're a very normal part of the web, and Google can easily detect when they make very little change to how the page looks and therefore pick just one 'version' of the URL to use (I would guess either the most linked to, or simplest URL, or some combination of those measures). In fact, the fact proper URL parameters are used is good- if they were folded into the main path part of the URL, the risk of creating the problem you are worried about would be much higher. So really this is just a discussion of whether you should bother giving extra info to Google; which I really don't think is worth bothering even thinking about to be honest.

I bet Google either ignore "duplicate URLs", "canonicalise" URLs as part of their core page rank algorithm, or have some kind of equivalency sets that get fed through the formulae. Or put it another way – I bet they do something smart in their algorithm.

Regardless, it's a small problem you're worrying about…

I doubt it could 'dilute' page rank if there are ways for a crawler to find URL parameters for a page. If anything, it's more pages linking to your other pages, so higher page rank. I understand the point that if a highly ranked page, X, has more links than necessary, its links to other pages will each be supplied less individual 'pagerank juice', but really we're talking about such statistically small numbers, in just one part (page rank) of a much more complex algorithm, that it's really trivial stuff. And besides, your internal cross-linking is again a triviality compared to the benefits of getting external links (unless you expect high page rank across a lot of different pages on your site, but want to use that to boost page rank of some pages that aren't externally linked to, which I think is unlikely ;)).

I've looked at the parameters here. The majority are to set up what default settings are on the search form or what version of the search form to use (including, I think, 'catalogue').
'back' is used to guide navigation through the calendar
'title' will probably come from it browsing through some forms
Quite a few also come from flow-mode galleries, as previously discussed.

I didn't realise this post was going to be so long. I found myself having to think through quite carefully, I've never considered this in such detail, but I think honestly I wouldn't ever have needed to, as Google is built to do sane things and promote good content to the top. Tweaking minutiae is unlikely to have much effect with what Google is really looking at IMO (appropriate content, and links from important external sites).


#75389

Chris-

Strangely enough, I did an analysis last night using a third-party sitemap generator and your response is right on the money.

I knew that Google was making some intelligent decisions, but I was bothered that, for instance, the ocPortal-generated sitemap has 274 URLs but a "site" query on Google returns 408 URLs.

Well, my third-party-generated sitemap has 80+ thousand URLs with all the internal cross-linking (mostly due to the tag cloud, I think).

Anyway, I kind of decided this morning that everything is okay, and your thoughtful answer further encourages me.

Thanks for taking the time to provide your analysis.

Bob

#75449

Well, it turns out Google is not making as many intelligent choices as it needs to.

Under the "Duplicate title tags" section of the HTML Suggestions, I have the following:

Code

Tropical landscapes from Puntarenas, Costa Rica - JulianRitterCentral
/catalogues/category/image-gallery/tropical-landscapes/tropical-landscapes_2.htm?catalogue=image_gallery
/catalogues/category/image-gallery/tropical-landscapes/tropical-landscapes_2.htm?order=42+ASC&catalogue=image_gallery
/catalogues/category/image-gallery/tropical-landscapes/tropical-landscapes_2.htm?order=42+DESC&catalogue=image_gallery
/catalogues/category/image-gallery/tropical-landscapes/tropical-landscapes_2.htm?order=44+ASC&catalogue=image_gallery
/catalogues/category/image-gallery/tropical-landscapes/tropical-landscapes_2.htm?order=44+DESC&catalogue=image_gallery
/catalogues/category/image-gallery/tropical-landscapes/tropical-landscapes_2.htm?order=47+DESC&catalogue=image_gallery
/catalogues/category/image-gallery/tropical-landscapes/tropical-landscapes_2.htm?order=48+ASC&catalogue=image_gallery
/catalogues/category/image-gallery/tropical-landscapes/tropical-landscapes_2.htm?order=48+DESC&catalogue=image_gallery
/catalogues/category/image-gallery/tropical-landscapes/tropical-landscapes_2.htm?order=54+DESC&catalogue=image_gallery

So, yes, I do have 9 duplicate titles because Google has seen fit to index the same page 9 times with slightly different URL parameters. I'm not sure what "order=" is but the rest is clearly just sorting differences. So it seems that Google needs to be told to ignore the sorting parameters.
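
One way to check a list of indexed URLs for this kind of duplication is to strip the parameters believed to be presentation-only and group what is left. The following is only a rough Python sketch, not anything ocPortal or Google provides. The set `PRESENTATION_PARAMS` and the function names are assumptions made for illustration, and whether "order" is safe to ignore everywhere on a given site still has to be verified by hand.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Assumption: these parameters only change how a page is presented (e.g. sorting),
# not which content is shown. Verify this for your own site before relying on it.
PRESENTATION_PARAMS = {"order"}


def strip_params(url, ignored=PRESENTATION_PARAMS):
    """Return the URL with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def group_duplicates(urls, ignored=PRESENTATION_PARAMS):
    """Map each stripped URL to the list of original URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_params(url, ignored)].append(url)
    return dict(groups)
```

Run over the list above, every "order=" variant should collapse onto the single URL ending in "?catalogue=image_gallery", which makes it easy to see how many indexed URLs really represent one page.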

Bob

#75461

It's not surprising sorting is considered different as it would reshuffle the page contents significantly beyond recognition. But I still don't think there is an actual SEO problem here.


#75464

Chris-

That's what is so frustrating. On the one hand, Google is identifying pages as having HTML problems which, to me, implies that they are taking these pages into account for indexing and PR. On the other hand, they could just be pointing out the issues and not even indexing the pages. If they would provide a clue as to how they are handling the pages, it would be much better.

I also wish that Google provided a means to download every page indexed in the site so I could figure out whether or not these URLs are indexed. If they are indexed, they are splitting PR.

I think I may tweak their URL parameter tool and tell it to ignore URLs with the "order" parameter. But I did that with the "wide_print" parameter and still don't see those URLs dropping from the HTML suggestions pages.

I am sure I am over-thinking this, but I do want to make sure the site is well-indexed with good page rank. I try to spend some time every day doing off-site stuff too, like finding places to link back to the site.

Anyway, your comments are always reassuring, at least until I see Google throwing up more problems with their HTML suggestions.

Bob

#75710

Well, this is getting interesting.

This is what Google has to say about their HTML suggestions:
When Googlebot crawled your site, it found some issues with your content. These issues won't prevent your site from appearing in Google search results, but addressing them may help your site's user experience and performance.

This is true, but I think the duplicate meta descriptions and title tags do impose penalties. When my duplicates mushroomed yesterday from 100+ for each of the above to 500+ for each, I dropped from third in the search results to tenth. Coincidence? Maybe, but I have a feeling not. If anything, I should be moving up in SERPs because I am the only site on the first page adding new content and inbound links.

This is most frustrating because Google is being very slow to remove the many duplicates which I have identified in URL parameters and told Google to ignore.

So, this is not an ocPortal issue but Chris might want to make a note that it would be a good idea to provide a way for each page to generate its own title tags and meta descriptions.

Bob

#75715

Good show Bob! This is what I had found also. IMO it is minute, but nonetheless. Little things like this bug me also, but in the scope of things they are easily forgotten until you're working on your SEO the next time.

I adhere to the cardinals of Google and SEO, but really have started to focus on the human input aspects while fishing for viral visibility when it comes to Google and SEO, hoping for the top 7 mark within the called upon search page. Can't say it has worked any better for me, but it sure is better than focusing on the wizardry behind the curtain of Google. Plus it gives me a sense of control. I scope my sites as one big question related to what the site is about and then answer these questions from within my site. A little more work, but ocPortal sets a great base for doing such a thing.

You start digging into how Google looks at your site, you start seeing how dinky you are lol! Then you start digging further and days later you start back right from where you were, hopefully with something to show for it! :lol:

"You Can't Always Get What You Want"
Mick Jagger, Rolling Stones: 1969~Let It Bleed Album 

#75718

I've never really given much attention to what Google is doing on my sites, but that's because I have always ranked well in the search results, at least for my main keywords. Some might argue that slot 10 for a site that is just over a month old is pretty good, and it would be if the sites ahead of me were adding content but, in my case, the sites listed above me have been static, some for over 6 months. I add new relevant content every single day. Google has still not picked up my in-bound links, which does give some of those listed above me an advantage.

To put this into perspective, I have 229 links in the index out of the 280 submitted. But I have a total of 1020 links indexed, including 169 "Recommend us" and 297 "Print this" links from their inclusion in the action bar located on each content page. These "Recommend" and "Print" links are meaningless since not many people will be searching Google for them, and they should be "NO-FOLLOW, NO-INDEX" in the code. This would greatly reduce the chaff in the index and probably increase page rank on the remaining pages (remember that these entries also appear in Google's HTML suggestions in both the "Duplicate meta descriptions" and "Duplicate title tags" sections). I believe Chris said that he would change the items in the action bar to "NO-FOLLOW, NO-INDEX" in 7.2, so things should get better going forward.
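
To make the "NO-FOLLOW, NO-INDEX" idea concrete, here is a minimal Python sketch of a page deciding whether to emit a robots meta tag based on its own URL. This is not how ocPortal does it; `UTILITY_PARAMS` and `robots_meta` are hypothetical names, and only "wide_print" is taken from this thread (the parameter behind the "Recommend us" link is not shown here, so it is left out).

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: query parameters that mark utility views (such as the print view)
# which should stay out of the search index. Only "wide_print" comes from this thread.
UTILITY_PARAMS = {"wide_print"}


def robots_meta(url, utility_params=UTILITY_PARAMS):
    """Return a robots meta tag for utility views, or an empty string otherwise."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if any(name in query for name in utility_params):
        # Standard robots meta tag asking crawlers not to index or follow links.
        return '<meta name="robots" content="noindex, nofollow" />'
    return ""
```

The tag would go in the page's <head>. A normal content URL gets no tag at all, so only the print-style variants are asked to stay out of the index.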

The second sore spot for me is that I have explicitly excluded these URLs using Google's "URL parameters" and have seen little in the way of results. I suspect that Google only pulls them from the index when they recrawl the offending URL, so I have taken the hit to my host and increased the crawl rate for Google, hoping to clear those up sooner rather than later. From my perspective, when people make changes to the "URL parameters", Google should take the seconds involved to immediately update their index, crawl errors and HTML suggestions so that webmasters know where they actually stand. Unfortunately, Google is the overlord and does what it wants even when it makes no sense (at least to me).

One thing which would help immensely would be for both Google and Bing to allow you to download your index links. However, that is unlikely as it would be a useful tool for your competitors.

Bottom line is I dropped from the #3 spot to #10 on the same day that all these duplicate meta descriptions and title tags were added, and I don't think it was a coincidence. I suspect it is simply page rank dilution at play, but it is something beyond my control (unless I want to manually remove hundreds of URLs at both Google and Bing). My real point is that the developers should take great care to minimize the likelihood of this happening with effective use of "NO-FOLLOW, NO-INDEX".

Bob

#75909

So I have dug a little deeper into this with rather sad results. The following is entered in the "Duplicate title tags" for one page:


I searched Google individually for each of the URLs and each of them except one is in the index. That means that there are 15 URLs pointing to and sharing the page rank of a single page. Discounting the 4 links for "wide_print" which Chris said will be addressed along with the "Recommend us" link, that leaves 11 URLs splitting that page rank, not to mention that they potentially cause confusion for the Google searcher.

Yes, the different sorts will render the page in different ways, but ultimately you are presenting the same dataset repeatedly. Google's "URL parameters" tool lets you mark a parameter as a sort function so that URLs containing it are not included. There are two problems with this. First, I have to know with a great deal of certainty that, for instance in this case, excluding URLs with the "order" parameter will not exclude other URLs that are not duplicated. Secondly, Google's settings show no evidence of working, as I excluded URLs with the "wide_print" parameter on 9/15 and they are still included in the index. Perhaps this is just a matter of Google working at its own pace to cleanse the index but, to me, it doesn't really matter because it should be a developer issue.

From my perspective, the proper solution is that URLs with the sort parameter, and probably a number of others, should be "NO-FOLLOW" or "NO-INDEX" in the code. This would prevent these needless duplicates from being indexed in the first place, would assure that page rank is not split, and would make ocPortal a better product in the sense that it provides top-notch SEO.

Sorry to be a downer, but I think the cavalier attitude exhibited thus far is misguided. The developers can fix this. It need not happen all at once but they could pick away at it so that the issues would not exist in some future version of ocPortal.

My 2.5 cents.

Bob

#75919

I have made lots of changes now so that various parameters such as 'order' are removed from the 'canonical' URL given in <head>. That means if search engines find those alternate URLs then they will save them under the given canonical URLs instead / defer to those URLs. That should tidy up things a lot. It is not done site-wide – it's neatly done in the areas of the code that use those parameters.
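
For readers unfamiliar with canonical URLs, the idea can be illustrated roughly as follows. This Python sketch is not the actual ocPortal change described above; `NON_CANONICAL_PARAMS` and `canonical_link` are names invented for the example, and which parameters get dropped is an assumption.

```python
from html import escape
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Assumption: parameters that should not appear in the canonical URL.
# 'order' is the example named in this post; a real list would be longer.
NON_CANONICAL_PARAMS = {"order"}


def canonical_link(url, dropped=NON_CANONICAL_PARAMS):
    """Build a <link rel="canonical"> tag for the given page URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in dropped
    ]
    # The fragment is dropped because it never identifies a separate page.
    canonical = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
    # Escape for safe use inside an HTML attribute (e.g. '&' becomes '&amp;').
    return '<link rel="canonical" href="%s" />' % escape(canonical, quote=True)
```

Every sort variant of a page would then carry the same tag in <head>, which tells a search engine which single URL the variants should be filed under.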

I have also set the meta description to use the one derived from content (the one that went into the dc.description header) if none was explicitly set for a page. It now only falls back to the site-wide one if neither of those exist.
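
The fallback order described here can be sketched in a few lines of Python. This is only an illustration of the ordering, with hypothetical names, and not the code that went into ocPortal.

```python
def pick_meta_description(explicit, from_content, site_wide):
    """Return the first usable description: explicit, then content-derived, then site-wide."""
    for candidate in (explicit, from_content, site_wide):
        # Skip values that are missing or contain only whitespace.
        if candidate and candidate.strip():
            return candidate.strip()
    return ""
```

With this ordering, two pages only end up sharing a description when neither has its own explicit text nor any content to derive one from.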

I don't think it's fair to call the attitude cavalier. We rely on accurate information to work to: we don't have the time to do original research, and we're not SEO experts, and Google are always making claims that webmasters just need to be sensible and that their algorithms will pick stuff up right. But you're bringing new information to my attention and I really appreciate that; I think this is allowing us to make things better. I still don't think page rank actually is "split" though, because nobody is linking to these alternate versions of your pages.


#75922

Chris-

Cavalier was perhaps not the right word and I agree that I should have presented my argument better which was the reason for last night's effort.

In terms of splitting the page rank when multiple URLs point to the same page, everything I've read, including material on Google, says that they do that unless one of the URLs is canonicalized, so your efforts should provide even better SEO for ocPortal.

I appreciate your efforts to get both these issues sorted and I think that, as usual, you were a step ahead once the issue was mentioned. It serves as a reminder to everyone that it is easier for you to act when people have provided ready research so that you both understand the issue better and can focus on resolving the issue instead of researching it as well.

Thanks for your efforts, and 7.2 sounds like it just keeps getting better and better. I'm a short-timer here, but this looks to me to be the most feature-packed update in my short time here.

Bob