Problem understanding canonical links

I have researched and read so many ‘basic intros’, and they all lose me pretty quickly.

I understand the purpose and basic syntax of a canonical URL, and I am using:

<link rel="canonical" href="https://www.website.com/page/" />

Now, I do not plagiarise any content, nor do I try to game SEO with duplicate content of my own, so my requirement is to prevent pages being identified as duplicates when in reality they are not.

Here is where I get confused. I have been told that, even after creating a single page from scratch:

  • I could have both HTTP and HTTPS versions
  • I could have both non-WWW and WWW versions
  • I could have both trailing-slash and non-trailing-slash URLs

I only create one version of a page, and normally access it by typing in https://MyDomain.com or simply MyDomain.com. But apparently search engines, or the ‘black magic’ of the internet, can create 8 or more ‘virtual versions’, and in fact I can access the page by typing any of the above formats into the browser’s address bar.
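To see where the ‘8 virtual versions’ come from, here is a small Python sketch that multiplies out the three choices listed above: 2 schemes × 2 hostnames × 2 slash styles = 8 URLs for one file. The page path /about is a hypothetical example, and whether all eight actually respond depends on how the server is configured.

```python
from itertools import product


def url_variants(domain: str, path: str) -> list[str]:
    """Return every scheme / host / trailing-slash combination for one page.

    `domain` is the bare domain (no "www."), `path` is the page path
    without a trailing slash, e.g. "/about" (a hypothetical page).
    """
    variants = []
    for scheme, host, slash in product(
        ("http", "https"),           # HTTP vs HTTPS
        (domain, "www." + domain),   # non-WWW vs WWW
        ("", "/"),                   # no trailing slash vs trailing slash
    ):
        variants.append(f"{scheme}://{host}{path}{slash}")
    return variants


for url in url_variants("mydomain.com", "/about"):
    print(url)
```

All eight strings can serve the same file, which is exactly why a search engine may see them as separate pages unless told otherwise.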

How do I decide which version of my one file, in one physical location, should be designated as the original?

Basically, I only need to say for each page ‘This is the canonical version’, but what do I put in the href? I have also been told that if I am stating ‘This is the page’ I can simply add:

<link rel="canonical" href="https://www.website.com/" />

i.e. without specifying the name of ‘this page’. Is this correct? Do I use https and www, and end with a /?

I have no idea which version Google, or any other engine, considers the original, which one to specify, or whether to include the page name and end with a /.

It seems such a simple requirement: ‘Hey search engines! This is the main original page!’ But I am lost as to how to achieve this simply and reliably.

Thanks for ANY clarification

A search engine that reads a canonical link will consider the URL specified in the link to be the original. If you’re looking at http://www.example.com/aboutus.html but the canonical link in the page head is https://www.example.com/about.html, the search engine will consider the second one to be the original.

If the canonical link points to the URL the engine is currently reading, it treats that as a closed loop and concludes that it is already parsing the correct original page.
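Here is a simplified Python model of the rule just described, using only the standard library's html.parser: it extracts the href of a `<link rel="canonical">` tag and reports which URL would be treated as the original, and whether that is a closed loop. The function names are made up for illustration, and real engines such as Google treat rel="canonical" as a strong hint rather than a strict command, so this is a sketch of the idea, not of any engine's actual logic.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here, via the default
        # handle_startendtag implementation.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rel_values and href:
            self.hrefs.append(href)


def preferred_url(fetched_url: str, html: str) -> str:
    """Return the URL this simplified model treats as the original."""
    finder = CanonicalFinder()
    finder.feed(html)
    # No canonical link: the URL that was fetched is all there is to go on.
    return finder.hrefs[0] if finder.hrefs else fetched_url


def is_self_referencing(fetched_url: str, html: str) -> bool:
    """True when the canonical points back at the URL being read (the 'closed loop')."""
    return preferred_url(fetched_url, html) == fetched_url


page = '<head><link rel="canonical" href="https://www.example.com/about.html" /></head>'
print(preferred_url("http://www.example.com/aboutus.html", page))  # https://www.example.com/about.html
print(is_self_referencing("https://www.example.com/about.html", page))  # True
```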

Thank you for clarifying that, but I am still unsure how to tell which URL the engine will be reading. I mean, if I have a page called about.html on my local machine and I am editing it, how do I know whether Google will read it as http or https, www or non-www, and whether it will add a closing / when I upload it to my server and Google crawls it? Although I know I am on my page, I do not know how Google names or references this page when it crawls it.

It’s up to you to decide which is the canonical URL. That is the idea, that you tell the search bots which URL you want indexing for the page.
Though if you don’t have the same page accessible via different URLs, you may not even need to worry about it.
The canonical link becomes useful with certain sites, usually CMS ones where a page may be reached by a number of different URLs. Then you need to tell the bots that they are all the very same page, not duplicated content, and that this is the URL you want it indexed under.
For example, imagine an e-commerce site where the page for a single product may be reached by:

https://www.example.com/acme-widget
https://www.example.com/widgets/acme-widget
https://www.example.com/acme-products/acme-widget
https://www.example.com/all-products/acme-widget
https://www.example.com/?product_id=321
https://www.example.com/?search=widget&p=2&result=13

…You get the idea.
When that’s going on, you need a canonical link.


You won’t. That’s the point: you can’t know which one. Take Sam’s example there. You’ve got 6 URLs for the acme-widget, and they all show the same page. You don’t know WHICH URL the engine is reading, but you don’t care. As far as the engine is concerned, no matter which one it’s reading, it’s “reading” https://www.example.com/acme-widget (assuming you chose that as your canonical).
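As an illustration of ‘no matter which one it’s reading’, this Python sketch takes a hypothetical crawl of Sam’s six URLs, where every page declares the same canonical href, and groups them the way an index might. The crawl data is invented for the example; the grouping is the point.

```python
from collections import defaultdict

# The canonical URL chosen for the product (as in the example above).
CANONICAL = "https://www.example.com/acme-widget"

# Hypothetical crawl result: URL fetched -> canonical href found on that page.
crawled = {
    "https://www.example.com/acme-widget": CANONICAL,
    "https://www.example.com/widgets/acme-widget": CANONICAL,
    "https://www.example.com/acme-products/acme-widget": CANONICAL,
    "https://www.example.com/all-products/acme-widget": CANONICAL,
    "https://www.example.com/?product_id=321": CANONICAL,
    "https://www.example.com/?search=widget&p=2&result=13": CANONICAL,
}


def consolidate(crawled: dict[str, str]) -> dict[str, list[str]]:
    """Group every fetched URL under the canonical URL its page declares."""
    index = defaultdict(list)
    for fetched, canonical in crawled.items():
        index[canonical].append(fetched)
    return dict(index)


index = consolidate(crawled)
print(list(index))            # ['https://www.example.com/acme-widget']
print(len(index[CANONICAL]))  # 6
```

Six fetched URLs collapse into one index entry, so none of them competes with the others as duplicate content.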

OK, thanks, and sorry for making this so painful; I am getting the idea. Could you please explain, though, why none of your example URLs seem to point to a page name? I would expect URLs like the above to say something like https://www.example.com/acme-products/acme-widget/widget.html and end with a page, rather than with what appears to be a folder.

It really depends upon the site/server setup. A lot of sites like to “prettify” the URL to exclude the file extension.
In fact a lot of server-side driven sites don’t have a whole bunch of files in the root named by the pages (home, about, contact, etc). They may have just one index file, be that .php, .asp or whatever, which has a “router” script which examines the request URL, then dishes out whatever content depending on that URL. Browse a few sites and take note of the URLs to see this.
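To make the ‘router’ idea concrete, here is a hedged Python sketch of a single entry point that examines the request URL and serves the same product for several different routes. The product table, slugs and function names are all hypothetical; the point is that the canonical href is built from the product record, not from whichever URL the request arrived on.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical product table; a real site would query a database.
PRODUCTS = {321: {"slug": "acme-widget", "name": "Acme Widget"}}
SLUG_TO_ID = {info["slug"]: pid for pid, info in PRODUCTS.items()}


def route(request_url: str) -> Optional[int]:
    """Work out which product, if any, a request URL refers to."""
    parts = urlsplit(request_url)
    query = parse_qs(parts.query)
    if "product_id" in query:
        raw = query["product_id"][0]
        product_id = int(raw) if raw.isdigit() else None
    else:
        # Pretty URLs: the last path segment is the slug, whatever
        # folders come before it.
        slug = parts.path.rstrip("/").rsplit("/", 1)[-1]
        product_id = SLUG_TO_ID.get(slug)
    return product_id if product_id in PRODUCTS else None


def render(request_url: str) -> str:
    """Serve the page; the canonical is built from the product, not the request."""
    product_id = route(request_url)
    if product_id is None:
        return "<h1>Not found</h1>"
    product = PRODUCTS[product_id]
    canonical = f"https://www.example.com/{product['slug']}"
    return f'<link rel="canonical" href="{canonical}" />\n<h1>{product["name"]}</h1>'


for url in ("https://www.example.com/widgets/acme-widget",
            "https://www.example.com/?product_id=321"):
    print(render(url).splitlines()[0])
```

Both requests print the same canonical tag. The search-results URL from Sam’s list is not handled here; a real router would need its own rule for it.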
On a simple, static HTML site where you are just loading whatever file, e.g. widget.html, you are less likely to need the canonical link, because there is only one URL that leads to that page: the path to that file.

Thank you everyone for your patient and informative comments, I think I finally have a grip on it. Cheers guys.

Hi, thanks for taking the time to help. However, I did state in the question:

My problem was identifying any ‘pseudo’ duplicates a browser may create from different interpretations of what the actual domain is, and understanding why page names were often omitted.

However, your reply has prompted me to ask whether there is a ‘generic’ URL I could use for a particular domain that would mean ‘this page’. I think I have seen somewhere that if I specify the canonical simply as, say, https://mydomain.com, i.e. without a page name, it will be interpreted as ‘this page is the canonical’. The reason I ask is that I never have actual duplicate content, and I use a PHP include for common settings, so I could put the canonical there. Just an idea, if anyone has any comments. Thanks again.

If you do this, it’s possible that only your homepage will be indexed, as all other pages will point to it as their canonical. I can only see this damaging your search rankings.
As I say, there is a good chance you don’t even need canonical links for your site. But if you do want to add them, do it correctly: your homepage (only) may be https://example.com, but every other page will have its own unique URL, such as https://example.com/about.php or https://example.com/contact.php.
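For a shared include, the safe version of the idea is to build each page’s own URL rather than hard-code the homepage. A minimal Python sketch, assuming https://example.com (no www) is the preferred origin and that the include receives the current request path; in a PHP include that path would come from `$_SERVER['REQUEST_URI']`. The function name is hypothetical.

```python
from html import escape
from urllib.parse import urlsplit

# Assumption: the preferred origin is https://example.com (no www).
PREFERRED_ORIGIN = "https://example.com"


def canonical_tag(request_uri: str) -> str:
    """Build a self-referencing canonical link for the current page.

    Scheme and host are fixed to the preferred origin whatever the visitor
    typed; the query string and fragment are dropped.
    """
    path = urlsplit(request_uri).path or "/"
    return f'<link rel="canonical" href="{escape(PREFERRED_ORIGIN + path)}" />'


print(canonical_tag("/about.php"))
print(canonical_tag("/contact.php?ref=nav"))
```

Dropping the query string suits a simple site like this one; if query parameters select different content, they would need to be kept. For the homepage this produces https://example.com/, which is the same URL as https://example.com because an empty path means the root.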


Thanks for the info, I will take your advice!

I’m facing an issue regarding a canonical tag. Somewhere I placed a link without the final " / " slash, and it causes a problem in Google Search Console. Does anybody know the solution?

That’s a bit vague. Can you explain your problem in a bit more detail?

I guess what you are saying is that you got a warning in Google Search Console that you have a canonical link somewhere that has no final /.
If that is the case, then it is a process of elimination: finding where you put it. Without further info, I can’t give much advice. I would suggest you check Google Search Console to see if you can identify the page it is on, or the contents of the canonical link. That will help you search, perhaps using a search utility to identify any files containing that string.
Unfortunately based on the info you have given, that’s it - find it and change it!

Yes, absolutely.
My actual URL is https://trackingvrl.in/vrl-travels/ but GSC shows that I have another canonical tag like this: trackingvrl.in/vrl-travels (without the final " / "). I’ve tried to find it to remove it, but I couldn’t locate it. Is there any other way?
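To a crawler, the two URLs in that report are genuinely different URLs, even though they differ only by the final slash. Here is a small Python sketch (the helper name is hypothetical) that flags this kind of near-miss. Because the report is quoted without a scheme, a missing scheme is filled in as https before comparing, and the scheme is otherwise ignored.

```python
from urllib.parse import urlsplit


def slash_mismatch(intended: str, found: str) -> bool:
    """True when `found` names the same host and path as `intended`
    except for the trailing slash. Schemes are not compared."""

    def host_and_path(url: str) -> tuple[str, str]:
        if "://" not in url:
            url = "https://" + url  # e.g. "trackingvrl.in/vrl-travels"
        parts = urlsplit(url)
        return parts.netloc.lower(), parts.path

    i_host, i_path = host_and_path(intended)
    f_host, f_path = host_and_path(found)
    return (
        i_host == f_host
        and i_path != f_path
        and i_path.rstrip("/") == f_path.rstrip("/")
    )


print(slash_mismatch("https://trackingvrl.in/vrl-travels/", "trackingvrl.in/vrl-travels"))  # True
```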

Well, if it REALLY is a warning about a canonical ref, you should not be using that ref on many pages, especially since it is your own site and content. In most instances, if you add a canonical link, it would be on the same page it points to. That should narrow down the search. If you are using the same canonical ref on multiple pages, you are probably using it wrongly and may need to go through and change a number of them manually anyway.

Again make sure that it really is a canonical error and that GSC does not, somewhere, give you more info. It usually does but you have to dig into their reporting and menu options.

You could use a free utility such as Searchmonkey, which I use in situations like this. It allows you to search a folder, or from a folder down, or even a whole drive for all files that contain a certain string of text. So assuming all your pages are in one folder or subfolders of a main folder you can specify a start point, specify the text you are searching for and just sit back and wait while it assembles a list of files containing that text.

You can try searching with a space at the end and without. If there is not a space at the end, you may have to search for all instances and examine them manually. Be a bit clever with your search to avoid other references, page links etc.
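If the site is a set of files on disk, a few lines of Python can do roughly what a utility like Searchmonkey does: walk a folder and its subfolders and list every file containing a pattern. This sketch assumes text files readable as UTF-8 (other bytes are ignored), and the pattern targets the vrl-travels URL immediately followed by a closing quote, i.e. with no trailing slash. The folder path shown is a placeholder.

```python
import os
import re


def find_files_containing(root: str, pattern: str) -> list[str]:
    """Return every file under `root` (recursively) whose text matches `pattern`."""
    regex = re.compile(pattern)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets binary files (images etc.) pass harmlessly.
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    if regex.search(fh.read()):
                        matches.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return sorted(matches)


# The vrl-travels URL followed directly by a closing quote, i.e. with
# no trailing slash before the end of the href value.
NO_SLASH = re.escape("trackingvrl.in/vrl-travels") + r"[\"']"

# Placeholder path; point it at the folder holding the site's files.
# print(find_files_containing("/path/to/site", NO_SLASH))
```

This also matches ordinary `<a href>` links to the same URL, so the results still need checking by hand. It will not find tags that a CMS or plugin generates at request time; for those, the served HTML has to be inspected instead.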

If you are using WordPress or some other CMS with a plugin, maybe the ref has been added automatically somewhere.

You could also look at a free SEO service like Ahrefs, which will analyse and report on your site and may give you more info.

But as I say, if you are using canonicals correctly you would not normally have that many references to the same page, often only on that page.

One final note: you should consider posting your question as a separate post with more detail, explanation and screenshots, rather than jumping in on someone else’s. I don’t mind in the least, but you will probably get more responses and avoid upsetting the admins. Just a suggestion.

Good luck!

I also had a similar issue with a page on my website. I didn’t post any duplicate content on that page, but Google Search Console showed a canonical link issue for it. Did you find any solution?

Have you bothered to read the preceding posts?

Actually, your problem sounds quite different, apart from the fact that it involves a canonical link. In this case the person has created canonical links but cannot find one that Google has apparently found and considers wrong. I had a big problem understanding canonical links myself, but maybe I can explain something, as I understand it, that may be relevant to your problem:

A canonical link does not mean that you have duplicate content, it is a way of telling Google, if it does find duplicate content, which page should be considered the original.

For this reason all pages should have a canonical link, often pointing to themselves, so Google does not penalise you if it finds content it considers as duplicate and you have not pointed to the original.
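Following that advice, a site audit boils down to classifying each page’s canonical links. This Python sketch assumes the canonical hrefs have already been extracted for each page (the sample data is invented) and reports pages with no canonical, with conflicting canonicals, or with one pointing elsewhere. Conflicting canonicals are flagged separately because search engines may disregard them altogether.

```python
def audit(pages: dict[str, list[str]]) -> dict[str, str]:
    """Classify each page URL by the canonical hrefs found on it."""
    report = {}
    for url, hrefs in pages.items():
        if not hrefs:
            report[url] = "missing"
        elif len(set(hrefs)) > 1:
            report[url] = "conflicting"
        elif hrefs[0] == url:
            report[url] = "self-referencing"
        else:
            report[url] = "points to " + hrefs[0]
    return report


# Hypothetical extraction results: page URL -> canonical hrefs on that page.
pages = {
    "https://example.com/": ["https://example.com/"],
    "https://example.com/about.php": [],
    "https://example.com/contact.php": [
        "https://example.com/contact.php",
        "https://example.com/",
    ],
    "https://example.com/about.php?ref=nav": ["https://example.com/about.php"],
}

for url, status in audit(pages).items():
    print(f"{url}: {status}")
```

A page that ‘points to’ another URL is not necessarily wrong (the ?ref=nav variant above is a legitimate case), but every such line is worth a second look.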

Some CMS and SEO plugins add canonical links automatically to help with this.

The solution is, unfortunately, painstakingly examining your system to find where this link has been created or exists and correcting it.