Problem understanding canonical links

I have researched and read so many ‘basic intros’ and all lose me pretty quickly.

I understand the purpose and basic syntax behind a canonical url and I am using -

<link rel="canonical" href="https://www.website.com/page/" />

Now, I do not plagiarise any content, nor do I pad my site with duplicate content of my own for SEO, so my requirement is to prevent pages being identified as duplicates when in reality they are not.

Here is my confusion: I have been told that, even after creating a single page from scratch -

  • I could have both HTTP and HTTPS versions
  • I could have both non-WWW and WWW versions
  • I could have both trailing-slash and non-trailing slash URLs

I only create one version of a page, and normally access it by typing in https://MyDomain.com or simply MyDomain.com, but apparently search engines or the ‘black magic’ of the internet can create 8 or more ‘virtual versions’. In fact, I can access the page by typing any of the above formats into the browser address bar.

How do I decide which version of my one file in one physical location should be designated as the original?

Basically I only need to say for each page ‘This is the canonical version’, but what do I put as the parameter? I have also been told that if I am stating ‘This is the page’ I can simply add -
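To make the ‘8 virtual versions’ concrete, here is a minimal Python sketch that just multiplies out the three choices listed above (2 schemes x 2 hostnames x 2 slash styles). The function name url_variants and the path "page" are made up for illustration, and whether a given server really answers on all eight depends on how it is configured.

```python
from itertools import product


def url_variants(domain: str, path: str) -> list[str]:
    """List every scheme / www / trailing-slash combination for one page.

    `path` is assumed to be non-empty (e.g. "page"), purely for illustration.
    """
    bare = domain.removeprefix("www.")          # e.g. "mydomain.com"
    stem = "/" + path.strip("/")                # e.g. "/page"
    variants = []
    for scheme, host, slash in product(
        ("http", "https"), (bare, "www." + bare), ("", "/")
    ):
        variants.append(f"{scheme}://{host}{stem}{slash}")
    return variants


variants = url_variants("mydomain.com", "page")
print(len(variants))  # 8
for url in variants:
    print(url)
```

Only one of these eight strings should end up in the href of the canonical link; the others are just alternative spellings of the same address.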

<link rel="canonical" href="https://www.website.com/" />

i.e. without specifying the page name of ‘This page’. Is this correct? Do I use https, www, and end with a /?

I have no idea which page Google, or another engine, considers the original, which to specify, and whether to include the page name and end with /.

It seems such a simple requirement - ‘Hey search engines! This is the main original page!’ But I am lost as to how to achieve this simply and reliably.

Thanks for ANY clarification

A search engine that reads a canonical link will normally treat the URL specified in the link as the original. If you’re looking at http://www.example.com/aboutus.html but the canonical link in the header is https://www.example.com/about.html, the search engine will consider the second one to be the original.

If the canonical link points to the URL the engine is currently reading, it will consider that a closed loop and say that it is currently parsing the correct original page.
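As a rough illustration of that ‘closed loop’ check, the Python sketch below pulls the canonical link out of a page and compares it with the URL that was fetched. It uses only the standard library; the names CanonicalFinder, find_canonical and is_self_referencing are invented for this example, and real crawlers do far more than a plain string comparison.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the markup, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def is_self_referencing(fetched_url: str, html: str) -> bool:
    """True when the page names the fetched URL itself as canonical."""
    return find_canonical(html) == fetched_url


page = '<head><link rel="canonical" href="https://www.example.com/about.html" /></head>'

# Fetched under a different address: the canonical points elsewhere.
print(is_self_referencing("http://www.example.com/aboutus.html", page))   # False
# Fetched under the canonical address itself: the "closed loop" case.
print(is_self_referencing("https://www.example.com/about.html", page))    # True
```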

Thank you for clarifying that, but I am still unsure how to tell which page the engine will be currently reading. I mean, if I have a page called about.html on my local machine and I am editing it, how do I know whether Google will be reading it as http, https, www or non-www, and whether it will add a closing / when I upload it to my server and Google crawls it? Although I know I am on my page, I do not know how Google considers this page to be named / referenced when it crawls it.

It’s up to you to decide which is the canonical URL. That is the idea: you tell the search bots which URL you want indexed for the page.
Though if you don’t have the same page accessible via different URLs, you possibly don’t even need to worry about it.
The canonical link becomes useful with certain sites, usually CMS ones, where a page may be reached by a number of different URLs. Then you need to tell the bots that they are all the very same page, not duplicated content, and that this is the URL you want it indexed under.
E.g., imagine an ecommerce site where the page for a single particular product may be reached by:

https://www.example.com/acme-widget
https://www.example.com/widgets/acme-widget
https://www.example.com/acme-products/acme-widget
https://www.example.com/all-products/acme-widget
https://www.example.com/?product_id=321
https://www.example.com/?search=widget&p=2&result=13

…You get the idea.
When that’s going on, you need a canonical link.


You won’t. That’s the point. You can’t know which one. Take Sam’s example there. You’ve got 6 URLs about the acme-widget. They all show the same page. You don’t know WHICH URL the engine is reading, but you don’t care. As far as the engine is concerned, no matter which one it’s reading, it’s “reading” https://www.example.com/acme-widget (assuming you chose that as your canonical).
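A small Python sketch of that idea, using the six example URLs from earlier in the thread: whichever address the page is requested under, the head always carries the same canonical tag. The function names canonical_tag and render_head are hypothetical, and the choice of https://www.example.com/acme-widget as the canonical is the same assumption made above.

```python
from html import escape

# The one URL chosen as canonical for this product (an assumption, as above).
CANONICAL_URL = "https://www.example.com/acme-widget"

# The six addresses that all show the same product page.
ALTERNATE_URLS = [
    "https://www.example.com/acme-widget",
    "https://www.example.com/widgets/acme-widget",
    "https://www.example.com/acme-products/acme-widget",
    "https://www.example.com/all-products/acme-widget",
    "https://www.example.com/?product_id=321",
    "https://www.example.com/?search=widget&p=2&result=13",
]


def canonical_tag(canonical_url: str) -> str:
    """Build the <link> element for the page head."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


def render_head(requested_url: str) -> str:
    """Return the canonical tag for the product page.

    The requested URL is deliberately ignored: it is the same page
    whichever route the visitor (or crawler) used to reach it.
    """
    return canonical_tag(CANONICAL_URL)


tags = {render_head(url) for url in ALTERNATE_URLS}
print(len(tags))  # 1
print(tags.pop())
```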

OK, thanks, and sorry for making it so painful. I am getting the idea. Can you explain for me though, please, why none of your example URLs actually seem to point to a page name? I would expect URLs like the above to say https://www.example.com/acme-products/acme-widget/widget.html or something like that, and end with a page rather than what appears to be a folder.

It really depends upon the site/server setup. A lot of sites like to “prettify” the URL to exclude the file extension.
In fact a lot of server-side driven sites don’t have a whole bunch of files in the root named by the pages (home, about, contact, etc). They may have just one index file, be that .php, .asp or whatever, which has a “router” script which examines the request URL, then dishes out whatever content depending on that URL. Browse a few sites and take note of the URLs to see this.
On a simple, static HTML site where you are just loading whatever file, e.g. widget.html, you are less likely to need the canonical link, because there is only one URL that will lead to that page, which is the route to that file.
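For anyone curious what such a “router” looks like, here is a deliberately tiny Python sketch. The ROUTES table, the route function and the page contents are all invented for illustration; a real site would use its framework's or CMS's own routing, usually in whatever server-side language the site is written in.

```python
from urllib.parse import urlsplit

# Hypothetical table mapping "pretty" URL paths to content.
# No about.html or acme-widget.html file exists anywhere on disk.
ROUTES = {
    "/": "Home page content",
    "/about": "About page content",
    "/acme-products/acme-widget": "Acme widget product page",
}


def route(request_uri: str) -> tuple[int, str]:
    """Examine the requested URL and return (status code, content)."""
    # Ignore the query string and treat "/about/" the same as "/about".
    path = urlsplit(request_uri).path.rstrip("/") or "/"
    if path in ROUTES:
        return 200, ROUTES[path]
    return 404, "Not found"


print(route("/about"))        # (200, 'About page content')
print(route("/about/"))       # (200, 'About page content')
print(route("/widget.html"))  # (404, 'Not found')
```

Note that this router happily serves the same content with and without the trailing slash, which is exactly the kind of setup where a canonical link earns its keep.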

Thank you everyone for your patient and informative comments, I think I finally have a grip on it. Cheers guys.

In the context of web pages, a canonical link is a tag that helps to indicate the preferred or “canonical” version of a page to search engines. This can be useful in cases where there are multiple versions of the same content, such as a product page with different sorting options or pagination, or a blog post that appears on multiple pages.

The canonical link tag is added to the head section of a web page and specifies the URL of the preferred version of the page. When search engines encounter this tag, they understand that the specified URL is the original or primary version of the content, and any other versions are duplicates or variations.

By indicating the canonical URL, the tag helps to avoid potential issues with duplicate content that could negatively impact a website’s search engine rankings. It also helps to ensure that the correct version of the content is indexed and displayed in search results.

I hope this helps you understand what canonical links are and how they work!

Hi, thanks for taking the time to help, however, I did state in the question

My problem was with identifying any ‘pseudo’ duplicates a browser may create due to different interpretations of what the actual domain is and why page names were often omitted.

However, your reply has prompted me to ask whether there is a ‘generic’ URL I could use for a particular domain that would mean ‘this page’. I think I have seen somewhere that if I specify the canonical as simply, say, https://mydomain.com, i.e. without a page name, it will be interpreted as ‘this page is the canonical’. The reason I ask is that I never have actual duplicate content, and I use a PHP include for common settings, so I could put the canonical there. Just an idea, if anyone has any comments. Thanks again.

If you do this, it’s possible that only your homepage will be indexed, as all other pages will point to it as their canonical. I can only see this being damaging to your position in the SERPs.
As I say, there is a good chance you don’t even need canonical links for your site. But if you do want to add them, do it correctly. So your homepage (only) may be https://example.com, but every other page will have its own unique URL, such as: https://example.com/about.php or https://example.com/contact.php
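If the tag is going to live in a shared include, the include has to build the href from the page actually being requested rather than hard-coding the homepage. A minimal sketch of that logic in Python follows (the same idea carries over to a PHP include); PREFERRED_ORIGIN and canonical_tag are names made up for this example, and it assumes you have settled on https and non-www as the preferred form.

```python
from html import escape
from urllib.parse import urlsplit

# Assumption: the one scheme + host you have chosen for the whole site.
PREFERRED_ORIGIN = "https://example.com"


def canonical_tag(request_uri: str) -> str:
    """Build a per-page canonical tag from the requested path.

    The query string is dropped, and the scheme and host always come from
    PREFERRED_ORIGIN, never from however the visitor typed the address.
    """
    path = urlsplit(request_uri).path or "/"
    # The homepage (only) maps to the bare origin, as in the post above.
    url = PREFERRED_ORIGIN if path == "/" else PREFERRED_ORIGIN + path
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


print(canonical_tag("/"))
print(canonical_tag("/about.php"))
print(canonical_tag("/contact.php?ref=footer"))
```

Each page then gets its own unique canonical, and only the homepage points at https://example.com.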

Thanks for the info, I will take your advice!