I am the web admin for a site and have a question regarding SEO and pages/URLs being indexed by Google. The site I am working with was initially developed by an external company and could be viewed at a domain they provided, for example: yourwebsite.webcompany.com
Now the site is live and is using a newly registered domain: yoursite.com.au
I have been tracking the site with both Google Webmaster Tools and Analytics, and have noticed that Google has indexed pages from both domains, predominantly from the web development company's domain: yourwebsite.webcompany.com
You could redirect the user from the development domain to the live site.
You can block Google and all major search engine crawlers from indexing the development domain by adding a robots.txt file to its document root. The file should contain the following:
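User-agent: *
Disallow: /
That tells every compliant crawler to stay out of everything on that host. Note that it only has the intended effect if it is served on the development domain alone, and not on the live one.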
Whether that's a good idea depends on how the hosting is set up, but I don't think it's the best solution in this case.
If the content is hosted on the yoursite.com.au server, then you should 301-redirect all requests from yourwebsite.webcompany.com to the correct domain, which you can do with mod_rewrite or mod_alias in .htaccess (see the Apache forum for more info). That will ensure that everyone gets taken to the correct page, which is particularly important if the files are duplicated on each server, because then there's a danger that the development site gets out of sync with the real site. Google will in time update its index, but in the meantime it doesn't matter, because the old links will still take visitors to the right URL.
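As a rough sketch, a mod_rewrite rule with a host condition in .htaccess might look something like this (assuming Apache, and substituting your real hostnames for the placeholders):
RewriteEngine On
# If the request came in on the development hostname...
RewriteCond %{HTTP_HOST} ^yourwebsite\.webcompany\.com$ [NC]
# ...issue a permanent (301) redirect to the same path on the live domain
RewriteRule ^(.*)$ http://yoursite.com.au/$1 [R=301,L]
The host condition matters here: a plain mod_alias Redirect applies to every request regardless of hostname, so if both domains share one document root it could end up redirecting the live site to itself.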
If the content is hosted on the yourwebsite.webcompany.com domain but not yoursite.com.au then the best plan is to put a <link rel="canonical" href="http://yoursite.com.au/whatever"> tag in the <head> of every page. That tells Google the URL that you want it to reference that page with, and as long as the URL does return the same (or a similar) page, it will in time update its index.
(You can also use the canonical trick alongside mod_rewrite/mod_alias, which may help speed up the reindexing)
Thanks guys. To my knowledge the files are on a single server with both domains pointing to it, so if I make a change to, say, the contact page, it is updated straight away when viewed at either yourwebsite.webcompany.com/contact or yoursite.com.au/contact
Because of this I would be worried about using the robots.txt method, in case I stopped the site from being indexed completely. Stevie D, would your method of adding a <link rel="canonical" href="http://yoursite.com.au/whatever"> tag in the <head> of every page work? If so, am I correct in thinking I would need to manually change the href attribute for each page? As the site is CMS/template driven, could that make this a little trickier to do?
Quite right - if you use robots.txt to tell bots to ignore one route to the site, it doesn't follow that they will quickly find the other route, and when they do, it will be a new start: they won't transfer any rankings or ratings they already have.
Yes, the href attribute in the canonical tag needs to be different for each page. I would have hoped that a good CMS would be able to separate out the domain and the filepath/name, and then append that filepath/name to a different domain, but I have no idea how to go about that in your particular CMS…
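The underlying logic is simple enough, though. Here is a rough sketch in Python, purely to illustrate the idea (the function name and the hard-coded preferred domain are placeholders, not anything CMS-specific):
from urllib.parse import urlsplit, urlunsplit

def canonical_url(request_url, preferred_host="yoursite.com.au"):
    # Keep the path and query string from the URL the page was requested on...
    parts = urlsplit(request_url)
    # ...but rebuild the URL on the preferred (live) domain
    return urlunsplit(("http", preferred_host, parts.path, parts.query, ""))

# canonical_url("http://yourwebsite.webcompany.com/contact")
# returns "http://yoursite.com.au/contact"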
Thanks, I will have to look into how to do this with the CMS. When creating a page it allows me to set the URL, so hopefully I can then edit the page template to include code similar to the following:
<link rel="canonical" href="{page_url}"> where it will generate this automatically each time I create a new page.
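For example, assuming {page_url} expands to the full URL on my own domain, the contact page would then be rendered with:
<link rel="canonical" href="http://yoursite.com.au/contact">
no matter which of the two hostnames it was requested on.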
If I set this for all of the pages, at what point will Google then start to re-index/update with the correct URLs? (The goal being that it will no longer list URLs as yourwebsite.webcompany.com/whatever.)
Thanks for your help so far. I am new to the SEO side of things, so I'm finding it a little difficult.
That isn’t possible. ShEpSy has already explained:
to my knowledge the files are on a single server with both domains pointing to it, so if I make a change to, say, the contact page, it is updated straight away when viewed at either yourwebsite.webcompany.com/contact or yoursite.com.au/contact
Yeah mapetshi, as Technobear mentioned, this would not work, as there is no separate development page/site. The files are on a single server with two domains pointing to it: my chosen domain name and also a domain/URL of the development company. What is happening is that Google is mainly indexing the URLs from the development company instead of URLs from my domain.
I am currently waiting on some help from a support team on how I can dynamically add <link rel="canonical" href="{page_url}"> to the CMS template so that each page is updated.
OK, so after some waiting the support team have got back to me on how to alter the CMS templates to dynamically add the URL for each page. The solution they provided was not 100% what I was looking for, as it does not work for sub-level pages; however, it is a step forward, and I have been able to apply it so that each top-level page now contains the following code: <link rel="canonical" href="http://yoursite.com.au/whatever">
Hopefully now, over time, Google will start to reference each page with my preferred URL instead of indexing the development company's URLs. Am I correct in thinking that I now just have to wait for this to happen?