I hand-coded a site for a client, which is now live. At the moment, when viewed online, each page URL is shown with its .html extension.
(The home page is the only page which does not do this.)
If I want to remove or mask the .html extension, how do I go about it? I read online that it can be achieved by creating a directory structure for the site: create a folder for each page, place each page in the relevant folder, and change its filename to index.html.
Is this the correct method to use?
Secondly, if I do this, how will it affect Google indexing, given that the site has already been indexed?
Yes, that’s my understanding as well. Each directory has a default page. The name of the page depends on the type of server and on certain settings, but it is usually either index.html or index.htm. So if the visitor just goes to the directory, they will see that page. (At least, that’s what I’ve always understood. Someone will correct me if I’ve got that wrong.)
Google (and other search engines) will have indexed the page under whatever URL you originally gave it, so if a visitor clicks through from the search engine results, they will go to the old page rather than index.htm.
The way to deal with that is to set up a 301 redirect from the old page to the new. Your server documentation will tell you how to do that, or you could ask in one of the forum sections that deals with those issues. When you redirect in this way, you’ll continue to benefit from any ranking you’ve already achieved in the search engines.
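On an Apache server where the host allows .htaccess overrides, a redirect like the one described above might look something like this. It is only a sketch: the page names and the www.example.com domain are made up for illustration, and it assumes mod_alias is enabled.

```apache
# Hypothetical example: contact.html has moved to /contact/index.html.
# Redirect is provided by mod_alias; 301 marks the move as permanent.
Redirect 301 /contact.html https://www.example.com/contact/

# One line per page that has moved (about.html is also a made-up name).
Redirect 301 /about.html https://www.example.com/about/
```

Requesting the old URL should then return a 301 status with a Location header pointing at the new address, which you can check in your browser's developer tools.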
Another option is to add Options +MultiViews to your .htaccess file. This allows the user-agent to omit the file extensions altogether, and the server will try to find the best match available. Note: this will only work properly if you don’t have any duplicate file/folder names within a directory. So if you have both “product.html” and “product.php”, or both “product.html” and “product/”, in the same directory, it will have to guess which one to return (folders get priority over files).
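As a minimal sketch, the .htaccess file for this approach could be as short as the fragment below. It assumes an Apache server with mod_negotiation loaded and a host that lets .htaccess files change Options; neither is guaranteed on shared hosting.

```apache
# Turn on content negotiation so that a request for /contact
# can be answered with contact.html from the same directory.
# Needs mod_negotiation and "AllowOverride Options" (or All) on the host.
Options +MultiViews
```

If the host does not permit this setting, the server may respond with a 500 error, so it is worth testing on a single page first and removing the line if anything breaks.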
To avoid any problems of duplicate content, I would recommend using canonical link tags (a <link rel="canonical" href="..."> in the <head> of each page) to give Google your preferred form of the URL, just in case people are linking to different forms.
Thanks for clearing that up. Is there any real advantage to having a site organised this way? I just feel that a site looks better when a page is displayed as www.site.com/contact, as opposed to every page displaying the .html extension.
I know that when I previously changed the name of a page, it didn’t seem to take Google long to update and display the new URL in search results instead. However, when I log into Webmaster Tools, the old page has generated a crawl error. In this instance, instead of renaming the file, should I have kept the original file on the server and then used a 301 to redirect to the new page?
If there is no real benefit then I think I will just leave the site as is, as it is starting to rank in Google and I don’t want to interfere with this or risk having to start over! Although when I launch my personal portfolio site I may opt for this approach instead of having all of the files within a single folder.
Thanks, I may have to do some reading on .htaccess as it seems to be getting mentioned here and there.
Thanks, I’m still looking into using these tags in regard to my other SEO query, which you replied to.
As mentioned above, when I wanted to change the file name of a page I deleted it and then replaced it with the new page name (I then changed the links on each page to point to the new page). In this instance, should I have left it where it was, put the new file on the server, and then redirected from the old page to the new page? And also included a <link rel="canonical"> tag to tell Google which was the preferred URL?
You should be able to rename a file on the server, rather than having to delete and re-upload it. Just make sure you change the name on your master copy at the same time!
If you put a redirect on a URL then it makes no difference whether there is a file there with the name you are redirecting from, because the server will always follow the redirect and will ignore the file with that (old) name.
Having seen Stevie’s reply, I realise that his suggestion is better than mine - provided you don’t have any clashes in the existing filenames.
My only minor caution is that not all web servers support .htaccess. At least, I think that’s right. Or, if it is widely supported, it might not be made available by the web hosting company. (Stevie, correct me if I’m wrong about that.)
That’s right – some free and cheap hosts don’t allow users to control the .htaccess file, but may allow you scope for setting up redirects via the control panel.
So with the redirect in place, will Google stop indexing the file with the old name and update its index with the new page instead?
Hopefully that’ll be the case. I’m new to SEO, so there’s loads of stuff to take in! With this site I am just going to leave it with the .html extensions; however, when it comes to doing my own site I will opt for setting up the directories so that each page has its own folder with an index file.
That’s more work and potentially more confusing for you in terms of maintaining the site - if I had a folder containing just an index.htm for every page, I think I would go crazy!
To be honest, the extension really doesn’t make a whole lot of difference. If you can get it to use no extension or just .htm then that’s great (they’re easier for people to type) but other extensions are not a big deal. Search engines couldn’t care less what extension you use, as long as the URLs work and take people to the right page, and you make sure that you don’t end up with multiple forms of the same URL being indexed separately.
Ha, yeah, I was kind of thinking that. I trialled it the other day, and when working in Notepad++ on the HTML files it was a little confusing having multiple index.html files open!
Are there any other ways to use no extension? I just think it looks better, and it will be a lot easier for a user to type a URL without an extension.
Thanks miki, I will look to use this method. Although I do have a contact.html and a contact.php (to process a form), so in order for it to work properly I’m guessing I would have to rename contact.php (and also change the <form action> attribute to match the new filename) so there are no duplicate file names.
As I have no previous experience working with .htaccess, I think for this site I will leave it as it is and then try this method when I deploy my own site, although I think I am going to use WordPress for my portfolio site.
If I add this to the .htaccess file, will it have any impact on SEO? The reason I am asking is that the site’s rankings are slowly improving and I don’t want to risk affecting this. I am hoping that adding this code will just mean that a user can type the address of a page directly into their browser without the need for .html. At the moment, if this is done, a 404 is displayed.
As long as you use a canonical tag in the <head> of each file, it should have no negative impact on your search rankings. By using the canonical tag, you avoid the problem of duplicate content, because the search engines then have one single URL to index the page with, no matter what version of the URL they went looking for.
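If editing the <head> of every page is awkward, the same hint can in principle be sent as an HTTP Link header from .htaccess instead. This is a different mechanism from the tag recommended above, shown only as a sketch: it assumes mod_headers is enabled and that the host allows it in .htaccess, and the file name and URL are made up for illustration.

```apache
# Hypothetical example: send the canonical hint for contact.html as an HTTP header.
# Header is provided by mod_headers; the file name and URL are placeholders.
<Files "contact.html">
    Header set Link "<https://www.example.com/contact>; rel=\"canonical\""
</Files>
```

It is worth checking the response headers for both /contact and /contact.html afterwards to confirm the Link header is actually being sent for each form of the URL.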
Thanks Stevie, I may give this a go on my client’s website. (Add Options +MultiViews to the .htaccess file and then add canonical tags to each page.)
With regards to the canonical tags, should I include .html as part of the href attribute? Whenever my site is listed in Google, it always lists each page with .html as part of the link, other than the homepage, which simply displays www.mysite.com
For the canonical tag, use whatever format of the URL you want indexed. If you want people to be directed to example.com/page rather than example.com/page.html then that’s absolutely fine, and if you’ve gone to the effort of enabling MultiViews, it would seem sensible to encourage people to use the “friendly” form of the URL by promoting it in Google.
Thanks Stevie. If I encourage Google to use and index the friendly URL format (without the .html), what effect would this have on SEO, given that at the moment Google has indexed the example.com/page.html URLs? Will Google automatically transfer the rankings over to the new URLs once they are re-indexed?
Also, will doing this have any impact on Analytics figures? Maybe I am overthinking this; I just want to make sure that if I go ahead with it I know what the exact impact will be.
Thanks for your help so far mate.