Dealing with unqualified HREF values
When I was building my extension for finding unused CSS rules, I needed a way of qualifying any href value into a complete URI. I wanted the extension to support stylesheets inside IE conditional comments, but to Firefox these are just comments, so I had to parse each comment node with a regular expression to extract what’s inside it. As a result, the href value I got back was always just a string, not a property or a qualified path.
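To illustrate the kind of parsing involved, here is a minimal sketch of pulling stylesheet href values out of the text of a comment node. The function name extractStylesheetHrefs and the exact regular expressions are my own assumptions for illustration, not the code the extension actually uses, and the sketch assumes quoted attribute values.

```javascript
// Hypothetical helper: find <link rel="stylesheet"> tags inside the text of
// a comment node and return their raw (unqualified) href values.
function extractStylesheetHrefs(commentText) {
  var hrefs = [];
  // match each <link ...> tag in the comment text
  var linkPattern = /<link\b[^>]*>/gi;
  // capture a quoted href value (single or double quotes)
  var hrefPattern = /\bhref\s*=\s*(["'])(.*?)\1/i;
  // only accept tags whose rel attribute includes "stylesheet"
  var relPattern = /\brel\s*=\s*(["'])[^"']*\bstylesheet\b[^"']*\1/i;
  var tag;
  while ((tag = linkPattern.exec(commentText)) !== null) {
    var hrefMatch = hrefPattern.exec(tag[0]);
    if (hrefMatch && relPattern.test(tag[0])) {
      hrefs.push(hrefMatch[2]);
    }
  }
  return hrefs;
}

// Sample comment text, as it might appear in a conditional comment node
var sample = '[if lt IE 7]><link rel="stylesheet" type="text/css" href="/ie6.css" /><![endif]';
console.log(extractStylesheetHrefs(sample)); // logs an array containing "/ie6.css"
```

Whatever the exact pattern, the result is the same as described above: plain strings that still need qualifying.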
And it’s not the first time I’ve needed this ability, but in the past it’s been in predictable circumstances where I already knew the domain name and path. Here those circumstances were not predictable: I needed a solution that would work for any domain name, any path, and any kind of href format (remembering that an href value could be in any one of several formats):
- relative: "test.css"
- relative with directories: "foo/test.css"
- relative from here: "./test.css"
- relative from higher up the directory structure: "../../foo/test.css"
- relative to the http root: "/test.css"
- absolute: "http://www.sitepoint.com/test.css"
- absolute with port: "http://www.sitepoint.com:80/test.css"
- absolute with different protocol: "https://www.sitepoint.com/test.css"
When are HREFs qualified?
When we retrieve an href with JavaScript, the value that comes back has some cross-browser quirks. In most browsers, a value retrieved with the shorthand .href property comes back as a qualified URI, whereas a value retrieved with getAttribute('href') comes back (as it should, according to the specification) as the literal attribute value. So with this link:
<a id="testlink" href="/test.html">test page</a>
We should get these values:
document.getElementById('testlink').href == 'https://www.sitepoint.com/test.html';
document.getElementById('testlink').getAttribute('href') == '/test.html';
And in Opera, Firefox and Safari that is indeed what we get. However in Internet Explorer (all versions, up to and including IE7) that isn’t what happens — for both examples we get back a fully-qualified URI, not a raw attribute value:
document.getElementById('testlink').href == 'https://www.sitepoint.com/test.html';
document.getElementById('testlink').getAttribute('href') == 'https://www.sitepoint.com/test.html';
This behavioral quirk is documented in Kevin Yank and Cameron Adams’ recent book, Simply JavaScript; but it gets quirkier still. Although this behavior applies to the href of a regular link (an <a> element), if we do the same thing for a <link> stylesheet, we get exactly the opposite behavior in IE. This HTML:
<link id="teststylesheet" rel="stylesheet" type="text/css" href="/test.css" />
Produces this result:
document.getElementById('teststylesheet').href == '/test.css';
document.getElementById('teststylesheet').getAttribute('href') == '/test.css';
In both cases we get the raw attribute value (whereas in other browsers we get the same results as for an anchor: .href is fully qualified while getAttribute produces a literal value).
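If the raw attribute value of an anchor is what’s wanted, one possible workaround can be sketched as follows. It rests on an assumption I haven’t verified as part of this article: legacy Internet Explorer accepts a non-standard second argument to getAttribute, where a value of 2 asks for the attribute exactly as it was written in the source. The helper name getRawHref is hypothetical.

```javascript
// Hypothetical helper: read an href attribute as written in the markup.
function getRawHref(element) {
  // Assumption: the second argument is a legacy IE extension
  // (2 = return the value as it was set in the source document).
  // Other browsers ignore the extra argument and return the literal value anyway.
  return element.getAttribute('href', 2);
}
```

With the test link from earlier, getRawHref(document.getElementById('testlink')) would then be expected to give '/test.html' in every browser, though that’s worth confirming in the versions of IE you need to support.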
Anyway…
Behavioral quirks aside, I have to say that IE’s behavior with links is almost always what I want. Deriving a path or file name from a URI is fairly simple, but doing the opposite is rather more complex.
So I wrote a helper function to do it. It accepts an href in any format and returns a qualified URI based on the current document location (or if the value is already qualified, it’s returned unchanged):
//qualify an HREF to form a complete URI
function qualifyHREF(href)
{
    //get the current document location object
    var loc = document.location;

    //build a base URI from the protocol plus host (which includes port if applicable)
    var uri = loc.protocol + '//' + loc.host;

    //if the input path is relative-from-here
    //just delete the ./ token to make it relative
    if(/^(\.\/)([^\/]?)/.test(href))
    {
        href = href.replace(/^(\.\/)([^\/]?)/, '$2');
    }

    //if the input href is already qualified, copy it unchanged
    if(/^([a-z]+):\/\//.test(href))
    {
        uri = href;
    }

    //or if the input href begins with a leading slash, then it's base relative
    //so just add the input href to the base URI
    else if(href.substr(0, 1) == '/')
    {
        uri += href;
    }

    //or if it's an up-reference we need to compute the path
    else if(/^((\.\.\/)+)([^\/].*$)/.test(href))
    {
        //get the last part of the path, minus up-references
        var lastpath = href.match(/^((\.\.\/)+)([^\/].*$)/);
        lastpath = lastpath[lastpath.length - 1];

        //count the number of up-references
        var references = href.split('../').length - 1;

        //get the path parts and delete the last one (this page or directory)
        var parts = loc.pathname.split('/');
        parts = parts.slice(0, parts.length - 1);

        //for each of the up-references, delete the last part of the path
        for(var i=0; i<references; i++)
        {
            parts = parts.slice(0, parts.length - 1);
        }

        //now rebuild the path, skipping the empty parts left by leading slashes
        var path = '';
        for(i=0; i<parts.length; i++)
        {
            if(parts[i] != '')
            {
                path += '/' + parts[i];
            }
        }
        path += '/';

        //and add the last part of the path
        path += lastpath;

        //then add the computed path to the base URI
        uri += path;
    }

    //otherwise it's a relative path,
    else
    {
        //calculate the path to this directory
        var path = '';
        var parts = loc.pathname.split('/');
        parts = parts.slice(0, parts.length - 1);
        for(var i=0; i<parts.length; i++)
        {
            if(parts[i] != '')
            {
                path += '/' + parts[i];
            }
        }
        path += '/';

        //then add the path and input href to the base URI
        uri += path + href;
    }

    //return the final uri
    return uri;
}
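For comparison, environments that provide the standard URL constructor (current browsers and Node.js) can do the same resolution natively. This is a different technique from the helper above, not a rewrite of it, and the sketch makes some assumptions: the function name qualifyWithURL is hypothetical, and the base is passed in explicitly so the example runs outside a page (in a document you would pass document.location.href).

```javascript
// Hypothetical helper: resolve an href against a base URI
// using the standard URL constructor instead of manual path arithmetic.
function qualifyWithURL(href, base) {
  return new URL(href, base).href;
}

// An example base, standing in for the current document location
var base = 'https://www.sitepoint.com/a/b/page.html';

console.log(qualifyWithURL('test.css', base));        // https://www.sitepoint.com/a/b/test.css
console.log(qualifyWithURL('./test.css', base));      // https://www.sitepoint.com/a/b/test.css
console.log(qualifyWithURL('../foo/test.css', base)); // https://www.sitepoint.com/a/foo/test.css
console.log(qualifyWithURL('/test.css', base));       // https://www.sitepoint.com/test.css
```

One difference to be aware of: the URL constructor normalizes what it returns (for example, it drops a default port such as :80 from an http URI), so an already-qualified input isn’t necessarily returned character for character, as it is with the helper function.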
One more for the toolkit!