By James Edwards

Dealing with unqualified HREF values


When I was building my extension for finding unused CSS rules, I needed a way of qualifying any href value into a complete URI. I wanted the extension to support stylesheets inside IE conditional comments, but to Firefox these are just comments. I had to parse each comment node with a regular expression to extract what’s inside it, so the href value I got back was always just a string, not a property or a qualified path.

This isn’t the first time I’ve needed this ability, but in the past the circumstances were predictable: I already knew the domain name and path. Here they were not. I needed a solution that would work for any domain name, any path, and any kind of href format, remembering that an href value can take any one of several formats:

  • relative: "test.css"
  • relative with directories: "foo/test.css"
  • relative from here: "./test.css"
  • relative from higher up the directory structure: "../../foo/test.css"
  • relative to the http root: "/test.css"
  • absolute: "http://www.sitepoint.com/test.css"
  • absolute with port: "http://www.sitepoint.com:80/test.css"
  • absolute with different protocol: "https://www.sitepoint.com/test.css"
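To make the distinctions in this list concrete, here is a minimal sketch that names the format of an href string. The function name classifyHref and the category labels are hypothetical, chosen only for this illustration, and protocol-relative values such as "//host/path" are not in the list above and are not treated specially.

```javascript
// Hypothetical helper: report which of the listed formats an href uses.
// The returned labels are illustrative names, not part of any API.
function classifyHref(href) {
  // a scheme followed by :// means the value is already absolute
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(href)) { return 'absolute'; }

  // a leading slash means relative to the http root
  if (href.charAt(0) === '/') { return 'root-relative'; }

  // one or more leading ../ tokens point higher up the directory structure
  if (/^(\.\.\/)+/.test(href)) { return 'up-reference'; }

  // a leading ./ token is an explicit "relative from here"
  if (/^\.\//.test(href)) { return 'relative-from-here'; }

  // anything else is a plain relative path, with or without directories
  return 'relative';
}

var samples = [
  'test.css',
  'foo/test.css',
  './test.css',
  '../../foo/test.css',
  '/test.css',
  'http://www.sitepoint.com/test.css',
  'http://www.sitepoint.com:80/test.css',
  'https://www.sitepoint.com/test.css'
];

for (var i = 0; i < samples.length; i++) {
  console.log(samples[i] + ' -> ' + classifyHref(samples[i]));
}
```

Only the absolute case can be used as-is. Every other category needs information from the current document location before it becomes a complete URI.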

When are HREFs qualified?

When we retrieve an href with JavaScript, the value that comes back has some cross-browser quirks. What mostly happens is that a value retrieved with the shorthand .href property will come back as a qualified URI, whereas a value retrieved with getAttribute('href') will (and should, according to specification) come back as the literal attribute value. So with this link:

<a id="testlink" href="/test.html">test page</a>

We should get these values:

document.getElementById('testlink').href == 'http://www.sitepoint.com/test.html';
document.getElementById('testlink').getAttribute('href') == '/test.html';

And in Opera, Firefox and Safari that is indeed what we get. However, in Internet Explorer (all versions, up to and including IE7) both examples return a fully-qualified URI rather than the raw attribute value:

document.getElementById('testlink').href == 'http://www.sitepoint.com/test.html';
document.getElementById('testlink').getAttribute('href') == 'http://www.sitepoint.com/test.html';

This behavioral quirk is documented in Kevin Yank and Cameron Adams’ recent book, Simply JavaScript, but it gets quirkier still. Although this behavior applies to the href of a regular link (an <a> element), if we do the same thing for a <link> stylesheet, we get exactly the opposite behavior in IE. This HTML:

<link id="teststylesheet" rel="stylesheet" type="text/css" href="/test.css" />

Produces this result:

document.getElementById('teststylesheet').href == '/test.css';
document.getElementById('teststylesheet').getAttribute('href') == '/test.css';

In both cases we get the raw attribute value, whereas in other browsers we get the same results as for an anchor: .href is fully qualified while getAttribute produces a literal value.
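A quick way to see which behavior a given browser shows is to read both values side by side. The sketch below assumes nothing beyond the .href property and getAttribute('href'). The function name describeHref and the stand-in element object are hypothetical, the latter only there so the example can also run outside a browser.

```javascript
// Hypothetical helper: read an element's href both ways and report
// whether the attribute value came back looking like a complete URI.
function describeHref(element) {
  var attribute = element.getAttribute('href');
  return {
    property: element.href,
    attribute: attribute,
    // true when the attribute value starts with a scheme and ://
    attributeLooksQualified: /^[a-z][a-z0-9+.-]*:\/\//i.test(attribute)
  };
}

// Stand-in for a real element, mimicking the Opera, Firefox and Safari
// results described above. In a page you would pass
// document.getElementById('testlink') instead.
var fakeLink = {
  href: 'http://www.sitepoint.com/test.html',
  getAttribute: function(name) {
    return name === 'href' ? '/test.html' : null;
  }
};

var report = describeHref(fakeLink);
console.log(report.property);
console.log(report.attribute);
console.log(report.attributeLooksQualified);
```

Note that the flag only says what the returned string looks like. If the markup itself contains an absolute URI, it will be true in every browser.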

Anyway…

Behavioral quirks aside, I have to say that IE’s behavior with links is almost always what I want. Deriving a path or file name from a URI is fairly simple, but doing the opposite is rather more complex.

So I wrote a helper function to do it. It accepts an href in any format and returns a qualified URI based on the current document location (or if the value is already qualified, it’s returned unchanged):

//qualify an HREF to form a complete URI
function qualifyHREF(href)
{
	//get the current document location object
	var loc = document.location;

	//build a base URI from the protocol plus host (which includes port if applicable)
	var uri = loc.protocol + '//' + loc.host;

	//if the input path is relative-from-here
	//just delete the ./ token to make it relative
	if(/^(\.\/)([^\/]?)/.test(href))
	{
		href = href.replace(/^(\.\/)([^\/]?)/, '$2');
	}

	//if the input href is already qualified, copy it unchanged
	if(/^([a-z]+):\/\//.test(href))
	{
		uri = href;
	}

	//or if the input href begins with a leading slash, then it's base relative
	//so just add the input href to the base URI
	else if(href.substr(0, 1) == '/')
	{
		uri += href;
	}

	//or if it's an up-reference we need to compute the path
	else if(/^((\.\.\/)+)([^\/].*$)/.test(href))
	{
		//get the last part of the path, minus up-references
		var lastpath = href.match(/^((\.\.\/)+)([^\/].*$)/);
		lastpath = lastpath[lastpath.length - 1];

		//count the number of up-references
		var references = href.split('../').length - 1;

		//get the path parts and delete the last one (this page or directory)
		var parts = loc.pathname.split('/');
		parts = parts.splice(0, parts.length - 1);

		//for each of the up-references, delete the last part of the path
		for(var i=0; i<references; i++)
		{
			parts = parts.splice(0, parts.length - 1);
		}

		//now rebuild the path
		var path = '';
		for(i=0; i<parts.length; i++)
		{
			if(parts[i] != '')
			{
				path += '/' + parts[i];
			}
		}
		path += '/';

		//and add the last part of the path
		path += lastpath;

		//then add the path and input href to the base URI
		uri += path;
	}

	//otherwise it's a relative path,
	else
	{
		//calculate the path to this directory
		path = '';
		parts = loc.pathname.split('/');
		parts = parts.splice(0, parts.length - 1);
		for(var i=0; i<parts.length; i++)
		{
			if(parts[i] != '')
			{
				path += '/' + parts[i];
			}
		}
		path += '/';

		//then add the path and input href to the base URI
		uri += path + href;
	}

	//return the final uri
	return uri;
}
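To try the same rules outside a browser, the location can be passed in as an argument instead of being read from document.location. The following is a sketch under that assumption, not a drop-in replacement. The name qualifyAgainst and the sample location object are hypothetical, and it walks the href segment by segment rather than using the regular expressions above.

```javascript
// Sketch: qualify an href against a location-like object that has
// protocol, host and pathname properties (as document.location does).
function qualifyAgainst(href, loc) {
  // already qualified: return it unchanged
  if (/^[a-z]+:\/\//.test(href)) { return href; }

  // base URI from the protocol plus host (which includes any port)
  var base = loc.protocol + '//' + loc.host;

  // relative to the http root: just append to the base
  if (href.charAt(0) === '/') { return base + href; }

  // start from the current directory: the pathname minus its last part
  var parts = loc.pathname.split('/');
  parts.pop();

  // apply each segment of the href in turn
  var segments = href.split('/');
  for (var i = 0; i < segments.length; i++) {
    if (segments[i] === '..') {
      // up-reference: drop one directory, but never go above the root
      if (parts.length > 1) { parts.pop(); }
    } else if (segments[i] !== '.') {
      // ordinary segment: append it (a "." segment is simply skipped)
      parts.push(segments[i]);
    }
  }

  return base + parts.join('/');
}

// Hypothetical page location used for the examples below
var sampleLocation = {
  protocol: 'http:',
  host: 'www.sitepoint.com',
  pathname: '/blog/2008/post.html'
};

console.log(qualifyAgainst('test.css', sampleLocation));
// http://www.sitepoint.com/blog/2008/test.css
console.log(qualifyAgainst('../../foo/test.css', sampleLocation));
// http://www.sitepoint.com/foo/test.css
console.log(qualifyAgainst('/test.css', sampleLocation));
// http://www.sitepoint.com/test.css
```

Taking the location as a parameter also makes it easy to check each href format from the earlier list against a known page address.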

One more for the toolkit!
