Easy URL Parsing With Isomorphic JavaScript

By Craig Buckler

Most web applications require URL parsing whether it’s to extract the domain name, implement a REST API or find an image path. A typical URL structure is described by the image below:

URL structure

You can break a URL string into constituent parts using regular expressions but it’s complicated and unnecessary…
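To illustrate why, here is a minimal sketch based on the generic pattern given in Appendix B of RFC 3986. The function name parseWithRegex is hypothetical, and the pattern only splits a string into five broad components. Separating the hostname from the port, or coping with credentials and relative URLs, would need more work.

```javascript
// Hypothetical helper: split a URL using the generic RFC 3986 (Appendix B) pattern
var URL_PATTERN = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function parseWithRegex(str) {
	// every group is optional, so exec() always returns a match for a string
	var m = URL_PATTERN.exec(str);
	return {
		scheme: m[2],		// http
		authority: m[4],	// site.com:81 (hostname and port still combined)
		path: m[5],			// /path/page
		query: m[7],		// a=1&b=2
		fragment: m[9]		// hash
	};
}

var parts = parseWithRegex('http://site.com:81/path/page?a=1&b=2#hash');
console.log(parts.authority); // site.com:81
```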

Server-side URL Parsing

Node.js (and forks such as io.js) provides a URL API:

// Server-side JavaScript
var urlapi = require('url'),
    url = urlapi.parse('http://site.com:81/path/page?a=1&b=2#hash');

console.log(
	url.href + '\n' +			// the full URL
	url.protocol + '\n' +		// http:
	url.hostname + '\n' +		// site.com
	url.port + '\n' +			// 81
	url.pathname + '\n' +		// /path/page
	url.search + '\n' +			// ?a=1&b=2
	url.hash					// #hash
);

As you can see in the snippet above, the parse() method returns an object containing the data you need such as the protocol, the hostname, the port, and so on.
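The parse() method also accepts a second argument. Passing true asks it to parse the query string into an object, available as the query property. A short sketch reusing the same URL (the variable name parsed is only for illustration):

```javascript
// Server-side JavaScript
var urlapi = require('url'),
    parsed = urlapi.parse('http://site.com:81/path/page?a=1&b=2#hash', true);

console.log(parsed.query.a); // 1
console.log(parsed.query.b); // 2
```

Be aware that current Node.js documentation lists url.parse() as a legacy API, so check the documentation for the version you are running.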

Client-side URL Parsing

There’s no equivalent API in the browser. But if there’s one thing browsers do well, it’s URL parsing and all links in the DOM implement a similar Location interface, e.g.:

// Client-side JavaScript
// find the first link in the DOM
var url = document.getElementsByTagName('a')[0];

console.log(
	url.href + '\n' +			// the full URL
	url.protocol + '\n' +		// http:
	url.hostname + '\n' +		// site.com
	url.port + '\n' +			// 81
	url.pathname + '\n' +		// /path/page
	url.search + '\n' +			// ?a=1&b=2
	url.hash					// #hash
);

If we have a URL string, we can assign it to the href of an in-memory anchor element (a) so it can be parsed without regular expressions, e.g.:

// Client-side JavaScript
// create dummy link
var url = document.createElement('a');
url.href = 'http://site.com:81/path/page?a=1&b=2#hash';

console.log(url.hostname); // site.com

Isomorphic URL Parsing

Aurelio recently discussed isomorphic JavaScript applications. In essence, it’s progressive enhancement taken to an extreme level where an application will happily run on either the client or server. A user with a modern browser would use a single-page application. Older browsers and search engine bots would see a server-rendered alternative. In theory, an application could implement varying levels of client/server processing depending on the speed and bandwidth capabilities of the device.

Isomorphic JavaScript has been discussed for many years but it’s complex. Few projects go further than implementing sharable views and there aren’t many situations where standard progressive enhancement wouldn’t work just as well (if not better given most “isomorphic” frameworks appear to fail without client-side JavaScript). That said, it’s possible to create environment-agnostic micro libraries which offer a tentative first step into isomorphic concepts.

Let’s consider how we could write a URL parsing library in a lib.js file. First we’ll detect where the code is running:

// running on Node.js?
var isNode = (typeof module === 'object' && module.exports);

This isn’t particularly robust since you could have a module.exports function defined client-side but I don’t know of a better way (suggestions welcome). A similar approach used by other developers is to test for the presence of the window object:

// running on Node.js?
var isNode = typeof window === 'undefined';
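A third option, offered as a sketch rather than a definitive answer, is to look for the version string Node.js exposes at process.versions.node. It shares the weakness of the other checks: any script can define a similar process object, and some bundlers supply a process shim in the browser.

```javascript
// running on Node.js? (assumption: a genuine Node process object has versions.node)
var isNode = (
	typeof process === 'object' && process !== null &&
	typeof process.versions === 'object' && process.versions !== null &&
	typeof process.versions.node === 'string'
);

console.log(isNode);
```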

Let’s now complete our lib.js code with a URLparse function:

// lib.js library functions

// running on Node.js?
var isNode = (typeof module === 'object' && module.exports);

(function(lib) {

	"use strict";

	// require Node URL API
	var url = (isNode ? require('url') : null);

	// parse URL
	lib.URLparse = function(str) {

		if (isNode) {
			return url.parse(str);
		}
		else {
			// use a local anchor element rather than overwriting the url variable
			var a = document.createElement('a');
			a.href = str;
			return a;
		}

	};

})(isNode ? module.exports : this.lib = {});

In this code I’ve used an isNode variable for clarity. However, you can avoid it by placing the test directly inside the last parenthesis of the snippet.

Server-side, URLparse is exported as a CommonJS module. To use it:

// include lib.js module
var lib = require('./lib.js');

var url = lib.URLparse('http://site.com:81/path/page?a=1&b=2#hash');
console.log(
	url.href + '\n' +			// the full URL
	url.protocol + '\n' +		// http:
	url.hostname + '\n' +		// site.com
	url.port + '\n' +			// 81
	url.pathname + '\n' +		// /path/page
	url.search + '\n' +			// ?a=1&b=2
	url.hash					// #hash
);

Client-side, URLparse is added as a method to the global lib object:

<script src="./lib.js"></script>
<script>
var url = lib.URLparse('http://site.com:81/path/page?a=1&b=2#hash');
console.log(
	url.href + '\n' +			// the full URL
	url.protocol + '\n' +		// http:
	url.hostname + '\n' +		// site.com
	url.port + '\n' +			// 81
	url.pathname + '\n' +		// /path/page
	url.search + '\n' +			// ?a=1&b=2
	url.hash					// #hash
);
</script>

Other than the library inclusion method, the client and server APIs are identical.

Admittedly, this is a simple example and URLparse runs (mostly) separate code on the client and server. But we have implemented a consistent API and it illustrates how JavaScript code can be written to run anywhere. We could extend the library to offer further client/server utility functions such as field validation, cookie parsing, date handling, currency formatting etc.
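As a sketch of one such utility, cookie parsing needs no environment check at all because it only works on a string: document.cookie in the browser or the Cookie request header on the server. The name parseCookies is an assumption for illustration and is not part of the library above.

```javascript
// Hypothetical utility: parse a cookie string into an object of name/value pairs
function parseCookies(str) {

	var cookies = {};

	(str || '').split(';').forEach(function(pair) {

		var pos = pair.indexOf('=');
		if (pos < 0) return; // skip empty or malformed entries

		var name = pair.slice(0, pos).trim(),
			value = pair.slice(pos + 1).trim();

		try {
			cookies[name] = decodeURIComponent(value);
		}
		catch (e) {
			cookies[name] = value; // keep the raw value if it isn't valid encoding
		}

	});

	return cookies;

}

var cookies = parseCookies('theme=dark; user=Craig%20B');
console.log(cookies.theme);	// dark
console.log(cookies.user);	// Craig B
```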

I’m not convinced full isomorphic applications are practical or possible given the differing types of logic required on the client and server. However, environment-agnostic libraries could ease the pain of having to write two sets of code to do the same thing.

Comments

freakyrag

From the article:

parse() returns an object, not an array. ;)

ceeb

Well spotted. We'll get it changed...

capaj

isNode variable should be renamed to isCommonJs, because that check doesn't tell you if you're running in node. Consider when someone uses this script with JSPM or with browserify.

ceeb

That's a good point but I took the route which was most likely to succeed. Unfortunately, there's no guaranteed way to distinguish Node from client-side JavaScript. Or not that I'm aware of. In some ways, that's good. In others, it's painful!

fidel_karsto1

Hi Craig!
What am I missing? Why is a check for the existence of document or window not a guaranteed way to distinguish NodeJS from browser?

ceeb

Because a Node.js program could easily set similar variables, e.g.

var window = {}, document = {};

There are DOM parsing libraries which do this sort of thing to enable server-side processing.

Similarly, a client-side JS program can define...

var module = { exports: function() { ... } };

Part of the beauty - and frustration - of JavaScript is anything can appear to be native. Detecting whether code is running on Node or in a browser isn't guaranteed to work. But perhaps that's a good thing?

fidel_karsto1

Hi Craig!

OK, but that would mean one has to write something like:

global.document = {};
global.window = {};

...in nodejs right before the require call to fool a check in a module and override / change the global object.
That's definitely bad practice, but I agree that it is unsafe because one can work around it as described.
Thanks for the hint!

ceeb

Globals would do it, but so would any document or window variable defined in a module before isNode is set. While that would seem bad practice, what if you loaded a module named "window"...

var window = require('window');

fidel_karsto1

Sure, you can shoot yourself in the foot in various ways. ;)
What I thought of was more like the following:

A utility module env.js

module.exports = {
    isBrowser: function() {
        return typeof window !== 'undefined' && typeof document !== 'undefined';
    }
};
console.log('document: ', typeof document);
console.log('window: ', typeof window);

And some app.js

// global.document = {}; global.window = {};
// or
// require('/some/malicious/module/which/alters/globals');
var env = require('./env.js');
console.log('isBrowser: ', env.isBrowser());

As long as you don't add some globals manipulating code before requiring env.js, it should be "safe".
Assigning the return value to a variable called window only affects the module scope.

Another interesting approach is this http://www.timetler.com/2012/10/13/environment-detection-in-javascript/
I did not expect such a "trivial" problem to be unsolved yet. I bet node.js and io.js will come up with a safe solution, when isomorphic code gets more relevance.

ceeb

How would you load env.js client-side? Unless you were using Browserify or similar, you couldn't depend on env.isBrowser()? But if Browserify is a dependency, it's simpler to check for the existence of that ... in which case, you don't need env.js and your application disappears in a puff of logic!

Yeah, it's surprising that Node's been around for 5 or 6 years yet this remains a problem.
