Parallel JavaScript with ParallelJS

One of the coolest new possibilities arriving along with HTML5 was the Worker interface of the Web Workers API. Before it existed, we had to resort to tricks to keep a website responsive during expensive work. The Worker interface allows us to run functions with long runtimes and high computational cost in the background. Furthermore, Worker instances may be used simultaneously, which lets us spawn several of these workers at once. In this article I’m going to discuss why multi-threading is important and how to implement it in JavaScript with ParallelJS.

Why Multi-Threading?

This is a valid question. Historically, the ability to spawn threads provided an elegant way to partition the work within a process. The operating system is responsible for scheduling the time given to each thread, such that threads with higher priority and more work are preferred to low-priority idle threads. Over the last few years, simultaneous multi-threading (SMT) has become essential for accessing the computing abilities of modern CPUs. The reason is simple: Moore’s law is still valid regarding the number of transistors per area. However, frequency scaling had to stop for a number of reasons. Therefore, the available transistors had to be put to use in other ways. Architectural improvements (SIMD, for example) and multiple cores turned out to be the preferred choice.

[Figure: Scaling Moore's Law]

In order to use SMT we need to write parallel code, that is, code that runs in parallel to obtain a single result. We usually need to consider special algorithms, as most sequential code is either very difficult to parallelize or very inefficient when parallelized. The reason lies in Amdahl’s law, which states that the speedup S is given by

$$S(N) = \frac{1}{(1 - P) + \frac{P}{N}}$$

where N is the number of parallel workers (for example processors, cores, or threads) and P is the parallel fraction. In the future, many-core architectures that rely even more on parallel algorithms might be used. In the area of High-Performance Computing, GPU systems and special architectures such as the Intel Xeon Phi represent such platforms. Finally, we should distinguish between general concurrent applications or algorithms and parallel execution. Parallelism is the simultaneous execution of (possibly related) computations. In contrast, concurrency is the composition of independently executing processes.
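To get a feel for what Amdahl’s law implies, here is a small sketch that evaluates its usual form, S = 1 / ((1 - P) + P / N), for a few worker counts. The function name amdahlSpeedup and the chosen parallel fraction of 0.9 are illustrative assumptions, not part of any library.

```javascript
// Amdahl's law: S = 1 / ((1 - P) + P / N)
// P is the parallel fraction (0..1), N is the number of parallel workers.
// amdahlSpeedup is a hypothetical helper name used only for this illustration.
function amdahlSpeedup(parallelFraction, workers) {
	if (parallelFraction < 0 || parallelFraction > 1) {
		throw new RangeError('parallelFraction must be between 0 and 1');
	}
	if (workers < 1) {
		throw new RangeError('workers must be at least 1');
	}
	return 1 / ((1 - parallelFraction) + parallelFraction / workers);
}

// Speedup for code that is 90% parallelizable
[1, 2, 4, 8, 1024].forEach(function (n) {
	console.log(n + ' workers: ' + amdahlSpeedup(0.9, n).toFixed(2));
});
// 1 workers: 1.00
// 2 workers: 1.82
// 4 workers: 3.08
// 8 workers: 4.71
// 1024 workers: 9.91
```

Even with 1024 workers the speedup stays below 1 / (1 - P) = 10, because the sequential 10% of the work is never accelerated. This is why the parallel fraction matters more than the raw number of cores.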

Multi-Threading in JavaScript

In JavaScript we already know how to write concurrent programs, namely by using callbacks. This knowledge can now be transferred to create parallel programs as well! By design, JavaScript is executed in a single thread mediated by an event loop (usually following the reactor pattern). This gives us a nice abstraction for handling asynchronous requests to (external) resources, for example. It also guarantees that previously defined callbacks are always triggered within the same thread of execution. There are no cross-threading exceptions, race conditions, or other problems associated with threads. However, this does not bring us closer to SMT in JavaScript.

With the introduction of the Worker interface, an elegant solution to this problem has been found. From the point of view of our main application, the code in the web worker should be treated as a concurrently running task. The communication is performed in the same manner: we use the messaging API (postMessage and the message event), which is also available for communication between an embedded page and its hosting page. For instance, the following code responds to an incoming message by sending a message back to the originator.

window.addEventListener('message', function (event) {
	event.source.postMessage('Howdy Cowboy!', event.origin);
}, false);

Theoretically, a web worker might also spawn another web worker. In practice, browser support for such nested workers has been inconsistent. Where they are unavailable, the only way to communicate between web workers is over the main application. The communication via messages is performed concurrently, such that there is only asynchronous (non-blocking) communication. At first, this may feel odd to program, but it brings several advantages. Most importantly, our code is supposed to be free of race conditions!

Let’s see a simple example of computing a sequence of prime numbers in the background, using two parameters to denote the start and end of the sequence. First we create a file called prime.js with the following content:

onmessage = function (event) {
	// Named "args" because "arguments" would shadow the built-in arguments object
	var args = JSON.parse(event.data);
	run(args.start, args.end);
};
function run (start, end) {
	var n = start;
		
	while (n < end) {
		var k = Math.sqrt(n);
		var found = false;
		
		for (var i = 2; !found && i <= k; ++i) {
			found = n % i === 0;
		}
			
		if (!found) {
			postMessage(n.toString());
		}
			
		n++;
	}
}

Now we only need the following code in our main application to start the background worker.

if (typeof Worker !== 'undefined') {
	var w = new Worker('prime.js');
	w.onmessage = function(event) {
		// event.data holds the prime number posted by the worker
		console.log(event.data);
	};
	var args = { start : 100, end : 10000 };
	w.postMessage(JSON.stringify(args));
}

That is quite a lot of work. Especially annoying is the need for a separate file. This yields a nice separation, but seems excessive for smaller tasks. Luckily, there is a way out. Consider the following code:

var fs = (function () { 
	/* code for the worker */ 
}).toString(); 
var blob = new Blob(
   [fs.substr(13, fs.length - 14)],
   { type: 'text/javascript' }
);
var url = window.URL.createObjectURL(blob);
var worker = new Worker(url);
// Now setup communication and rest as before

Of course, we may want a better solution than such magic numbers (13 and 14) and, depending on the browser, a fallback for the usage of Blob and createObjectURL has to be used. If you aren’t a JavaScript expert: what fs.substr(13, fs.length - 14) does is extract the function body. We turn the function expression into a string (using the toString() call) and strip the signature of the function itself, "function () {", which is 13 characters long, along with the closing brace. Can’t a library help us here?
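One way to avoid the magic numbers is to locate the braces instead of counting characters. The following is a minimal sketch under stated assumptions: extractBody and createInlineWorker are hypothetical helper names, the function passed in is a plain function expression whose parameter list contains no braces (so no destructuring or object defaults), and Blob, URL.createObjectURL, and Worker are available.

```javascript
// Hypothetical helper: returns the source text between the first "{" and
// the last "}" of a function. Assumes a plain function expression whose
// parameter list contains no braces.
function extractBody(fn) {
	var source = fn.toString();
	var open = source.indexOf('{');
	var close = source.lastIndexOf('}');

	if (open === -1 || close <= open) {
		throw new Error('Could not locate the function body');
	}

	return source.slice(open + 1, close);
}

// Hypothetical helper: builds a worker from the body of the given function.
// Requires Blob, URL.createObjectURL and Worker, so it only works in browsers
// that provide them.
function createInlineWorker(fn) {
	var blob = new Blob([extractBody(fn)], { type: 'text/javascript' });
	var url = URL.createObjectURL(blob);
	return new Worker(url);
}

// Usage, guarded so that nothing runs outside a browser
if (typeof window !== 'undefined' && typeof Worker !== 'undefined') {
	var echoWorker = createInlineWorker(function () {
		onmessage = function (event) {
			postMessage('echo: ' + event.data);
		};
	});
	echoWorker.onmessage = function (event) {
		console.log(event.data);
	};
	echoWorker.postMessage('hello');
}
```

The function handed to createInlineWorker is never called in the main thread. Only its text is used, so it cannot reference variables from the surrounding scope.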

Meet ParallelJS

This is where ParallelJS comes into play. It provides a convenient API on top of web workers, including many helpers and highly useful abstractions. We start by supplying some data to work with.

var p = new Parallel([1, 2, 3, 4, 5]);
console.log(p.data);

The data field yields the provided array. Nothing “parallel” has been invoked yet. However, the instance p contains a set of methods, for example spawn, which will create a new web worker. The call can be followed by then, in the style of a promise, which makes working with the result a breeze.

p.spawn(function (data) {
	return data.map(function (number) {
		return number * number;
	});
}).then(function (data) {
	console.log(data);
});

The problem with the code above is that the computation won’t really be parallel. We only create a single background worker that processes the whole data array in one sweep. We obtain the result only once the whole array has been processed. A better solution is to use the map function of the Parallel instance.

p.map(function (number) {
	return number * number;
}).then(function (data) {
	console.log(data);
});

In the previous example the computation is quite simple, potentially too simple. In a real example, lots of operations and functions would be involved. We can make previously defined functions available to the workers by using the require function.

function factorial (n) { 
	return n < 2 ? 1 : n * factorial(n - 1);
}
 
p.require(factorial);

p.map(function (n) { 
	return Math.pow(10, n) / factorial(n); 
}).reduce(function (data) { 
	return data[0] + data[1]; 
}).then(function (data) {
	console.log(data);
});

The reduce function helps to aggregate the fragmented results to a single result. It provides a handy abstraction for collecting subresults and performing some action once all subresults are known.
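Note that the callback passed to reduce in the example above receives a single two-element array rather than two separate arguments. The following sequential sketch models that calling convention without any workers. It is not ParallelJS code: reducePairs is a hypothetical name, and the order in which a real parallel reduction combines values may differ, so the callback should be associative.

```javascript
// Sequential model of a pairwise reduction (not the ParallelJS implementation).
// The combine callback receives one array with exactly two values,
// mirroring the data[0] + data[1] style used in the example above.
function reducePairs(values, combine) {
	if (values.length === 0) {
		throw new Error('Cannot reduce an empty array');
	}

	var result = values[0];

	for (var i = 1; i < values.length; i++) {
		result = combine([result, values[i]]);
	}

	return result;
}

var squares = [1, 2, 3, 4, 5].map(function (number) {
	return number * number;
});

var sum = reducePairs(squares, function (data) {
	return data[0] + data[1];
});

console.log(sum); // 55
```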

Conclusions

ParallelJS gives us an elegant way to circumvent many problems that may occur when using web workers. Additionally, we obtain a nice API that holds some useful abstractions and helpers. In the future further improvements could be integrated. Along with the ability to use SMT in JavaScript, we might also want to use vectorization capabilities. Here SIMD.js seems like a viable approach if supported. Also using the GPU for computation may be a valid option in some (hopefully not too distant) future. In Node.js wrappers for CUDA (a parallel computing architecture) exist, but executing raw JavaScript code is still not feasible. Until that point in time, ParallelJS is our best shot at unleashing the power of multi-core CPUs for tackling long-running computations. What about you? How do you unleash the power of modern hardware using JavaScript?

Frequently Asked Questions (FAQs) about Parallel JavaScript with ParallelJS

What is ParallelJS and how does it work?

ParallelJS is a JavaScript library that allows you to parallelize data processing by taking advantage of multi-core processors. It works by creating a new Parallel object and passing an array of data to it. This data can then be processed in parallel using the .map() method, which applies a specified function to each item in the array. The results are then returned in a new array.

How can I install ParallelJS?

ParallelJS can be installed using npm, the Node.js package manager. Simply run the command ‘npm install paralleljs’ in your terminal. Once installed, you can require it in your JavaScript file using ‘var Parallel = require('paralleljs');’.

What are the benefits of using ParallelJS?

ParallelJS allows you to take full advantage of multi-core processors for data processing tasks. This can significantly speed up processing times for large data sets. It also provides a simple and intuitive API, making it easy to parallelize your code.

Can I use ParallelJS in the browser?

Yes, ParallelJS can be used in the browser. You can include it in your HTML file using a script tag and the URL to the ParallelJS file. Once included, you can use the Parallel object just like in Node.js.

How can I use the .map() method in ParallelJS?

The .map() method in ParallelJS is used to apply a function to each item in the data array. As in the examples above, the function itself is passed to the .map() method, and the results become available in a new array once all items have been processed. For example, ‘var p = new Parallel([1, 2, 3]); p.map(function (n) { return n * 2; }).then(function (data) { console.log(data); });’ would log the values [2, 4, 6].

What is the .reduce() method in ParallelJS?

The .reduce() method in ParallelJS is used to reduce the data array to a single value using a specified function. As in the example above, the function receives a two-element array holding the values to combine. For example, ‘var p = new Parallel([1, 2, 3]); p.reduce(function (d) { return d[0] + d[1]; });’ would yield the value 6.

Can I chain methods in ParallelJS?

Yes, methods in ParallelJS can be chained together. For example, you could use the .map() method to process the data, then use the .reduce() method to combine the results into a single value.

How can I handle errors in ParallelJS?

Errors in ParallelJS can be handled through the .then() method, which accepts a second function that is called if an error occurs during processing. The error object is passed to this function.

Can I use ParallelJS with other JavaScript libraries?

Yes, ParallelJS can be used with other JavaScript libraries. However, you will need to ensure that the library is included in the worker context using the .require() method.

Is ParallelJS suitable for all data processing tasks?

While ParallelJS can significantly speed up processing times for large data sets, it may not be the best choice for all tasks. For small data sets, the overhead of creating workers and transferring data may outweigh the benefits of parallelization. It’s best to test ParallelJS with your specific use case to see if it provides a performance benefit.