How to Process Large Volumes of Data in JavaScript

In my previous posts, we examined JavaScript Execution and Browser Limits and a method which can solve “unresponsive script” alerts using Timer-Based Pseudo-Threading. Today, we’ll look at ways to handle large volumes of data within the browser.

A few years ago, developers would never have considered alternatives to complex server-side processing. That perception has changed and many Ajax applications send huge quantities of data between the client and the server. In addition, code may update the DOM, which is a particularly time-consuming browser process. However, attempting to analyze that information in one go can make an application unresponsive and throw script alerts.

JavaScript timers can help prevent browser locking issues by splitting a long data analysis process into shorter chunks. Here’s the start of our JavaScript function:


function ProcessArray(data, handler, callback) {

The ProcessArray() function accepts three arguments:

  1. data: an array of items to process
  2. handler: a function which processes an individual data item
  3. callback: an optional function called when all processing is complete.

Next, we’ll define configuration variables:


  var maxtime = 100;		// chunk processing time
  var delay = 20;		// delay between processes
  var queue = data.concat();	// clone original array

maxtime specifies the maximum number of milliseconds permitted for each chunk of processing. delay is the time in milliseconds between processing chunks. Finally, queue is a clone of the original data array. Cloning won't be necessary in every case but, since the array is passed by reference and we're discarding each item, it's the safest option.
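To see why the clone matters: shift() mutates the array it's called on, so without the concat() copy we'd be emptying the caller's array. A quick illustration (variable names are mine):

```javascript
// Arrays are passed by reference, so shift() would empty the caller's array.
// concat() with no arguments returns a shallow copy we can safely consume.
var original = [1, 2, 3];
var clone = original.concat();

clone.shift();                  // removes the first item from the clone only
console.log(original.length);   // 3 - the caller's array is untouched
console.log(clone.length);      // 2
```

Note that concat() makes a shallow copy: the items themselves are shared, but that's fine here because we only remove items from the queue, we never modify them.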

We can now use a setTimeout to start processing:


  setTimeout(function() {

    var endtime = +new Date() + maxtime;

    do {
      handler(queue.shift());
    } while (queue.length > 0 && endtime > +new Date());

First, an endtime is calculated: a future time when processing must cease. The do…while loop processes queued items in turn, continuing until every item has been processed or endtime has been reached.

note: Why use do…while?

JavaScript supports both while loops and do…while loops. The difference is that do…while loops are guaranteed to perform at least one iteration. If we used a standard while loop, the developer could set a low or negative maxtime, and the array processing would never start or complete.
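If that's not clear, here's a quick sketch with a deadline that has already passed (the -100 offset is just for illustration):

```javascript
// With do...while, at least one item is processed even when the
// time budget has already expired.
var queue = ["a", "b", "c"];
var endtime = +new Date() - 100;   // a deadline already in the past

do {
  console.log(queue.shift());      // "a" is still processed once
} while (queue.length > 0 && endtime > +new Date());

// A plain while loop with the same expired deadline would process nothing,
// so the queue would never shrink and the callback would never fire.
```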

Finally, we determine whether further items need to be processed and, if necessary, call our processing function after a short delay:


    if (queue.length > 0) {
      setTimeout(arguments.callee, delay);
    }
    else {
      if (callback) callback();
    }

  }, delay);
}
// end of ProcessArray function

The callback function is executed once every item has been processed.
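Putting the snippets together, here's the full function. I've made one change: a named function expression (processChunk, my naming) replaces arguments.callee, so the code also runs under ES5 strict mode, where arguments.callee is forbidden:

```javascript
// Process a large array in small timed chunks so the browser stays responsive.
function ProcessArray(data, handler, callback) {

  var maxtime = 100;            // chunk processing time
  var delay = 20;               // delay between processes
  var queue = data.concat();    // clone original array

  setTimeout(function processChunk() {

    var endtime = +new Date() + maxtime;

    // process items until the queue is empty or the time budget expires
    do {
      handler(queue.shift());
    } while (queue.length > 0 && endtime > +new Date());

    if (queue.length > 0) {
      // more items remain: schedule the next chunk
      setTimeout(processChunk, delay);
    }
    else {
      // all done: run the optional completion callback
      if (callback) callback();
    }

  }, delay);
}
```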

We can test ProcessArray() with a small test case:

// process an individual data item
function Process(dataitem) {
  console.log(dataitem);
}

// processing is complete
function Done() {
  console.log("Done");
}

// test data
var data = [];
for (var i = 0; i < 500; i++) data[i] = i;

// process all items
ProcessArray(data, Process, Done);

The code will work in every browser, including IE6+. It's a viable cross-browser technique, but HTML5 provides a far nicer solution! In my next post, we'll discuss web workers …


  • fn64

    yep, after reading the beginning of article i thought ‘what about web workers?’
    looking forward to your next post

    • http://www.optimalworks.net/ Craig Buckler

      Web workers are very cool, but you will encounter problems with a certain browser. I mention no names, but even the latest v9 beta doesn’t have support.

      The method described above should work anywhere.

  • lorenzo

    One thing you have not mentioned is how hard it is to break a piece of code into chunks which can then be called via setTimeout(…).

    Just imagine you have to iterate an array of arrays.
    Just rewriting this code in chunks is really hard.

    Other real-code cases are even harder (if at all doable).

    You are not the first one to gloss over the difficulty of rewriting time-consuming tasks using setTimeout.