Create Data Visualizations in JavaScript using Dimple and D3

Originally published at: http://www.sitepoint.com/create-data-visualizations-javascript-dimple-d3/

The World Wide Web places an abundance of data at our fingertips. Due to the sheer volume of this data, presenting it in a way that stands out, or in a way that gets your message across, can often prove tricky. This is where data visualizations come in.

In this article, I will guide you through the creation of a data visualization, namely US vehicle recalls for the month of January 2015, using the dimple.js JavaScript library built on top of D3.js.

Setting the Goal

The NHTSA/ODI provide a recall file (which is accessible via their website) containing all NHTSA safety-related defect and compliance campaigns since 1967. Our goal is to extract the data for a given month (January 2015), and to create a bar chart from it, depicting the total number of vehicle recalls by maker.

This data visualization will not be explanatory (we will show raw data), and hardly exploratory (there’s not much narrative for the viewers to build from that data). I do, however, intend to display additional information next to the chart when a user hovers over one of the bars.

This is what we’ll end up with:

You can see a (smaller) live demo at the end of the article or view the original on CodePen.

Working with Data

Keeping Only the Data We Need

All of the files mentioned in this section can be found on our GitHub repo.

The original file FLAT_RCL.txt (link) is a tab-separated values file which contains a lot of data—109,682 records to be exact. There is an accompanying file RCL.txt (link) which details the columns pertaining to this data.

As we are only interested in the data for January 2015 (or rather, the records whose Record Creation Date falls in January 2015), the rest of the records can be removed. To do this, I am using the OpenOffice Calc spreadsheet program (although any other spreadsheet software will suffice). The resulting file, RCL_January_2015.csv (link), contains only 201 records.
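If you would rather script this step than use a spreadsheet, the filter could be sketched as follows. This is only an illustration, not the method used in the article: the function name filterByMonth, the column position, and the YYYYMMDD date format are all assumptions, so check RCL.txt for the real layout before relying on it.

```javascript
// Keep only the rows whose date column starts with a given prefix.
// ASSUMPTIONS: `dateColumnIndex` and the YYYYMMDD date format are placeholders;
// RCL.txt documents the real column position and format.
function filterByMonth(tsvText, dateColumnIndex, monthPrefix) {
  return tsvText
    .split(/\r?\n/)                          // one record per line
    .filter((line) => line.length > 0)       // skip blank lines
    .map((line) => line.split("\t"))         // tab-separated fields
    .filter((fields) =>
      (fields[dateColumnIndex] || "").startsWith(monthPrefix)
    );
}

// Tiny made-up sample (not real recall data): the date sits in column 2.
const sampleTsv = [
  "A\tMAKER ONE\t20150105",
  "B\tMAKER TWO\t20141230",
  "C\tMAKER ONE\t20150120",
].join("\n");

const januaryRows = filterByMonth(sampleTsv, 2, "201501");
console.log(januaryRows.length); // 2
```

With the real file, you would read FLAT_RCL.txt into a string first and pass the actual index of the Record Creation Date column.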

We now need to reduce the columns to a subset of those available, namely:
Record Creation Date, Maker, Model, Model Year, Begin Date of Manufacturing, End Date of Manufacturing, Potential Number of Units Affected, Defect Summary, Consequence Summary, and Corrective Summary. We can then add the column names to the first line of the resulting CSV file, RCL_January_2015_clean.csv (link).
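The column selection can also be scripted. The sketch below picks a subset of fields from each row and prepends a header line; the helper names and the column indexes are hypothetical, and the sample row is made up. Quoting follows the usual CSV convention of wrapping a field in double quotes and doubling any quotes inside it, which matters here because the summary columns are free text.

```javascript
// Quote a value for CSV output: wrap it in double quotes and double any
// inner quotes when it contains a comma, a double quote, or a line break.
function toCsvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Build CSV text with a header line from rows (arrays of fields).
// `columns` maps each header name to a position in the source row.
function toCsvWithHeader(rows, columns) {
  const header = columns.map((c) => toCsvField(c.name)).join(",");
  const body = rows.map((row) =>
    columns.map((c) => toCsvField(row[c.index] ?? "")).join(",")
  );
  return [header, ...body].join("\n");
}

// ASSUMPTION: these indexes are invented for the example; the real
// positions are listed in RCL.txt.
const columns = [
  { name: "Maker", index: 1 },
  { name: "Defect Summary", index: 3 },
];

// Made-up row, not real recall data.
const rows = [["x", "MAKER ONE", "ignored", 'Hose may leak, see "notice"']];

const csv = toCsvWithHeader(rows, columns);
console.log(csv);
```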

This gives us the raw data we need for our visualization.

Create the Data Structure

Now we need to manually group the recalls by maker, combining the records that share the same defect. The combined records must be sorted by date, then by model, and each must carry the total potential number of units affected for the models it covers.

We are going to use a JSON data structure for this grouping.

To illustrate this, let’s process the first three entries of the RCL_January_2015_clean.csv file. These can be grouped into a single entry stating that MCI’s J4500 models from 2013, 2014 and 2015, which share the same years of manufacturing, present the same defect. In the dataset, the potential number of units affected already covers these three models together.

Here is the JSON data structure we are going to use:

{
  "items": [
    {
      "item": {
        "date": "",
        "models": [
          "" 
        ],
        "units": "",
        "defect": "",
        "consequence": "",
        "corrective": ""
      }
    }
  ]
}
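The article does this grouping by hand, but the same idea can be sketched in code. The example below groups records by maker, merges those with an identical defect text, and emits the structure shown above for each maker. Several things here are assumptions: the input field names (maker, model, modelYear, and so on), the "model year" format of the models entries, and the handling of units. Since the example above notes that the dataset already reports a combined figure, the sketch keeps the units value of the first record in each group instead of summing; records that report units separately would need a sum.

```javascript
// Group recall records by maker, merging records that share a defect.
// ASSUMPTION: the property names on each record are hypothetical.
function groupRecalls(records) {
  const byMaker = new Map();
  for (const r of records) {
    if (!byMaker.has(r.maker)) byMaker.set(r.maker, new Map());
    const byDefect = byMaker.get(r.maker);
    if (!byDefect.has(r.defect)) {
      byDefect.set(r.defect, {
        date: r.date,
        models: [],
        units: r.units, // ASSUMPTION: already a combined figure, so not summed
        defect: r.defect,
        consequence: r.consequence,
        corrective: r.corrective,
      });
    }
    byDefect.get(r.defect).models.push(`${r.model} ${r.modelYear}`);
  }

  const result = {};
  for (const [maker, byDefect] of byMaker) {
    const items = [...byDefect.values()]
      .map((item) => ({ ...item, models: [...item.models].sort() }))
      // Sort by date first, then by the first model name.
      .sort(
        (a, b) =>
          a.date.localeCompare(b.date) ||
          a.models[0].localeCompare(b.models[0])
      )
      .map((item) => ({ item }));
    result[maker] = { items };
  }
  return result;
}

// Made-up placeholder records, not real recall data.
const sampleRecords = [2015, 2013, 2014].map((year) => ({
  maker: "MAKER A",
  model: "MODEL X",
  modelYear: String(year),
  date: "20150102",
  units: "100",
  defect: "PLACEHOLDER DEFECT",
  consequence: "PLACEHOLDER CONSEQUENCE",
  corrective: "PLACEHOLDER CORRECTIVE",
}));

const grouped = groupRecalls(sampleRecords);
console.log(JSON.stringify(grouped["MAKER A"], null, 2));
```

The three sample records collapse into one item whose models array lists the three model years in order.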

After repeating this process for every record (and escaping the double quotes), we now have the CSV file RCL_January_2015_json.csv (link). For the sake of brevity, our working example will show only the first three makers of the original file (3 out of 46).
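Escaping is needed because JSON text is full of double quotes, which are also the quoting character in CSV. The article does not say which escaping scheme it uses, so the sketch below assumes the common CSV convention of wrapping the cell in double quotes and doubling every quote inside it. The two helper names are hypothetical.

```javascript
// Serialize a value to JSON and quote it as a single CSV cell.
// ASSUMPTION: quotes are escaped by doubling them, the usual CSV convention.
function jsonToCsvCell(value) {
  const json = JSON.stringify(value);
  return '"' + json.replace(/"/g, '""') + '"';
}

// Reverse the step above: strip the outer quotes, undo the doubling, parse.
function csvCellToJson(cell) {
  const inner = cell.slice(1, -1).replace(/""/g, '"');
  return JSON.parse(inner);
}

const original = { items: [{ item: { date: "", models: [""] } }] };
const cell = jsonToCsvCell(original);
const roundTripped = csvCellToJson(cell);

console.log(cell);
```

A CSV parser that follows the same convention will hand back the JSON text unchanged, ready for JSON.parse.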

Continue reading this article on SitePoint
