Data Visualization with Flex, Part I

Toby Tremayne

In a world increasingly driven by data, it’s easy to find yourself overwhelmed by information. The more we have, the less intelligible it becomes, until eventually it’s all just numbers. Data visualization is the art of rendering information usable: helping the viewer discern trends, patterns, and pertinent facts at a glance, without needing a degree in statistics.

In this series of tutorials, we’ll look at how you can use the Flex platform to visualize your data and make it both clear and appealing to your audience. The series assumes basic familiarity with Flex and the Flex Builder environment.

Before we start, you should download the sample project as a Flex Builder archive.


Why Visualization?

An effective visualization can be as simple as a bar graph. Take, for example, the following data set showing unique monthly visits to a web site:

Table 1. Unique visits to a web site, broken down by month

Month    Unique Visits
Jan      12,231
Feb      11,989
Mar      12,108
Apr      13,400
May      13,626
Jun      15,455
Jul      20,742
Aug      22,345
Sep      23,425
Oct      20,123
Nov      18,332
Dec      14,255

This view is quite dull, and unless you read carefully through it all, nothing really jumps out at you. We can, however, represent the same series of data points in a graph.

Figure 1. The same data, presented as a line graph


We can see at a glance that there’s a peak in the traffic: it builds significantly over a few months and then tails off. Immediately we see a benefit of data visualization: being able to spot a trend easily. This might be meaningless by itself, but if we analyze it alongside other information, such as news coverage of a certain event or the features released on our web site at the time, we can link the two and form a reasonable hypothesis about what increased the traffic. If the traffic dropped off when the web site stopped running that feature or removed some functionality, that suggests it’s the sort of content or feature that people are drawn to, and that we should do more of it.

Even in this very simple example we can see how important visualization can be. When you begin to look at more complex data sets, however, the value increases significantly.

Extracting the Data

For this tutorial, we’ll use data from the SitePoint Forums to analyze the usage frequency of certain technology keywords in users’ posts, and try to ascertain any relationships between them. The Forums have been around for a while and are very popular, so we have an enormous data set to work with. Simply reading through it would take forever, providing us with little or no useful information, so our first task is to generate a useful data set.

Before diving into the data, though, it’s a good idea to consider what it is we want to visualize. In this case, I’ve decided to examine how often different technologies are discussed in the forums, and how those technologies are related. In order to do this, I’ll look at how many times certain keywords are mentioned, and which other keywords tend to be found in these same threads.

I’ve written a quick and dirty PHP script that extracts the data from the database and writes it out to a file in JSON format, as an array with one object per keyword. Each object in the file will look like this:

{
  "keyword":"ajax",
  "count":12479,
  "links": {
    "flash":441,
    "php":1900,
    "javascript":2304,
    "flex":102,
    "adobe":61,
    "microsoft":123,
    "asp":473,
    ".net":180,
    "dotnet":1,
    "css":508,
    "xml":1184,
    "html":940,
    "mysql":369,
    "macromedia":18,
    "jsp":29,
    "ruby":160,
    "xhtml":223,
    "dhtml":114,
    "coldfusion":76,
    "air":16,
    "postgres":14,
    "actionscript":24,
    "stylesheet":39,
    "cgi":18,
    "silverlight":15,
    "groovy":2
  }
}

The count represents the number of times the keyword (in this example, “ajax”) is mentioned in the forums. The link values are the number of threads containing both that keyword and each other keyword. So, for example, in the above data set, we can see that the word “ajax” was used 12,479 times, and that there are 2,304 threads containing both of the words “ajax” and “javascript.”
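To make the structure concrete, here is a minimal sketch of reading one such entry back in PHP and ranking its links. The sample values are copied from the entry above, trimmed to five links. The function name strongestLinks is hypothetical and is not part of the article's script.

```php
<?php
// Hypothetical helper (not part of the article's script): return the
// $limit keywords most often found alongside the entry's keyword.
function strongestLinks(array $entry, int $limit = 3): array
{
    $links = $entry['links'];
    arsort($links);                        // sort by count, descending, keeping keys
    return array_slice($links, 0, $limit, true);
}

// A trimmed copy of the "ajax" entry shown above.
$json = '{"keyword":"ajax","count":12479,"links":'
      . '{"flash":441,"php":1900,"javascript":2304,"xml":1184,"html":940}}';
$entry = json_decode($json, true);         // true => decode objects as associative arrays

$top = strongestLinks($entry);
foreach ($top as $keyword => $threads) {
    echo $entry['keyword'], ' + ', $keyword, ': ', $threads, "\n";
}
```

For this sample, the three strongest links are “javascript,” “php,” and “xml,” in that order. Sorting the links like this is one simple way to decide which relationships are worth drawing first.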

I’ll go over the PHP script briefly, as it’s beyond the focus of the article. The important point to realize is we’re extracting some specific data from our database to pass on to our Flex application. The kind of script you’d use here would vary depending on what kind of data you wanted to visualize.

The first step in our script is to define an array of keywords, and another array to hold the frequencies and links we find in the posts:

// populate an array of keywords to search for, with counts and link arrays
$keys = array('ajax','coldfusion','flash','flex','air','adobe','macromedia',
  'microsoft','php','ruby','groovy','asp','.net','dotnet','actionscript',
  'javascript','dhtml','jsp','cgi','css','stylesheet','silverlight','mysql',
  'postgres','xhtml','xml','html');
$keyFreqs = array();
foreach ($keys as $key) {
  $keyFreqs[$key] = array("count" => 0, "links" => array());
}

We then connect to the database, loop through each thread and each post inside that thread, scanning for our keywords and adding to our $keyFreqs array as we go. Finally, we flatten the array to make it easier to access in Flex and save it to a file in JSON format:

// Connect to the database
$link = mysqli_connect('localhost', 'root', 'root');
if (!$link || !mysqli_select_db($link, 'forum')) {
  exit('Unable to select database');
}
// loop through the threads and their posts
$threads = mysqli_query($link, 'SELECT threadid FROM thread ORDER BY dateline LIMIT 100000;');
while ($thread = mysqli_fetch_array($threads)) {
  $posts = mysqli_query($link, "SELECT postid, pagetext FROM post WHERE threadid=" . (int) $thread['threadid']);
  while ($post = mysqli_fetch_array($posts)) {
    $tempKeys = array();
    // loop over the list of keywords
    foreach ($keys as $key) {
      // search for the key in the post; preg_quote() escapes the dot in ".net"
      $pattern = "/ " . preg_quote($key, '/') . "[^a-zA-Z0-9]*/i";
      $tempCount = preg_match_all($pattern, $post['pagetext'], $matches);
      // if we found one or more matches, add the key to a temporary list
      if ($tempCount > 0) {
        $tempKeys[] = $key;
      }
      // increment the global count for that keyword
      $keyFreqs[$key]["count"] += $tempCount;
    }
    // loop over the list of keywords we found a count for
    // and increment the link between them
    foreach ($tempKeys as $a) {
      foreach ($tempKeys as $b) {
        if ($a != $b) {
          if (isset($keyFreqs[$a]["links"][$b])) {
            $keyFreqs[$a]["links"][$b] += 1;
          } else {
            // first time this pair has been seen together
            $keyFreqs[$a]["links"][$b] = 1;
          }
        }
      }
    }
  }
}
// Flatten the array and write it out to a file in JSON format
$newArray = array();
foreach ($keyFreqs as $keyword => $value) {
  $newArray[] = array("keyword" => $keyword, "count" => $value['count'], "links" => $value["links"]);
}
$fh = fopen('json_data.txt', 'w');
fwrite($fh, json_encode($newArray));
fclose($fh);
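If you’d like to experiment with the counting logic without a forum database, the following sketch applies the same idea to a few in-memory strings. The function name countKeywords and the sample posts are made up for illustration. The sketch also makes two choices of its own: it escapes each keyword with preg_quote() so that the dot in “.net” is matched literally, and it records the first co-occurrence of a pair as 1.

```php
<?php
// Hypothetical helper: count keyword mentions and pairwise co-occurrences
// across an array of post texts, following the approach of the script above.
function countKeywords(array $keys, array $posts): array
{
    $freqs = array();
    foreach ($keys as $key) {
        $freqs[$key] = array('count' => 0, 'links' => array());
    }
    foreach ($posts as $text) {
        $found = array();
        foreach ($keys as $key) {
            // a leading space is required, as in the original pattern
            $pattern = '/ ' . preg_quote($key, '/') . '[^a-zA-Z0-9]*/i';
            $n = preg_match_all($pattern, $text);
            if ($n > 0) {
                $found[] = $key;
            }
            $freqs[$key]['count'] += $n;
        }
        // every pair of distinct keywords found in this post is linked once
        foreach ($found as $a) {
            foreach ($found as $b) {
                if ($a !== $b) {
                    $freqs[$a]['links'][$b] = ($freqs[$a]['links'][$b] ?? 0) + 1;
                }
            }
        }
    }
    return $freqs;
}

// Made-up sample posts, standing in for rows from the post table.
$posts = array(
    'I use ajax with php, and more ajax on top.',
    'Plain php here.',
    'Is .net any good?',
);
$result = countKeywords(array('ajax', 'php', '.net'), $posts);
echo json_encode($result), "\n";
```

Because the pattern requires a leading space, a keyword at the very start of a post is not counted. That limitation is inherited from the original pattern and is worth keeping in mind when reading the totals.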