Data Visualization with Flex, Part I

By Toby Tremayne | Flex Tutorials

In a world increasingly driven by data, it’s easy to find yourself
overwhelmed by information. The more we have, the less intelligible it
becomes, until eventually it’s all just numbers. Data visualization is the
art of rendering information usable: helping the viewer discern trends,
patterns, and pertinent facts at a glance, without needing a degree in
statistics.

In this series of tutorials, we’ll look at how you can use the Flex
platform to visualize your data and make it both clear and appealing to your
audience. The series assumes basic familiarity with Flex and the Flex
Builder environment.

Before we start, you should download the sample project as a Flex
Builder archive.

Why Visualization?

An effective visualization can be as simple as a bar graph. Take,
for example, the following data set showing unique monthly visits to a web
site:

Table 1. Unique visits to a web site, broken down by month

Month   Unique Visits
Jan     12,231
Feb     11,989
Mar     12,108
Apr     13,400
May     13,626
Jun     15,455
Jul     20,742
Aug     22,345
Sep     23,425
Oct     20,123
Nov     18,332
Dec     14,255


This view is quite dull, and unless you read carefully through it
all, nothing really jumps out at you. We can, however, represent the same
series of data points in a graph.

Figure 1. The same data, presented as a line graph


We can see at a glance that there’s a peak in the traffic: it builds
significantly over a few months and then tails off. Immediately we witness
a benefit of data visualization: being able to spot a trend easily. Now
this might be meaningless by itself, but if we analyze it alongside some
other information (such as news coverage of a certain event, or which
features were released on our web site) we can link the two occurrences
together, which suggests that whatever we were doing at that time was what
increased the traffic. If the traffic dropped off when the web site ceased
running that feature or removed a piece of functionality, we know that’s
the sort of content or feature that people are drawn to, and that we
should do more of it.
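
The trend in the graph can also be checked numerically. The following is a minimal sketch, using the figures from Table 1, that finds the peak month and the largest month-over-month jump. The function names keyOfMax and monthlyChanges are made up for this illustration and are not part of the sample project.

```php
<?php
// Unique visits per month, copied from Table 1.
$visits = array(
  'Jan' => 12231, 'Feb' => 11989, 'Mar' => 12108, 'Apr' => 13400,
  'May' => 13626, 'Jun' => 15455, 'Jul' => 20742, 'Aug' => 22345,
  'Sep' => 23425, 'Oct' => 20123, 'Nov' => 18332, 'Dec' => 14255,
);

// Return the key (month) that holds the largest value.
function keyOfMax(array $values) {
  return array_search(max($values), $values);
}

// Return the change from the previous month, for every month after the first.
function monthlyChanges(array $visits) {
  $changes = array();
  $previous = null;
  foreach ($visits as $month => $count) {
    if ($previous !== null) {
      $changes[$month] = $count - $previous;
    }
    $previous = $count;
  }
  return $changes;
}

$changes = monthlyChanges($visits);
echo 'Peak month: ' . keyOfMax($visits) . "\n";
echo 'Largest jump: ' . keyOfMax($changes) . ' (+' . max($changes) . ")\n";
```

The numbers confirm what the graph shows at a glance: traffic peaks in September, and the steepest rise is from June to July.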

Even in this very simple example we can see how important
visualization can be. When you begin to look at more complex data sets,
however, the value increases significantly.

Extracting the Data

For this tutorial, we’ll use data from the SitePoint Forums to
analyze the usage frequency of certain technology keywords in users’
posts, and try to ascertain any relationships between them. The Forums
have been around for a while and are very popular, so we have an enormous
data set to work with. Simply reading through it would take forever,
providing us with little or no useful information, so our first task is to
generate a useful data set.

Before diving into the data, though, it’s a good idea to consider
what it is we want to visualize. In this case, I’ve decided to examine how
often different technologies are discussed in the forums, and how those
technologies are related. In order to do this, I’ll look at how many times
certain keywords are mentioned, and which other keywords tend to be found
in these same threads.

I’ve written a quick and dirty PHP script that extracts the data
from the database and writes it out to a file in JSON format. The file
holds an array with one object per keyword; each object looks like this:

{
  "keyword":"ajax",
  "count":12479,
  "links": {
    "flash":441,
    "php":1900,
    "javascript":2304,
    "flex":102,
    "adobe":61,
    "microsoft":123,
    "asp":473,
    ".net":180,
    "dotnet":1,
    "css":508,
    "xml":1184,
    "html":940,
    "mysql":369,
    "macromedia":18,
    "jsp":29,
    "ruby":160,
    "xhtml":223,
    "dhtml":114,
    "coldfusion":76,
    "air":16,
    "postgres":14,
    "actionscript":24,
    "stylesheet":39,
    "cgi":18,
    "silverlight":15,
    "groovy":2
  }
}

The count represents the number of times the
keyword (in this example, “ajax”) is mentioned in the forums. The
links values are the number of threads containing both
that keyword and each other keyword. So, for example, in the above data
set, we can see that the word “ajax” was used 12,479 times, and that there
are 2,304 threads containing both of the words “ajax” and
“javascript.”
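
To make the structure concrete, here is a minimal sketch of reading such data back in PHP and listing the strongest links for a keyword. It uses a shortened copy of the sample entry, wrapped in an array because the script writes one entry per keyword. The topLinks function is a hypothetical helper for this illustration only.

```php
<?php
// A shortened copy of the sample entry, wrapped in an array
// (one entry per keyword).
$json = '[{"keyword":"ajax","count":12479,"links":'
      . '{"flash":441,"php":1900,"javascript":2304,"xml":1184,"html":940}}]';

// Return the $limit most strongly linked keywords for one entry.
function topLinks(array $entry, $limit) {
  $links = $entry['links'];
  arsort($links);                            // highest link count first, keys kept
  return array_slice($links, 0, $limit, true);
}

$entries = json_decode($json, true);         // true => decode to associative arrays
foreach ($entries as $entry) {
  echo $entry['keyword'] . ' (' . $entry['count'] . " mentions)\n";
  foreach (topLinks($entry, 3) as $other => $shared) {
    echo "  $other: $shared\n";
  }
}
```

For the “ajax” entry this lists javascript, php, and xml, in that order, as the three keywords it shares the most threads with.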

I’ll go over the PHP script only briefly, as it’s beyond the focus of the
article. The important point to realize is we’re extracting some specific
data from our database to pass on to our Flex application. The kind of
script you’d use here would vary depending on what kind of data you wanted
to visualize.

The first step in our script is to define an array of keywords, and
another array to hold the frequencies and links we find in the
posts:

// populate an array of keywords to search for, with counts and link arrays
$keys =  array('ajax','coldfusion','flash','flex','air','adobe','macromedia','microsoft','php','ruby','groovy','asp','.net','dotnet','actionscript','javascript','dhtml','jsp','cgi','css','stylesheet','silverlight','mysql','postgres','xhtml','xml','html');
$keyFreqs = array();
foreach ($keys as $key) {
  $keyFreqs[$key] = array("count" => 0, "links" => array());
}

We then connect to the database, loop through each thread and each
post inside that thread, scanning for our keywords and adding to our
$keyFreqs array as we go. Finally, we flatten the array
to make it easier to access in Flex and save it to a file in JSON
format:

// Connect to the database
$link = mysqli_connect('localhost', 'root', 'root');
if (!$link) {
  exit('Unable to connect to the database server');
}
if (!mysqli_select_db($link, 'forum')) {
  exit('Unable to select database');
}
// loop through the threads and their posts
$threads = mysqli_query($link, 'SELECT threadid FROM thread ORDER BY dateline LIMIT 100000;');
while ($thread = mysqli_fetch_array($threads)) {
  // keywords found anywhere in this thread (used as a set, keyed by keyword)
  $tempKeys = array();
  $posts = mysqli_query($link, "SELECT postid, pagetext FROM post WHERE threadid=" . (int) $thread['threadid']);
  while ($post = mysqli_fetch_array($posts)) {
    // loop over the list of keywords
    foreach ($keys as $key) {
      // search for the key as a whole word; preg_quote() escapes the "." in ".net"
      $pattern = "/(?<![a-z0-9])" . preg_quote($key, "/") . "(?![a-z0-9])/i";
      $tempCount = preg_match_all($pattern, $post['pagetext'], $matches);
      // if we found one or more matches, remember the key for this thread
      if ($tempCount > 0) {
        $tempKeys[$key] = true;
      }
      // increment the global count for that keyword
      $keyFreqs[$key]["count"] += $tempCount;
    }
  }
  // loop over the keywords found in this thread
  // and increment the link between each pair, once per thread
  foreach (array_keys($tempKeys) as $a) {
    foreach (array_keys($tempKeys) as $b) {
      if ($a !== $b) {
        if (isset($keyFreqs[$a]["links"][$b])) {
          $keyFreqs[$a]["links"][$b] += 1;
        } else {
          // first thread in which this pair appears together
          $keyFreqs[$a]["links"][$b] = 1;
        }
      }
    }
  }
}
// Flatten the array and write it out to a file in JSON format
$newArray = array();
foreach($keyFreqs as $keyword => $value) {
  $newArray[] = array("keyword" => $keyword, "count" => $value['count'], "links" => $value["links"]);
}
$fh = fopen('json_data.txt','w');
fwrite($fh,json_encode($newArray));
fclose($fh);
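
The counting logic can be tried without a database. The following is a rough sketch, not the script from the sample project: it runs the same kind of keyword scan over a couple of made-up threads held in memory, assuming whole-word, case-insensitive matching and assuming that links are counted once per thread, as described earlier. The countKeywords function and the sample posts are invented for this illustration.

```php
<?php
// Count keyword mentions and per-thread co-occurrences.
// $threads is a list of threads; each thread is a list of post texts.
function countKeywords(array $threads, array $keys) {
  $freqs = array();
  foreach ($keys as $key) {
    $freqs[$key] = array('count' => 0, 'links' => array());
  }
  foreach ($threads as $posts) {
    $found = array();   // set of keywords seen in this thread
    foreach ($posts as $text) {
      foreach ($keys as $key) {
        // whole-word, case-insensitive match; preg_quote() escapes keys like ".net"
        $pattern = '/(?<![a-z0-9])' . preg_quote($key, '/') . '(?![a-z0-9])/i';
        $n = preg_match_all($pattern, $text);
        if ($n > 0) {
          $found[$key] = true;
        }
        $freqs[$key]['count'] += $n;
      }
    }
    // link every pair of distinct keywords found in this thread
    foreach (array_keys($found) as $a) {
      foreach (array_keys($found) as $b) {
        if ($a !== $b) {
          $freqs[$a]['links'][$b] = ($freqs[$a]['links'][$b] ?? 0) + 1;
        }
      }
    }
  }
  return $freqs;
}

// Two made-up threads with two posts each.
$threads = array(
  array('Is Ajax just JavaScript and XML?', 'Mostly JavaScript, yes.'),
  array('PHP or Ruby for a new site?', 'PHP with some ajax on top.'),
);
$result = countKeywords($threads, array('ajax', 'javascript', 'php', 'ruby', 'xml'));
echo json_encode($result['ajax']), "\n";
```

In this sample, “ajax” is mentioned twice and ends up linked once each to “javascript”, “xml”, “php”, and “ruby”, while “php” and “xml” are never linked because they don’t appear in the same thread.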

Written By:

Toby Tremayne

A writer and software developer with more than 14 years' experience, Toby is passionate about helping new and small businesses make the most of the internet and cloud technology. When he's not writing or telling stories, he's busy trying to make technology easier to use for the average business person, and can often be found in dark corners practicing magic tricks or lock sport.

 
