Julia for Rubyists: Crunch Those Numbers

Key Takeaways

Julia, while initially daunting due to its non-OOP approach, is a robust tool for numerical tasks, offering dynamic language ease with near-compiled speed, making it a strong candidate for Rubyists venturing into data-heavy projects.
The language’s matrix operations are not only a breeze but are also optimized for performance, approaching C-like speeds, which is a significant step up from Ruby’s capabilities in handling similar operations.
Julia’s plotting capabilities and integration with tools like the Juno IDE streamline the visualization process, allowing for immediate feedback and iterative data exploration, a stark contrast to Ruby’s less developed plotting ecosystem.
Julia’s adoption of features from R, such as the NA type in data arrays, enhances its data handling capabilities, making it suitable for complex data analysis tasks where data integrity is crucial.
The direct support for concurrency and parallelism in Julia addresses one of the major limitations in Ruby, presenting a compelling case for Rubyists looking for efficient execution in multi-threaded scenarios.


I worked at a lab at MIT this summer focusing on some research related to compression of information as it travels through the network. A lot of my work was implemented in MATLAB. This wasn’t because of any particular preference on my part; it just happens that a lot of research (especially stuff that’s math heavy) is built on MATLAB.

There are a lot of things I like about MATLAB. Anything to do with matrices (a simple example: creating a matrix with a bunch of zeros is just zeros(n, n)) is really easy, the documentation is generally pretty good, and it’s quick to get started with the language. The feature set is awesome and, especially if you’re doing something in computer vision, seeing the results of standard algorithms quickly is incredibly useful.

There are also a lot of things I strongly dislike about MATLAB. In general, it feels as if MATLAB is continually trying to stop you from writing clean, readable code. Building abstractions is unnecessarily difficult and the concept of reusable libraries seems foreign to a lot of the MATLAB community. There’s no direct access to threading or any sane, generalizable concurrency framework. Also, I think it’s a pretty bad sign that there’s a website called Undocumented MATLAB that’s dedicated to using “hidden” parts of MATLAB.

Julia is supposed to take MATLAB’s spot as a language that is quick to pick up and handy for sketching out algorithms, but it also feels like a solid language built by computer scientists. Of course, if you’re a Rubyist, you might not care about MATLAB to begin with, so what’s the point? Well, if you’re doing any sort of numerical work, Julia is definitely worth a look: it gives you the feel of a dynamic, interpreted language, with performance close to that of a compiled one. Creating quick visualizations of data is also a breeze.

Julia might seem a bit weird at first with its lack of OOP and all, but, with a little bit of effort, it can definitely expand your capabilities. This article won’t go into much depth on Julia’s syntax, as you can learn that elsewhere pretty quickly. Instead, we’ll focus on the stuff that makes Julia exciting and cool. We’ll whirl through matrices, datasets, and plots, and cover a couple of statistical functions. We won’t take a look at each implementation in Ruby, but we’ll try to focus on the difference in the philosophies of the two languages.

Installation

Thankfully, Julia has a nice downloads page that should point you in the right direction. I use the Juno IDE, which is based on Light Table and lets me do stuff like this:

That’s some Julia code that’s being evaluated to show me the results right there, within the editor. When we look at some of the plotting features, having Juno handy will help you see the results of your efforts very quickly.

Matrices

Matrices are the bread and butter of numerical computing. All sorts of algorithms depend on operations that are typically defined on matrices. A pretty common example is image blurring: blurs are often applied by “convolving” (i.e. using some weird operation) a kernel (i.e. a type of matrix) with an image (another matrix). Basically, we want any numerical language to have very strong support for matrices.

There is some decent support for matrices in Ruby. We have the Matrix class which provides us with convenience methods like Matrix.zero(n) to create an n by n matrix of zeros. However, matrix operations in Ruby generally aren’t incredibly fast (especially in comparison to compiled languages). Julia has awesome utility methods for matrices and attains C-like performance. Let’s check out an example. Fire up the Julia REPL (thankfully, it has one of these) and type in the following:



x = zeros(10, 10)

The output should look like:



10x10 Array{Float64,2}:

 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

Just like that, we have a 10 by 10 matrix of zeros. This is pretty similar to calling Matrix.zero(10, 10) in Ruby, but there’s a very important difference: the output actually looks like a matrix! This might seem like a trivial difference. However, when working with a lot of matrices, it is invaluable to have the matrices formatted nicely when you are trying to see results. Let’s do something a tiny bit more interesting:



x = rand(10, 10)

That should give us:



10x10 Array{Float64,2}:
 0.614455     0.166746  0.933275   …  0.777238   0.662781   0.012962
 0.000197243  0.975239  0.0263813     0.784601   0.251306   0.0359492
 0.0881394    0.450103  0.895747      0.0219986  0.196202   0.259326
 0.256392     0.28074   0.542471      0.830691   0.9528     0.905797
 0.536424     0.661746  0.885126      0.261195   0.198792   0.03582
 0.28776      0.275747  0.94569    …  0.0970672  0.269422   0.246199
 0.953955     0.421148  0.0946357     0.677456   0.796799   0.828503
 0.492165     0.481043  0.857201      0.862093   0.0634439  0.97161
 0.276454     0.208118  0.313016      0.0972178  0.557233   0.00431404
 0.117841     0.891073  0.0320966     0.0487335  0.830744   0.426995

Julia does truncate the output, but it’s clear that we’re seeing a 10 by 10 matrix of pseudorandom numbers between 0 and 1. It’s possible to do this with a little Matrix.build magic in Ruby.
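For comparison, here is a minimal Ruby sketch of the same two steps using the standard matrix library. Matrix.zero and Matrix.build are the library calls mentioned above; the helper names zero_matrix and random_matrix are made up for illustration.

```ruby
require 'matrix'

# Hypothetical helper names; only Matrix.zero and Matrix.build
# come from Ruby's matrix library.

# n-by-n matrix of zeros, the counterpart of Julia's zeros(n, n).
# Note that Ruby fills it with Integer 0 rather than Float 0.0.
def zero_matrix(n)
  Matrix.zero(n)
end

# n-by-n matrix of pseudorandom floats in [0, 1), the counterpart of rand(n, n).
def random_matrix(n)
  Matrix.build(n, n) { rand }
end

zeros = zero_matrix(3)
randoms = random_matrix(3)

# Matrix#inspect puts the whole matrix on a single line.
puts zeros.inspect

# To get a grid like Julia's, print row by row.
randoms.to_a.each { |row| puts row.map { |v| format('%.4f', v) }.join('  ') }
```

The row-by-row printing at the end is exactly the kind of formatting that Julia gives you for free.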

Scalar multiplication is pretty common, so it’s really easy in Julia:



x = rand(10, 10) * 10

How about multiplying matrices together:



x = rand(10, 10) * zeros(10, 10)

What if we want to index a certain element of the matrix? Let me hit you with a fact: Julia’s arrays/matrices are indexed from one. Calm down, start breathing. The world won’t end. Yes, your loop conditions will change. Yes, it is generally pretty annoying if you’re coming from a non-MATLAB and non-R background, but it’s pretty easy to get used to.

Also, if you’re writing tons of code operating on individual elements of the matrix, you’re probably overlooking some standard matrix operation that will let you accomplish what you’re doing in a much easier fashion (note: I’m not using this as an excuse for Julia’s 1-indexing, but it’s a good thing to keep in mind).
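To make both points concrete from the Ruby side, here is a small sketch. It shows that Ruby’s Matrix is indexed from zero, and that a loop over individual elements can often be replaced by one whole-matrix operation. The helper double_elementwise is a hypothetical name used only for this illustration.

```ruby
require 'matrix'

# Hypothetical helper: doubles every entry by visiting elements one at a time.
def double_elementwise(m)
  Matrix.build(m.row_count, m.column_count) { |i, j| m[i, j] * 2 }
end

m = Matrix[[1, 2], [3, 4]]

# Ruby's Matrix is indexed from zero, so the top-left entry is m[0, 0];
# in Julia the same entry would be m[1, 1].
top_left = m[0, 0]

# The element-by-element version and the single scalar multiplication agree.
same = double_elementwise(m) == m * 2
puts "top-left: #{top_left}, same result: #{same}"
```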

So far, so good. We haven’t really introduced anything that Ruby doesn’t make painless too. But there is a general feel about Julia that makes it clear that it holds matrices in high regard: the pretty printing, the fact that matrix creation calls are top-level functions, etc. However, there’s also plenty that makes Julia pretty unique.

Let’s take a look at one feature that Julia has borrowed from R: the NA type. Oftentimes, when doing data analysis, we’ll have some values of data that are invalid for some reason. So, we have our initial dataset with some invalid values and then we perform some operations on the dataset to produce an output. But, after processing the data, we don’t want to consider the results that are based on invalid values. We can fix this problem with the NA type. Anywhere we have invalid data, replace it with an “NA”. Subsequently, anywhere we have a computation involving an NA, we will get an NA in return. In other words, computations on invalid data lead to invalid data. In order to see it in action, we first need to get the “DataArrays” package. Julia has a fantastic package management system baked right into the language.

Easily install the package:



Pkg.add("DataArrays")

Now, construct an array so that we can actually stuff an “NA” into it:



using DataArrays
x = @data(rand(10, 10))
x[1, 1] = NA

Don’t worry too much at the moment about what @data does: it more or less changes the matrix’s type so we can put “NA” values in it. Let’s try a computation:



y = x*2

That should give us:



  NA       0.553053   0.695716  0.284487   …  1.9758    0.761262   0.4869
 1.3595    0.0468469  1.31732   1.83256       1.70817   1.43662    0.930509
 0.306142  0.286241   0.982634  0.434252      1.94063   1.64462    0.731219
 1.88406   1.70816    1.08887   0.234274      1.45693   1.06927    1.60651
 0.503428  0.362866   0.335749  1.88895       0.341048  0.0441141  0.951636
 0.774465  0.789801   1.23474   0.0640433  …  1.92382   1.20227    1.0657
 1.38033   1.46768    1.78678   1.95522       1.53592   0.211695   0.631171
 1.09145   1.32949    1.59082   1.52581       1.50151   0.062626   1.02838
 0.386194  1.66468    1.37072   0.163497      0.522523  1.24837    0.880371
 1.16056   0.496622   0.994359  1.08291       0.866378  0.187132   1.51157

Great! The invalid data resulted in invalid data.

A little bit more interesting would be squaring the matrix:

x*x

That’ll give us:



10x10 DataArray{Float64,2}:
 NA   NA       NA       NA      …   NA       NA       NA       NA
 NA  3.14293  3.53757  2.4257      3.8196   4.36326  2.44844  3.22518
 NA  2.36765  2.77693  2.60346     3.25948  2.67558  1.35156  2.13138
 NA  2.46748  3.67635  3.47746     3.8359   4.51773  2.33752  2.99455
 NA  1.90375  2.17352  1.53465     2.25778  2.68     1.34128  1.9858
 NA  1.80034  2.47738  2.62089  …  2.9921   2.76951  1.47433  2.18699
 NA  2.9391   3.8815   3.72826     4.45572  5.03061  2.59674  3.34063
 NA  2.25157  3.12992  2.76169     3.43634  4.00735  2.01751  2.56007
 NA  1.62191  2.51771  2.71106     2.91424  2.76869  1.79907  2.02696
 NA  1.96264  2.53315  2.69743     3.19891  3.17627  1.44995  2.4413

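Whoa, what just happened? Recall a bit of linear algebra: entry (i, j) of x*x is the sum over k of x[i, k] * x[k, j]. The NA sitting at x[1, 1] takes part in that sum whenever i = 1 or j = 1, so the entire first row and first column come back as NA, while every other entry is computed from valid numbers only.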
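The same propagation pattern can be imitated in Ruby, with one big caveat: Ruby’s matrix library has no NA type, so this sketch uses Float::NAN as a rough stand-in. NaN is an arithmetic value rather than a true “missing” marker, and the helper names ones_with_nan and nan_positions are hypothetical.

```ruby
require 'matrix'

# Hypothetical helper: n-by-n matrix of ones with NaN in the top-left corner.
# Float::NAN is only a stand-in for NA here; it is not a missing-value type.
def ones_with_nan(n)
  Matrix.build(n, n) { |i, j| i.zero? && j.zero? ? Float::NAN : 1.0 }
end

# Zero-based [row, column] positions whose value is NaN.
def nan_positions(m)
  m.each_with_index.select { |v, _i, _j| v.nan? }.map { |_v, i, j| [i, j] }
end

x = ones_with_nan(3)

# Every entry of the product that touches x[0, 0] becomes NaN:
# that is the whole first row and the whole first column.
p nan_positions(x * x)
```

Keep in mind that Ruby uses zero-based indices, so the corner that Julia calls x[1, 1] is x[0, 0] here.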

We’ve only scratched the surface of what Julia can do with matrices: there’s a heck of a lot more. Coming from Ruby, the amount of importance Julia places on matrices is pretty weird at first. After spending a little bit of time working in the fields where Julia is usually applied (think lots of numbers and applied math), it becomes clear why.

Data

Julia’s borrowed another idea from R: ready-to-go datasets. R comes with a bunch of datasets that you can immediately begin using. Julia provides that to us with the RDatasets package. Let’s get a hold of it:



Pkg.add("RDatasets")

Start using it too:



using RDatasets

Here’s a pretty standard dataset called “iris”:



dataset("datasets", "iris")
150x5 DataFrame
| Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species     |
|-----|-------------|------------|-------------|------------|-------------|
| 1   | 5.1         | 3.5        | 1.4         | 0.2        | "setosa"    |
| 2   | 4.9         | 3.0        | 1.4         | 0.2        | "setosa"    |
| 3   | 4.7         | 3.2        | 1.3         | 0.2        | "setosa"    |
| 4   | 4.6         | 3.1        | 1.5         | 0.2        | "setosa"    |
| 5   | 5.0         | 3.6        | 1.4         | 0.2        | "setosa"    |
| 6   | 5.4         | 3.9        | 1.7         | 0.4        | "setosa"    |
| 7   | 4.6         | 3.4        | 1.4         | 0.3        | "setosa"    |
| 8   | 5.0         | 3.4        | 1.5         | 0.2        | "setosa"    |
| 9   | 4.4         | 2.9        | 1.4         | 0.2        | "setosa"    |
| 10  | 4.9         | 3.1        | 1.5         | 0.1        | "setosa"    |
| 11  | 5.4         | 3.7        | 1.5         | 0.2        | "setosa"    |

Although having this data might seem a bit useless (I mean, we could always read it from a CSV file in Ruby), having some sample data at your fingertips is incredibly useful when trying to sketch out some ideas in code. Notice that the data is stored in a Julia DataFrame, which is a way of putting data of various types into one “matrix” of sorts. With one package, we now have access to a wide range of datasets.
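For reference, the “read it from a CSV file in Ruby” route might look roughly like the sketch below. It assumes the data is available as CSV text with a header row; here the first three iris rows from the table above are inlined so nothing has to exist on disk. The constant IRIS_CSV and the helper load_rows are hypothetical names.

```ruby
require 'csv'

# First three rows of the iris table shown above, inlined as CSV text
# so the sketch does not depend on a file on disk.
IRIS_CSV = <<~DATA
  SepalLength,SepalWidth,PetalLength,PetalWidth,Species
  5.1,3.5,1.4,0.2,setosa
  4.9,3.0,1.4,0.2,setosa
  4.7,3.2,1.3,0.2,setosa
DATA

# Hypothetical helper: parse CSV text into an array of hashes,
# converting numeric-looking fields to numbers.
def load_rows(text)
  CSV.parse(text, headers: true, converters: :numeric).map(&:to_h)
end

rows = load_rows(IRIS_CSV)
puts rows.first['Species']
puts rows.map { |r| r['SepalLength'] }.inspect
```

An array of hashes is about as close as plain Ruby gets to a DataFrame: each row carries mixed types, but there are no column-wise operations built in.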

Plotting

One area in which Julia really excels is letting you get a feel for data really quickly. To do this, plotting some part of the data is usually a good idea. I haven’t found a lot of solid, usable Ruby libraries for plotting. For the longest time, Scruffy was the leader in the area, but its development appears to have stalled. We’ll take a look at Gadfly, a pretty standard Julia graphing toolkit. It’s based on ideas from A Layered Grammar of Graphics, which describes how to build a sensible graphic creation system.

The classic “hello world” of statistical plots is the “iris” plot. The “iris” dataset describes some characteristics of a few types of Iris flowers. We’ll take a look at how to make a plot of them.

First, we need a plotting library:



Pkg.add("Gadfly")

Let’s build our first plot:



using RDatasets

using Gadfly

plot(dataset("datasets", "iris"),x="SepalLength", y="SepalWidth", Geom.point)

Ok, what the heck is this plot call? It takes a DataFrame as its first argument (supplied from RDatasets) and optional parameters for the x and y variable names (which, if you look at the dataset output earlier, are the column names in the DataFrame). Finally, we pass in “Geom.point”, which tells Gadfly that we want to make a point plot. The results look very nice:

[Figure: point plot of SepalWidth against SepalLength for the iris dataset]

If you’re using Juno (the Julia IDE), you’ll be able to see your results very quickly. We have a bunch of species of flowers. How about coloring them differently in the graph? Basically, we want the color to be decided by one of the columns of the iris DataFrame:



plot(dataset("datasets", "iris"),x="SepalLength", y="SepalWidth", color="Species",Geom.point)

We’ve added the color="Species" to associate color with a column. Check it out:

[Figure: the same point plot with points colored by Species]

Ok, enough with the point plots. What if we wanted to examine the distribution of the sepal lengths for each of the different species? Easy:



plot(dataset("datasets", "iris"), x = "Species", y = "SepalLength", Geom.boxplot)

The output looks pretty, too:

[Figure: box plots of SepalLength for each Species]

Statistics

Of course, if we’re plotting stuff, we’re probably interested in the statistics associated with the data. Fortunately, Julia provides a bunch of nice utility functions to squeeze the information out of the dataset. Coming from a Ruby background, it might seem a bit odd to have these functions in the global namespace. But, that’s a fundamental difference between Ruby and Julia. Ruby is meant as a general purpose language that’s focused on stuff like web development, which requires extensive compartmentalization. On the other hand, Julia is a general purpose language that’s geared toward scientific computing, where the tradition is to have the “most used” stuff front and center.

Let’s first call our dataset something reasonable:



iris = dataset("datasets", "iris")

Probably one of the most useful functions to get a grip on your data is describe. Give it a whirl on iris:



describe(iris)

The output (truncated) should look something like:



SepalLength
Min      4.3
1st Qu.  5.1
Median   5.8
Mean     5.843333333333334
3rd Qu.  6.4
Max      7.9
NAs      0
NA%      0.0%

That gives you the five-number summary and some other information about each column of the dataset. We can access specific columns of the iris DataFrame pretty easily:



sepal_lengths = iris[:SepalLength]

With the column in hand, its summary statistics are one function call away:



mean(sepal_lengths) # mean

std(sepal_lengths) # standard deviation
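To see what those calls compute, here is a hand-rolled Ruby sketch of the same statistics. It runs on the eleven sepal lengths listed in the iris table earlier, not the full 150-row dataset, so the numbers will differ from the describe output. The method names are hypothetical, and the standard deviation divides by n - 1, matching the default behavior of Julia’s std.

```ruby
# Sepal lengths of the first eleven iris rows listed earlier in the article.
SEPAL_LENGTHS = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4].freeze

# Arithmetic mean.
def mean(values)
  values.sum / values.size.to_f
end

# Sample standard deviation (divides by n - 1).
def std(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1))
end

# Middle value of the sorted list (average of the two middle values if even).
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

puts format('mean=%.4f std=%.4f median=%.1f',
            mean(SEPAL_LENGTHS), std(SEPAL_LENGTHS), median(SEPAL_LENGTHS))
```

None of this is hard to write, but it is three method definitions for something Julia exposes as top-level functions.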

Wrapping it up

Wow, that was fast. So far, we’ve only really taken a glance at some of the stuff that makes Julia awesome to write code in, but even so, the differences with Ruby are clear. Julia is meant, from the ground up, to work with numbers, matrices, and the like. Ruby, on the other hand, wants to make sure that you can do that stuff if you want to, but that isn’t really a top priority. In the next article on Julia, we’ll take a more in-depth look at features and also introduce the parallel computing constructs within the language.

Frequently Asked Questions (FAQs) about Julia and Ruby

What are the key differences between Julia and Ruby in terms of performance?

Julia is known for its high performance. It is designed for computational science and numerical analysis, making it highly efficient for tasks that require intensive calculations. Julia’s just-in-time (JIT) compilation converts Julia code into machine code at runtime, which significantly boosts its performance. On the other hand, Ruby is an interpreted language and is generally slower than compiled languages. However, Ruby’s performance is often considered sufficient for web development and scripting tasks.

How does the syntax of Julia and Ruby differ?

Julia’s syntax is quite similar to that of Python and MATLAB, making it easy to learn for those familiar with these languages. It is designed to be easy to read and write, with a focus on mathematical and scientific computing. Ruby, on the other hand, has a syntax that is influenced by Perl and Smalltalk. It is designed to be flexible and expressive, with a focus on simplicity and productivity.

What are the unique features of Julia that make it stand out from Ruby?

Julia has several features that make it stand out. It has a rich type system with types for real and complex numbers, arrays, and more. It also supports metaprogramming, allowing you to generate and manipulate Julia code within Julia itself. Additionally, Julia has a strong focus on numerical and scientific computing, with built-in support for matrix algebra and advanced mathematical functions.

Is Ruby more suitable for web development than Julia?

Yes, Ruby, particularly with the Ruby on Rails framework, is widely used for web development. It has a large and active community, a wealth of libraries and frameworks, and is known for its productivity and ease of use. Julia, while it can be used for web development, is not typically the first choice for this purpose. Its strengths lie more in numerical and scientific computing.

How does the community support for Julia and Ruby compare?

Ruby has a larger and more established community, with a wealth of resources, libraries, and tools available. It also has a large number of job opportunities, particularly in web development. Julia, while it has a smaller community, is growing rapidly, particularly in the fields of data science and numerical computing. It also has strong support from academia and research institutions.

How do Julia and Ruby handle concurrency and parallelism?

Julia has built-in support for concurrency and parallelism, allowing for efficient execution of tasks on multi-core processors and distributed computing systems. Ruby supports concurrency through threads, but in the standard interpreter (MRI) the Global Interpreter Lock (GIL, also known as the Global VM Lock) lets only one thread execute Ruby code at a time, so threads alone do not give CPU-bound code true parallelism.
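As a rough illustration of the Ruby side, the sketch below splits a sum across four threads. It runs correctly, but under MRI the threads take turns on the interpreter lock, so CPU-bound work like this is not expected to speed up; threads pay off mainly when the work waits on I/O. The helper threaded_sums is a hypothetical name.

```ruby
# Hypothetical helper: sum each chunk on its own thread and collect
# the partial sums in the original order.
def threaded_sums(chunks)
  chunks.map { |chunk| Thread.new { chunk.sum } }.map(&:value)
end

# Split 1..1000 into four chunks of 250 numbers each.
chunks = (1..1000).each_slice(250).to_a
partials = threaded_sums(chunks)

# The partial sums add up to the sum of 1..1000.
puts partials.sum
```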

What are the key strengths of Ruby over Julia?

Ruby’s key strengths lie in its simplicity, productivity, and flexibility. It is easy to learn and use, with a clean and expressive syntax. It also has a large and active community, a wealth of libraries and frameworks, and strong support for web development.

Can Julia be used for web development like Ruby?

While Julia is not typically the first choice for web development, it is certainly capable of it. There are several packages available, such as Genie.jl, that provide functionality for building web applications in Julia.

How do Julia and Ruby compare in terms of learning curve?

Both Julia and Ruby are designed to be easy to learn and use. Ruby is often praised for its simplicity and readability, making it a good choice for beginners. Julia, while it may be a bit more complex due to its focus on scientific computing, has a syntax that is similar to Python and MATLAB, making it easier for those familiar with these languages.

What are some use cases where Julia would be a better choice than Ruby?

Julia shines in areas that require high performance and numerical computing. This includes fields like data science, machine learning, and scientific research. Its efficient handling of mathematical operations, support for parallel computing, and ability to call C and Fortran libraries directly make it a powerful tool for these tasks.