Common Trip-ups for New Rubyists, Part I


Once internalized, Ruby is a fairly straightforward language. Until that happens, however, many potential converts are turned off by some of its more unusual aspects. This series hopes to clear up some of the confusion that newcomers face when learning Ruby.

This series attempts to follow RDoc notation when describing methods:

Array#length refers to an instance method length on class Array.
Array::new refers to a class method new on class Array.
Math::PI refers to a constant PI in module Math.

Ruby 1.9+ is assumed.

Instance Variables and Class Variables

In most programming languages, instance variables must be declared before they can be assigned. Ruby is the diametric opposite – instance variables cannot be declared at all. Instead, an instance variable in Ruby comes into existence the first time it is assigned.

Creating an instance variable is as easy as taking a local variable and slapping a “@” on the beginning.

class Item
  def initialize(title)
    @title = title
  end
end

item = Item.new('Chia Ruby')
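To see an instance variable come into existence on first assignment, you can ask the object which instance variables it currently has. This is a minimal sketch; the add_price method is hypothetical and exists only to assign a second variable later.

```ruby
class Item
  def initialize(title)
    @title = title
  end

  # Hypothetical method: @price does not exist until this runs.
  def add_price(price)
    @price = price
  end
end

item = Item.new('Chia Ruby')
p item.instance_variables   # => [:@title]

item.add_price(10)
p item.instance_variables   # => [:@title, :@price]
```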

Under the hood, an instance variable is just a variable stored in self, the current instance. Coming from Python, it might be tempting to assume that instance variables can be assigned with self.foo =. In fact, self.foo = sends the foo= message to self, so Ruby tries to call a method named foo=. This only works if that method exists.

class Item
  def initialize(title)
    self.title = title
  end
end

item = Item.new('Chia Ruby') # => NoMethodError: undefined method `title='...

To read or write an object's instance variables from outside its instance methods, you need to define getter and setter methods.

class Item
  def title=(t)
    @title = t
  end

  def title
    @title      
  end
end

item = Item.new
puts item.title.inspect # => nil
item.title = "Chia Ruby"
puts item.title # => "Chia Ruby"

The #initialize constructor has been left out here to show that it is optional. Also, notice that accessing an instance variable that has not been assigned does not raise an error. @title is nil when it is accessed before assignment. Part II will demonstrate how this can be used for lazy initialization.

Defining getters and setters over and over would get old pretty quickly. Fortunately, Ruby provides a trio of helper methods:

#attr_reader – define instance-level getters
#attr_writer – define instance-level setters
#attr_accessor – define both

Typically, these helpers go at the beginning of a class definition.

class Thing
  attr_accessor :foo, :bar
  attr_reader :baz

  def initialize
    @baz = "cat"
  end
end

thing = Thing.new
thing.foo = 1
thing.bar = 2
puts thing.baz # => "cat"
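The helpers simply generate ordinary public methods, which you can confirm by listing the methods defined directly on the class. A quick sketch reusing the Thing class from above:

```ruby
class Thing
  attr_accessor :foo, :bar
  attr_reader :baz

  def initialize
    @baz = "cat"
  end
end

# Methods defined directly on Thing (initialize is private, so it is not listed).
p Thing.instance_methods(false).sort   # => [:bar, :bar=, :baz, :foo, :foo=]

# attr_reader created a getter only, so there is no baz= setter.
p Thing.new.respond_to?(:baz=)         # => false
```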

Because instance variables are just variables stored in self, they also work at the class level. After all, classes are just instances of Class. These are commonly referred to as class instance variables.

class Parent
  @value = 1
  def self.value
    @value
  end
end

class Child < Parent
  @value = 2
end

puts Parent.value # => 1
puts Child.value  # => 2
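Because each class object owns its own class instance variables, a subclass starts out with an unassigned copy rather than sharing the parent's. The sketch below also shows a common way to get class-level accessors, by calling attr_accessor inside class << self; Config and SubConfig are hypothetical names.

```ruby
class Config
  # attr_accessor on the singleton class defines class-level getter/setter methods.
  class << self
    attr_accessor :setting
  end

  @setting = "default"   # class instance variable of Config
end

class SubConfig < Config
end

p Config.setting      # => "default"
p SubConfig.setting   # => nil (SubConfig's own @setting was never assigned)

SubConfig.setting = "custom"
p Config.setting      # => "default" (the parent is unaffected)
```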

In addition to instance variables, Ruby has so-called “class variables”, written with @@ instead of @. Unfortunately, these are frowned upon because a single class variable is shared by a class and its descendants: assigning it anywhere in the hierarchy replaces the value that the ancestors and descendants see. For this reason, it’s better to think of them as class-hierarchy variables.

class Parent
  @@value = 1
  def self.value
    @@value
  end
end

class Child < Parent
  @@value = 2
end

puts Parent.value #=> 2 (because Child overwrote @@value)

Actually, it’s a lot worse than that. Think about what happens when a class-hierarchy variable is assigned at the top level, where self is main – an instance of Object.

@@value = 3
puts Parent.value # => 3

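Since nearly every Ruby class descends from Object, every class can see the class-hierarchy variables defined on Object. So, class variables are potentially global variables, which makes them highly unpredictable and prone to misuse. (Ruby 3.0 and later reject this particular example: accessing a class variable from the top level raises a RuntimeError.)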

Modules

In Ruby, a module is a container for methods. A class, on the other hand, is a special kind of module that can also create instances and inherit from a superclass.

module MyModule
  def hello
    puts "hello from instance"
  end
end

class MyClass
  def hello
    puts "hello from instance"
  end
end

instance = MyClass.new
instance.hello # => "hello from instance"
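A few reflective calls make the module/class relationship concrete. This is a small sketch with empty stand-ins for MyModule and MyClass:

```ruby
module MyModule
end

class MyClass
end

p Class.superclass              # => Module  (every class is a kind of module)
p MyModule.instance_of?(Module) # => true    (a module is an instance of Module)
p MyClass.is_a?(Module)         # => true
p MyClass.new.class             # => MyClass (classes can create instances...)
p MyModule.respond_to?(:new)    # => false   (...modules cannot)
p MyClass.superclass            # => Object  (and classes have a superclass)
```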

Methods in a module can be instance methods or class methods – at least semantically. Ruby class methods are just instance methods on class objects. It’s a bit counter-intuitive, but modules are instances of class Module…even though classes are kinds of modules.

module MyModule
  # Instance method defined on MyModule which is an instance of Module
  def MyModule.hello
    puts 'hello from module'  
  end

  # Same as "def MyModule.hello" because self is MyModule here
  def self.hello
    puts 'hello from module'
  end

  # Instance method for an instance of a class mixing-in MyModule
  def hello
    puts 'hello from instance'    
  end
end

MyModule.hello # => "hello from module"

Just because modules can’t create instances doesn’t mean they can’t hold data. Since they are instances of Module, they can have instance variables.

module Fooable
  def self.foo=(value)
    @foo = value
  end

  def self.foo
    @foo
  end
end

Fooable.foo = "baz"
puts Fooable.foo # => "baz"

A seeming contradiction at this point is that modules can contain instance methods even though they can’t create instances. The resolution is that a primary use of modules is as mixins: a module’s methods get incorporated into a class, and its instance methods then become available to that class’s instances.

Mixins

There are two ways to mix a module into a class:

#include – add a module’s methods as instance methods
#extend – add a module’s methods as class methods

Examples:

module Helloable
  def hello
    puts "Hello World"
  end
end

class IncludeClass
  include Helloable
end

class ExtendClass
  extend Helloable
end

IncludeClass.new.hello
ExtendClass.hello
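The difference is easy to check with respond_to?. In this sketch, hello returns its string instead of printing it, and the last lines show that extend also works on a single ordinary object, adding the methods to that object alone.

```ruby
module Helloable
  def hello
    "Hello World"
  end
end

class IncludeClass
  include Helloable
end

class ExtendClass
  extend Helloable
end

p IncludeClass.new.respond_to?(:hello)   # => true  (instance method)
p IncludeClass.respond_to?(:hello)       # => false
p ExtendClass.respond_to?(:hello)        # => true  (class method)
p ExtendClass.new.respond_to?(:hello)    # => false

# extend adds the module's methods to one particular object.
greeting = Object.new
greeting.extend(Helloable)
p greeting.hello                          # => "Hello World"
```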

Sometimes it’s desirable to mix both class and instance methods into a class. The obvious, if redundant, way to do this is to put the class methods in one module and the instance methods in another, then include the instance module and extend the class module.
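For comparison, the two-module approach looks roughly like this. The module and class names are hypothetical:

```ruby
# Hypothetical names, for illustration only.
module GreeterInstanceMethods
  def hello
    "hello from instance"
  end
end

module GreeterClassMethods
  def hello
    "hello from class"
  end
end

class Greeter
  include GreeterInstanceMethods   # provides the instance-level hello
  extend GreeterClassMethods       # provides the class-level hello
end

p Greeter.hello       # => "hello from class"
p Greeter.new.hello   # => "hello from instance"
```

It works, but every class that wants the behavior must remember both lines.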

Instead of this, a common idiom uses a bit of metaprogramming magic. If a module defines a self.included method, Ruby calls this Module#included hook whenever the module is included in a class, passing that class as the argument. The hook can then make the class extend an inner module (often called ClassMethods) that contains the class-level methods.

module Helloable

  # Gets called on 'include Helloable'
  def self.included(klass)
    # 'base' often used instead of 'klass'
    klass.extend(ClassMethods)
  end

  # Method that will become an instance method
  def hello
    puts "hello from instance"
  end

  # Methods that will become class methods
  module ClassMethods
    def hello
      puts "hello from class"
    end
  end

end

class HelloClass
  include Helloable
end

HelloClass.hello # => "hello from class"
HelloClass.new.hello # => "hello from instance"

A quick way to demonstrate the utility of mixins is with the Comparable module. It defines the comparison operators (<, <=, ==, >=, >) and between? in terms of a single method you supply: the combined comparison operator, <=>. Let’s create a class StringFraction that correctly compares fractions stored in strings.

class StringFraction
  include Comparable

  def initialize(fraction)
    @fraction = fraction  
  end

  def rational
    @fraction.to_r
  end

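  def <=>(other)
    # Return nil for non-fractions so Comparable treats them as incomparable
    return nil unless other.is_a?(StringFraction)
    rational <=> other.rational
  end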
end

f1 = StringFraction.new("1/2")
f2 = StringFraction.new("1/3")

puts f1 > f2  # => true
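Because <=> is all that sorting and range checks need, the class also works with Array#sort, Array#max, and between? for free. A self-contained sketch (the attr_reader :fraction is added here only to make the results readable):

```ruby
class StringFraction
  include Comparable

  attr_reader :fraction

  def initialize(fraction)
    @fraction = fraction
  end

  def rational
    @fraction.to_r
  end

  def <=>(other)
    return nil unless other.is_a?(StringFraction)
    rational <=> other.rational
  end
end

fractions = %w[3/4 1/2 2/3].map { |s| StringFraction.new(s) }

p fractions.sort.map(&:fraction)   # => ["1/2", "2/3", "3/4"]
p fractions.max.fraction           # => "3/4"

half = StringFraction.new("1/2")
p half.between?(StringFraction.new("1/3"), StringFraction.new("2/3"))  # => true
p half == StringFraction.new("2/4")  # => true (Comparable#== uses <=>)
```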

Module Gotchas

1. If a module is included twice, the second inclusion is ignored.

module HelloModule
  def say
    "hello from module"
  end
end

module GoodbyeModule
  def say
    "goodbye from module"
  end
end

class MyClass
  include HelloModule
  include GoodbyeModule
  include HelloModule
end

MyClass.new.say # => "goodbye from module"

2. If two modules define the same method, the second one to be included is used.

module Module1
  def hello
    "hello from module 1"
  end
end

module Module2
  def hello
    "hello from module 2"
  end
end

class HelloClass
  include Module1
  include Module2
end

HelloClass.new.hello # => "hello from module 2"

3. Module methods cannot replace methods defined in the class itself, even when the module is included after the method definition.

module HelloModule
  def hello
    'hello from module'
  end
end

class HelloClass
  def hello
    'hello from class'
  end

  include HelloModule
end

HelloClass.new.hello # => 'hello from class'
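All three gotchas follow from how Ruby builds a class’s ancestor chain, which method lookup walks from left to right: the class itself comes first, then the most recently included module, and a module already in the chain is not added again. A sketch with hypothetical Speaker and LoudSpeaker classes:

```ruby
module HelloModule
  def say
    "hello from module"
  end
end

module GoodbyeModule
  def say
    "goodbye from module"
  end
end

class Speaker
  include HelloModule
  include GoodbyeModule
  include HelloModule   # already in the chain, so nothing changes
end

p Speaker.ancestors.take(3)        # => [Speaker, GoodbyeModule, HelloModule]
p Speaker.new.method(:say).owner   # => GoodbyeModule

class LoudSpeaker
  def say
    "hello from class"
  end

  include HelloModule   # lands after the class in the chain
end

p LoudSpeaker.ancestors.take(2)    # => [LoudSpeaker, HelloModule]
p LoudSpeaker.new.say              # => "hello from class"
```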

Symbols

In other languages (and possibly Ruby), you may have seen constants used as names like this:

NORTH = 0
SOUTH = 1
EAST = 2
WEST = 3

Outside of Ruby, constants used as names rather than for their values are known as enumerators (not to be confused with Ruby’s Enumerator). Many languages offer a cleaner way to declare them: an enumerated type (here, in Java):

public enum Direction {
  NORTH, SOUTH, EAST, WEST 
}

Ruby has something even better than enumerated types: symbols. A symbol literal is written with a leading colon, as in :name, :@foo, or :+. This flexibility is important because symbols are used to represent things like method names, instance variables, and constants.

From a practical standpoint, symbols are basically fast, immutable strings that, like enumerators, serve as names. However, unlike enumerators, symbols do not need to be declared up front. If you need a symbol to exist, just use it as if it already does. Here is how you would create a hash that has some symbols as keys:

worker = {
  :name => "John Doe",
  :age => "35",
  :job => "Basically Programmer"
}

puts worker[:name] # => "John Doe"
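Here is a sketch of symbols standing in for the Java Direction enum above. The offset_for helper is hypothetical. The last lines show why symbol comparisons are cheap: every occurrence of a given symbol is the same object, while two strings with the same text can be separate objects.

```ruby
# Hypothetical helper: symbols play the role of enum values.
def offset_for(direction)
  case direction
  when :north then [0, 1]
  when :south then [0, -1]
  when :east  then [1, 0]
  when :west  then [-1, 0]
  else raise ArgumentError, "unknown direction: #{direction.inspect}"
  end
end

p offset_for(:east)     # => [1, 0]

p :north.equal?(:north)                             # => true  (one shared object)
p String.new("north").equal?(String.new("north"))   # => false (two objects)
```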

A potential pitfall is creating symbols dynamically, especially from user input. Before Ruby 2.2, symbols were never garbage collected: once created, they existed until the program exited. That becomes a security issue when users are indirectly responsible for creating symbols, since a malicious user could consume memory that would never be reclaimed, potentially crashing the VM. Ruby 2.2 and later can garbage-collect most dynamically created symbols, which reduces the risk, but turning untrusted input into symbols is still best avoided.
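One way to avoid calling to_sym on untrusted input is to map the allowed strings to symbols explicitly, so input can only ever select a symbol that already exists in your code. A minimal sketch; SORT_KEYS and sort_key_for are hypothetical names:

```ruby
# Hypothetical whitelist: the only symbols user input can map to.
SORT_KEYS = {
  "name" => :name,
  "age"  => :age
}.freeze

def sort_key_for(input)
  SORT_KEYS.fetch(input) { raise ArgumentError, "unsupported sort key: #{input}" }
end

p sort_key_for("age")   # => :age

begin
  sort_key_for("anything_else")
rescue ArgumentError => e
  puts e.message         # unsupported sort key: anything_else
end
```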

Blocks, Procs, and Lambdas

Ruby method calls can be followed by arbitrary code enclosed in either { and } or do and end. This code can receive arguments, listed between pipes (e.g. |i, j|), that are referenced inside it.

[1,2,3].each { |i| print i } # => '123'

The code between these enclosing tokens is known as a block. A block is a chunk of code attached to a method call. Here, #each executes the block once for each item in the array; in this case, the block just prints the item to standard output.
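For comparison, here is a block written with do/end that takes two arguments. Array#each_with_index passes each item together with its index:

```ruby
lines = []

# A do/end block with two parameters: the item and its index.
%w[a b c].each_with_index do |letter, index|
  lines << "#{index}: #{letter}"
end

puts lines   # prints "0: a", "1: b", "2: c" on separate lines
```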

What isn’t obvious to new Rubyists is how each item in the array gets inside the block. A great way to understand this is to write our own #each.

class Colors
  def self.each
    yield 'Red'
    yield 'Green'
    yield 'Blue'
  end
end

Colors.each { |color| print color } # => 'RedGreenBlue'

The yield keyword executes the block, passing it whatever arguments yield was given. In other words, the arguments in a block (e.g. |i|) come from calls to yield inside the method the block is attached to. To iterate through a collection, a method just needs to yield each item in it.
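yield also evaluates to whatever the block returns, which is enough to build a map-style method. A sketch; my_map is a hypothetical helper, not part of Ruby:

```ruby
# Hypothetical helper: a hand-rolled map built on yield.
def my_map(items)
  result = []
  items.each do |item|
    # yield sends item to the block and evaluates to the block's return value
    result << yield(item)
  end
  result
end

p my_map([1, 2, 3]) { |n| n * 10 }   # => [10, 20, 30]
```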

But what happens when the method is called without a block?

Colors.each #=> ...no block given (yield) (LocalJumpError)...

It turns out that yield always tries to execute the block, whether or not one was passed, and raises a LocalJumpError when there isn’t one. To write a flexible method that yields only when a block is provided, use Kernel#block_given?.

class Colors
  @colors = ['Red', 'Green', 'Blue']
  def self.each
    if block_given?
      # send each color to the block if there is one
      # #each returns the array
      @colors.each { |color| yield color }
    else
      # otherwise just return the array
      @colors
    end
  end
end
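A common refinement of this pattern, used by Ruby’s own collections, is to return an Enumerator instead of the raw array when no block is given, via Kernel#to_enum. That keeps the internal array private and lets callers chain methods like map. A sketch using a hypothetical Palette class:

```ruby
class Palette
  COLORS = ['Red', 'Green', 'Blue'].freeze

  def self.each
    # No block? Return an Enumerator that will call Palette.each later.
    return to_enum(:each) unless block_given?

    COLORS.each { |color| yield color }
  end
end

Palette.each { |color| print color }   # prints RedGreenBlue
puts

enum = Palette.each                     # an Enumerator, not the array
p enum.map(&:upcase)                    # => ["RED", "GREEN", "BLUE"]
p enum.first                            # => "Red"
```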

Storing Blocks

For someone coming from a language like JavaScript, blocks are a bit like anonymous functions defined inside function calls (with the caveat that there can be only one per call). JavaScript developers will also be used to storing functions in variables and passing them to other functions; these are known as first-class functions. Strictly speaking, Ruby does not have first-class functions, but its blocks can be stored in callable objects.

Ruby provides two containers for storing blocks: procs and lambdas. These containers can be created in a number of ways:

time_proc = proc { Time.now }
time_lambda = lambda { Time.now }

# The popular, Ruby 1.8 compatible way to create a proc
old_proc = Proc.new { |foo| foo.upcase }

# Shorthand to create a lambda
stabby_lambda = ->(a, b) do
  a + b
end

# Turning a block passed to a method into a Proc
def a_method(&block)
  block
end
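Here is the last form in use. A block captured with & becomes a regular proc, and the same & prefix works in reverse to pass a proc to a method as its block:

```ruby
def a_method(&block)
  block
end

doubler = a_method { |x| x * 2 }
p doubler.class      # => Proc
p doubler.lambda?    # => false (a captured block is a proc)
p doubler.call(21)   # => 42

# & turns the proc back into a block for Array#map.
p [1, 2, 3].map(&doubler)   # => [2, 4, 6]
```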

Procs and lambdas are executed with #call (or its shorthands .() and []).

add = lambda { |a, b| a + b }
puts add.call(1,2) # => 3
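Besides #call, Proc objects accept two shorthand invocation syntaxes, .() and [], which you will see in existing code:

```ruby
add = lambda { |a, b| a + b }

p add.call(1, 2)   # => 3
p add.(1, 2)       # => 3 (shorthand for add.call(1, 2))
p add[1, 2]        # => 3 (another alias for call)
```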

Procs and lambdas are actually both Proc objects. However, there are at least two nuances to keep in mind:

They treat the return keyword differently. An explicit return in a proc returns from the context in which the proc was defined. An explicit return in a lambda just returns from the lambda.
Lambdas check the number of arguments and raise an ArgumentError on a mismatch. Procs assign nil to missing arguments and ignore extra ones.
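The argument-checking difference is quick to see side by side. This sketch deliberately doesn’t print the exact ArgumentError message, since its wording has changed between Ruby versions:

```ruby
pair_proc   = proc   { |a, b| [a, b] }
pair_lambda = lambda { |a, b| [a, b] }

p pair_proc.call(1)         # => [1, nil] (missing argument becomes nil)
p pair_proc.call(1, 2, 3)   # => [1, 2]   (extra argument is dropped)

begin
  pair_lambda.call(1)
rescue ArgumentError
  puts "lambda raised ArgumentError"
end

# Both are Proc objects; lambda? tells them apart.
p [pair_proc.class, pair_lambda.class]   # => [Proc, Proc]
p pair_lambda.lambda?                    # => true
```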

The first point deserves some extra attention because it can lead to opaque errors.

return_proc = proc { return }
return_lambda = lambda { return }

def first(p)
  second(p)
end

def second(p)
  p.call
end

first(return_lambda)
first(return_proc) # => LocalJumpError

The lambda executes without any problems. However, calling the proc raises a LocalJumpError. This is because the proc’s return tries to return from the context where the proc was defined (here, the top level), not just from the proc itself. The easiest way to get around the proc/block return issue is to avoid explicit returns and take advantage of Ruby’s implicit returns instead.
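When the proc is created inside a method, there is a method for its return to exit, so there is no error, but the result can still surprise you. A sketch contrasting the two inside hypothetical methods:

```ruby
def lambda_return
  l = lambda { return :from_lambda }
  l.call           # the lambda's return only exits the lambda
  :after_lambda
end

def proc_return
  pr = proc { return :from_proc }
  pr.call          # the proc's return exits proc_return itself
  :after_proc      # never reached
end

p lambda_return   # => :after_lambda
p proc_return     # => :from_proc
```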

Note: In Ruby 1.8, proc created procs that checked the number of arguments like lambda, while Proc.new created what are now called procs. In Ruby 1.9+, proc was fixed to behave identically to Proc.new. Keep this in mind when using code that was written for 1.8.

Local Variable Scope

Ruby has two local scope barriers – points where local variables and arguments cannot pass:

Module definitions
Method definitions

Since classes are modules, there are three keywords to look for: module, class, and def.

lvar = 'x'

def print_local
  puts lvar
end

print_local #=> NameError: undefined local variable or method...

So how do you work around scope barriers? It turns out that blocks inherit the scope they are defined in. We can take advantage of this to carry local variables past the barriers by replacing the keywords with alternatives that take blocks: Class.new instead of class, Module.new instead of module, and define_method instead of def.

lvar = 'x'

MyClass = Class.new do
  define_method :print_local do
    puts lvar
  end
end

MyClass.new.print_local #=> 'x'

Since each nested block can see the local variables of the scopes enclosing it, the innermost block (the body of print_local) still has access to lvar. Using nested blocks to give code access to local variables in this way is often referred to as flattening the scope.
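A flattened scope isn’t read-only: the blocks share the outer variable itself, so methods defined this way can update it. A sketch with a hypothetical Counter class:

```ruby
counter = 0

# Class.new and define_method take blocks, so neither opens a new local scope.
Counter = Class.new do
  define_method :increment do
    counter += 1   # reads and updates the outer local variable
  end
end

c = Counter.new
c.increment
c.increment
p counter   # => 2
```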

Summary

Instance variables are references prefixed with ‘@’ and belong to self – the current instance.
Instance variables can be created on any object, including classes and modules.
The value of an unassigned instance variable is nil.
Ruby also has class variables, but they are almost global variables and shouldn’t be used.
Modules are containers for methods.
Classes are modules that also happen to be instance factories and inherit from a superclass.
Ruby does not have multiple inheritance, but multiple modules can be “mixed-in” to classes.
All methods are instance methods in Ruby. Class methods are instance methods on class objects.
A block is a chunk of code attached to a method call that inherits the scope that it is defined in.
The arguments in a block (ex: |i|) come from calls to yield inside the method.
Blocks can be stored in procs or lambdas.
Procs are like portable blocks. Lambdas are like portable methods.
Symbols are Ruby’s answer to enumerated types.
Generating symbols dynamically from user input can be a security issue; before Ruby 2.2, symbols were never garbage collected.
Local variables can’t cross the scope barriers of module and method definitions.
Blocks can be used to carry local variables past scope barriers.

Next, in Part II we’ll explore more of the inner workings of the Ruby language.