Common Trip-ups for New Rubyists, Part I
Once internalized, Ruby is a fairly straightforward language. Until that happens, however, many potential converts are turned off by some of its more unusual aspects. This series hopes to clear up some of the confusion that newcomers face when learning Ruby.
There will be an attempt to adhere to the RDoc notation for describing methods:
- Array#length refers to an instance method length on class Array.
- Array::new refers to a class method new on class Array.
- Math::PI refers to a constant PI in module Math.
Ruby 1.9+ is assumed.
Instance Variables and Class Variables
In many programming languages, instance variables must be declared before they can be assigned. Ruby is the diametric opposite – instance variables cannot be declared at all. Instead, an instance variable in Ruby comes into existence the first time it is assigned.
Creating an instance variable is as easy as taking a local variable and slapping a “@” on the beginning.
class Item
def initialize(title)
@title = title
end
end
item = Item.new('Chia Ruby')
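To see that assignment really is what creates the variable, Ruby's reflection methods can be used. The following is a minimal sketch that reuses the Item class from above; the add_subtitle method and the @subtitle variable are hypothetical additions made only for this illustration.

```ruby
class Item
  def initialize(title)
    @title = title           # @title comes into existence here
  end

  # Hypothetical method, added only for this illustration
  def add_subtitle(subtitle)
    @subtitle = subtitle     # @subtitle does not exist until this line runs
  end
end

item = Item.new('Chia Ruby')
p item.instance_variables                      # => [:@title]
p item.instance_variable_defined?(:@subtitle)  # => false

item.add_subtitle('A Growing Guide')
p item.instance_variables                      # => [:@title, :@subtitle]
```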
Underneath the hood, an instance variable is just a variable stored in self – the current instance. It might be tempting to assume that instance variables can be assigned with self.foo = like in Python. In fact, self.foo = would send the foo= message to self, and Ruby would try to call the corresponding method. This only works if the method exists.
class Item
def initialize(title)
self.title = title
end
end
item = Item.new('Chia Ruby') # => undefined method `title='...
In order to access instance variables on the object outside of instance method definitions, getter and setter methods need to be defined.
class Item
def title=(t)
@title = t
end
def title
@title
end
end
item = Item.new
puts item.title.inspect # => nil
item.title = "Chia Ruby"
puts item.title # => "Chia Ruby"
The #initialize constructor has been left out here to show that it is optional. Also, notice that accessing an instance variable that has not been assigned does not raise an error. @title is nil when it is accessed before assignment. Part II will demonstrate how this can be used for lazy initialization.
Defining getters and setters over and over would get old pretty quickly. Fortunately, Ruby provides a trio of helper methods:
- #attr_reader – define instance-level getters
- #attr_writer – define instance-level setters
- #attr_accessor – define both
Typically, these helpers go at the beginning of a class definition.
class Thing
attr_accessor :foo, :bar
attr_reader :baz
def initialize
@baz = "cat"
end
end
thing = Thing.new
thing.foo = 1
thing.bar = 2
puts thing.baz # => "cat"
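Once a setter exists, the self.title = form from earlier starts working, because there is now a title= method to receive the message. The sketch below also shows a related trip-up: without an explicit receiver, title = ... creates a local variable instead of calling the setter. The rename_wrong and rename methods are hypothetical names used only for this example.

```ruby
class Item
  attr_accessor :title

  def initialize(title)
    self.title = title      # works: attr_accessor defined title=
  end

  # Hypothetical method showing the pitfall
  def rename_wrong(new_title)
    title = new_title       # assigns a local variable; @title is untouched
    title
  end

  # Hypothetical method showing the fix
  def rename(new_title)
    self.title = new_title  # explicit receiver, so the setter is called
  end
end

item = Item.new('Chia Ruby')
item.rename_wrong('Chia Python')
puts item.title # => "Chia Ruby"
item.rename('Chia Ruby, 2nd Edition')
puts item.title # => "Chia Ruby, 2nd Edition"
```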
A consequence of the fact that instance variables are just variables defined in self is that instance variables also work at the class level. After all, classes are just instances of Class. These are commonly referred to as class instance variables.
class Parent
@value = 1
def self.value
@value
end
end
class Child < Parent
@value = 2
end
puts Parent.value # => 1
puts Child.value # => 2
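Because each class object owns its class instance variables, a subclass that never assigns one does not see its parent's value. The sketch below illustrates this, using attr_accessor on the singleton class as one common way to define class-level accessors; OtherChild is a hypothetical class added for contrast.

```ruby
class Parent
  @value = 1

  class << self
    attr_accessor :value   # defines Parent.value and Parent.value=
  end
end

class Child < Parent
  # @value is never assigned on Child
end

class OtherChild < Parent
  @value = 2
end

p Parent.value      # => 1
p Child.value       # => nil (the accessor is inherited, the variable is not)
p OtherChild.value  # => 2

Child.value = 10
p Parent.value      # => 1 (still untouched)
```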
In addition to instance variables, Ruby also has so-called “class variables”, written with @@ instead of @. Unfortunately, these are frowned upon because a class variable is shared by a class and all of its descendants: assigning it in one class replaces the value seen by its ancestors and descendants. For this reason, it’s better to think of them as class-hierarchy variables.
class Parent
@@value = 1
def self.value
@@value
end
end
class Child < Parent
@@value = 2
end
puts Parent.value #=> 2 (because Child overwrote @@value)
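The sharing can also be observed through reflection. This is a small sketch, assuming the same Parent and Child names as above, that writes the variable through the subclass with Module#class_variable_set and reads it back through the superclass.

```ruby
class Parent
  @@value = 1
end

class Child < Parent
end

# Writing through the subclass...
Child.class_variable_set(:@@value, 2)

# ...changes what the superclass sees, because there is only one variable.
p Parent.class_variable_get(:@@value)  # => 2
p Child.class_variable_get(:@@value)   # => 2
```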
Actually, it’s a lot worse than that. Think about what happens when a class-hierarchy variable is assigned at the top level, where self is main – an instance of Object.
@@value = 3
puts Parent.value # => 3
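Since basically every Ruby object descends from Object, it will have access to the same class-hierarchy variables as Object. So, class variables are potentially global variables. This makes them highly unpredictable and prone to misuse. (Ruby 3.0 and later reject this particular pattern: accessing a class variable from the top level raises a RuntimeError, as does a class variable that is overtaken by one defined in an ancestor.)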
Modules
In Ruby, a module is a container for methods. A class, on the other hand, is a special kind of module that has the ability to create instances and have ancestors.
module MyModule
def hello
puts "hello from instance"
end
end
class MyClass
def hello
puts "hello from instance"
end
end
instance = MyClass.new
instance.hello # => "hello from instance"
Methods in a module can be instance methods or class methods – at least semantically. Ruby class methods are just instance methods on class objects. It’s a bit counter-intuitive, but modules are instances of class Module …even though classes are kinds of modules.
module MyModule
# Instance method defined on MyModule which is an instance of Module
def MyModule.hello
puts 'hello from module'
end
# Same as "def MyModule.hello" because self is MyModule here
def self.hello
puts 'hello from module'
end
# Instance method for an instance of a class mixing-in MyModule
def hello
puts 'hello from instance'
end
end
MyModule.hello # => "hello from module"
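The relationship between modules and classes can be checked directly in irb or a script. The following sketch uses empty MyModule and MyClass definitions purely for illustration.

```ruby
module MyModule; end
class MyClass; end

p MyModule.class              # => Module
p MyClass.class               # => Class
p Class.superclass            # => Module
p MyClass.is_a?(Module)       # => true  (a class is a kind of module)
p MyModule.is_a?(Class)       # => false (a module is not a class)
p MyModule.respond_to?(:new)  # => false (modules cannot create instances)
```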
Just because modules don’t have instance factories doesn’t mean they can’t have data. Since they are instances of Module, they can have instance variables.
module Fooable
def self.foo=(value)
@foo = value
end
def self.foo
@foo
end
end
Fooable.foo = "baz"
puts Fooable.foo # => "baz"
A seeming contradiction at this point is that modules can have instance methods in them, even though modules can’t create instances. It turns out that a primary use of modules is in the form of mixins where the module’s methods get incorporated into the class. There the module’s instance methods will be used in the class’ instances.
Mixins
There are two ways to mix a module into a class:
- #include – add a module’s methods as instance methods
- #extend – add a module’s methods as class methods
Examples:
module Helloable
def hello
puts "Hello World"
end
end
class IncludeClass
include Helloable
end
class ExtendClass
extend Helloable
end
IncludeClass.new.hello # => "Hello World"
ExtendClass.hello # => "Hello World"
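One way to see where the methods end up is to inspect the ancestor chains. In the sketch below, hello returns its string instead of printing it so the results are easier to check; the greeter object is a hypothetical addition showing that extend also works on a single object.

```ruby
module Helloable
  def hello
    "Hello World"
  end
end

class IncludeClass
  include Helloable
end

class ExtendClass
  extend Helloable
end

# include puts the module in the class's own ancestor chain
p IncludeClass.ancestors.include?(Helloable)                 # => true
p ExtendClass.ancestors.include?(Helloable)                  # => false

# extend puts it in the ancestor chain of the class's singleton class
p ExtendClass.singleton_class.ancestors.include?(Helloable)  # => true
p ExtendClass.new.respond_to?(:hello)                        # => false

# extend can be applied to any individual object, not just classes
greeter = Object.new
greeter.extend(Helloable)
p greeter.hello                                              # => "Hello World"
```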
Sometimes it’s desirable to mix both class and instance methods into classes. The obvious, if redundant, way to do this is to put the class methods in one module and the instance methods in another. Then the instance module can be included, and the class module can be extended.
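A minimal sketch of this two-module approach might look like the following. The module names HelloInstanceMethods and HelloClassMethods are hypothetical, chosen only for this example.

```ruby
# Hypothetical module holding the instance-level methods
module HelloInstanceMethods
  def hello
    "hello from instance"
  end
end

# Hypothetical module holding the class-level methods
module HelloClassMethods
  def hello
    "hello from class"
  end
end

class HelloClass
  include HelloInstanceMethods  # becomes instance methods
  extend HelloClassMethods      # becomes class methods
end

p HelloClass.hello      # => "hello from class"
p HelloClass.new.hello  # => "hello from instance"
```

The drawback is that every class using the pair has to remember both lines.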
Instead of this, it’s better to use a bit of metaprogramming magic. The Module#included hook method detects when a module has been included in a class. When the module is included, the class can extend an inner module (often called ClassMethods) that contains the class-level methods.
module Helloable
# Gets called on 'include Helloable'
def self.included(klass)
# 'base' often used instead of 'klass'
klass.extend(ClassMethods)
end
# Method that will become an instance method
def hello
puts "hello from instance"
end
# Methods that will become class methods
module ClassMethods
def hello
puts "hello from class"
end
end
end
class HelloClass
include Helloable
end
HelloClass.hello # => "hello from class"
HelloClass.new.hello # => "hello from instance"
A quick way to demonstrate the utility of mixins is with the Comparable module. It defines comparison operator methods based on the return value of the combined comparison operator <=> method. Let’s create a class StringFraction that provides proper comparison between fractions in strings.
class StringFraction
include Comparable
def initialize(fraction)
@fraction = fraction
end
def rational
@fraction.to_r
end
def <=>(other)
self.rational <=> other.rational
end
end
f1 = StringFraction.new("1/2")
f2 = StringFraction.new("1/3")
puts f1 > f2 # => true
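Defining <=> buys more than the > operator. The sketch below repeats the StringFraction class and exercises a few other things that rely on <=>, namely Comparable#==, Comparable#between?, Array#sort and Array#max. The attr_reader is an addition made here only so the results can be printed.

```ruby
class StringFraction
  include Comparable
  attr_reader :fraction   # added for this sketch, to display results

  def initialize(fraction)
    @fraction = fraction
  end

  def rational
    @fraction.to_r
  end

  def <=>(other)
    rational <=> other.rational
  end
end

half        = StringFraction.new("1/2")
third       = StringFraction.new("1/3")
quarter     = StringFraction.new("1/4")
two_fourths = StringFraction.new("2/4")

p half == two_fourths                          # => true
p third.between?(quarter, half)                # => true
p [half, quarter, third].sort.map(&:fraction)  # => ["1/4", "1/3", "1/2"]
p [half, quarter, third].max.fraction          # => "1/2"
```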
Module Gotchas
1. If a module is included twice, the second inclusion is ignored.
module HelloModule
def say
"hello from module"
end
end
module GoodbyeModule
def say
"goodbye from module"
end
end
class MyClass
include HelloModule
include GoodbyeModule
include HelloModule
end
MyClass.new.say # => "goodbye from module"
2. If two modules define the same method, the second one to be included is used.
module Module1
def hello
"hello from module 1"
end
end
module Module2
def hello
"hello from module 2"
end
end
class HelloClass
include Module1
include Module2
end
HelloClass.new.hello # => "hello from module 2"
3. Module methods cannot replace methods already defined in a class.
module HelloModule
def hello
'hello from module'
end
end
class HelloClass
def hello
'hello from class'
end
include HelloModule
end
HelloClass.new.hello # => 'hello from class'
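All three gotchas follow from the order of the ancestor chain, which Module#ancestors makes visible: an included module is placed after the class itself, so the class's own methods are found first. As a hedged aside, Ruby 2.0 and later also provide Module#prepend, which places the module before the class. The PrependClass below is a hypothetical class added to show the contrast.

```ruby
module HelloModule
  def hello
    'hello from module'
  end
end

class HelloClass
  include HelloModule

  def hello
    'hello from class'
  end
end

# Hypothetical class for contrast; prepend requires Ruby 2.0+
class PrependClass
  prepend HelloModule

  def hello
    'hello from class'
  end
end

p HelloClass.ancestors.take(2)    # => [HelloClass, HelloModule]
p HelloClass.new.hello            # => "hello from class"

p PrependClass.ancestors.take(2)  # => [HelloModule, PrependClass]
p PrependClass.new.hello          # => "hello from module"
```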
Symbols
In other languages (and possibly Ruby), you may have seen constants used as names like this:
NORTH = 0
SOUTH = 1
EAST = 2
WEST = 3
Outside of Ruby, constants used as names and not for their values are known as enumerators (not to be confused with Ruby’s Enumerator). Typically, there is a cleaner way to do this as in an enumerated type (here, in Java):
public enum Direction {
NORTH, SOUTH, EAST, WEST
}
Ruby has something even better than enumerated types: symbols. A symbol is any reference that begins with :, including words like :name, :@foo, or :+. This flexibility is important because symbols are used to represent things like method names, instance variables, and constants.
From a practical standpoint, symbols, like enumerators, are basically fast, immutable strings. However, unlike enumerators, symbols do not need to be created manually. If you need a symbol to exist, just use it as if it already does. Here is how you would create a hash that has some symbols as keys:
worker = {
:name => "John Doe",
:age => "35",
:job => "Basically Programmer"
}
puts worker[:name] # => "John Doe"
A potential pitfall is creating symbols dynamically, especially based on user input. Before Ruby 2.2, symbols were never garbage collected: once created, they existed until the program exited. This becomes a security issue when users are indirectly responsible for creating symbols. A malicious user could consume a lot of memory that would never be reclaimed, potentially crashing the VM. Ruby 2.2 and later can collect symbols that were created dynamically (for example with String#to_sym), which removes most of this risk, but converting unchecked user input into symbols is still best avoided.
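Whatever the Ruby version, one way to keep user input from creating arbitrary symbols is to look the input up in a fixed table rather than calling to_sym on it. The following is only a sketch: the DIRECTIONS table and the parse_direction method are hypothetical names invented for this example.

```ruby
# Hypothetical whitelist: the only symbols user input can ever map to
DIRECTIONS = {
  "north" => :north,
  "south" => :south,
  "east"  => :east,
  "west"  => :west
}.freeze

# Hypothetical helper: normalizes the input and looks it up,
# raising instead of creating a new symbol for unknown values
def parse_direction(input)
  key = input.to_s.strip.downcase
  DIRECTIONS.fetch(key) do
    raise ArgumentError, "unknown direction: #{input.inspect}"
  end
end

p parse_direction(" North ")  # => :north

begin
  parse_direction("up")
rescue ArgumentError => e
  puts e.message  # => unknown direction: "up"
end
```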
Blocks, Procs, and Lambdas
Ruby method calls can be followed by either { / } or do / end token pairs enclosing arbitrary code. This arbitrary code can receive pipe-enclosed (ex: |i, j|) arguments that are referenced in the code.
[1,2,3].each { |i| print i } # => '123'
The code between these enclosing tokens is known as a block. A block is a chunk of code attached to a method call. Here, for #each item in the array, the block is executed. In this case, it just prints the item to standard output.
What isn’t obvious to new rubyists is how each item in the array gets inside the block. A great way to understand this is to write our own #each.
class Colors
def self.each
yield 'Red'
yield 'Green'
yield 'Blue'
end
end
Colors.each { |color| print color } # => 'RedGreenBlue'
The yield keyword executes the block with the arguments passed to it. In other words, the arguments in a block (ex: |i|) come from calls to yield inside the method the block is attached to. If you want to iterate through a collection, you just need to yield each item in it.
But what happens when the method is called without a block?
Colors.each #=> ...no block given (yield) (LocalJumpError)...
It turns out that yield will try to execute the block regardless of whether there really is one. If we want a flexible method that yields to a block only if one is provided, Ruby provides #block_given?.
class Colors
@colors = ['Red', 'Green', 'Blue']
def self.each
if block_given?
# send each color to the block if there is one
# #each returns the array
@colors.each { |color| yield color }
else
# otherwise just return the array
@colors
end
end
end
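Returning the raw array is one option for the block-less case. Another idiom, shown here as a sketch rather than as the approach this article prescribes, is to return an Enumerator via Kernel#enum_for, which is roughly how core methods like Array#each behave when called without a block.

```ruby
class Colors
  @colors = ['Red', 'Green', 'Blue']

  def self.each
    # Without a block, hand back an Enumerator that will call this method later
    return enum_for(:each) unless block_given?

    @colors.each { |color| yield color }
  end
end

Colors.each { |color| print color }     # prints RedGreenBlue
puts

p Colors.each.map { |c| c.downcase }    # => ["red", "green", "blue"]
p Colors.each.to_a                      # => ["Red", "Green", "Blue"]
```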
Storing Blocks
For someone coming from a language like JavaScript, blocks are a bit like anonymous functions defined inside function calls (with the caveat that there can only be one per call). JavaScript programmers will also be used to the fact that functions can be stored in variables and passed to other functions. These are known as first-class functions. Strictly speaking, Ruby does not have first-class functions, but its blocks can be stored in callable objects.
Ruby provides two containers for storing blocks: procs and lambdas. These containers can be created in a number of ways:
time_proc = proc { Time.now }
time_lambda = lambda { Time.now }
# The popular, Ruby 1.8 compatible way to create a proc
old_proc = Proc.new { |foo| foo.upcase }
# Shorthand to create a lambda
stabby_lambda = ->(a, b) do
a + b
end
# Turning a block passed to a method into a Proc
def a_method(&block)
block
end
All callable objects are executed with #call.
add = lambda { |a, b| a + b }
puts add.call(1,2) # => 3
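Besides #call, Proc objects respond to a couple of shorthand forms, and the & operator turns a stored proc or lambda back into a block for a method that expects one. A short sketch follows; apply_twice is a hypothetical helper written only for this example.

```ruby
add = lambda { |a, b| a + b }

p add.call(1, 2)  # => 3
p add.(1, 2)      # => 3 (shorthand for call, Ruby 1.9+)
p add[1, 2]       # => 3

double = proc { |n| n * 2 }

# & passes the stored proc as the block of the method call
p [1, 2, 3].map(&double)  # => [2, 4, 6]

# Hypothetical helper that yields to its block twice
def apply_twice(value)
  yield(yield(value))
end

p apply_twice(5, &double)  # => 20
```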
Procs and lambdas are actually both Proc objects. However, there are at least two nuances to keep in mind:
- They treat the return keyword differently. An explicit return in a proc returns from the context in which the proc was defined. An explicit return in a lambda just returns from the lambda.
- Lambdas check the number of arguments. Procs assign nil to missing arguments.
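The argument-handling difference is easy to observe. The sketch below uses two hypothetical callables, pair_proc and pair_lambda, with identical bodies, and Proc#lambda? to tell them apart.

```ruby
pair_proc   = proc   { |a, b| [a, b] }
pair_lambda = lambda { |a, b| [a, b] }

p pair_proc.lambda?        # => false
p pair_lambda.lambda?      # => true

# Procs are lenient: missing arguments become nil, extras are dropped
p pair_proc.call(1)        # => [1, nil]
p pair_proc.call(1, 2, 3)  # => [1, 2]

# Lambdas are strict, like methods
begin
  pair_lambda.call(1)
rescue ArgumentError => e
  puts "lambda raised: #{e.class}"  # => lambda raised: ArgumentError
end
```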
The first point deserves some extra attention because it can lead to opaque errors.
return_proc = proc { return }
return_lambda = lambda { return }
def first(p)
second(p)
end
def second(p)
p.call
end
first(return_lambda)
first(return_proc) # => LocalJumpError
The lambda executes without any problems. However, when we try to call the proc, we get a LocalJumpError. This is because the proc was defined at the top level, so there is no enclosing method for its return to exit. The easiest way to get around the proc/block return issue is to avoid explicit returns. Instead, take advantage of Ruby’s implicit returns.
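When a proc is defined inside a method, the same rule does not raise an error but can still surprise: the return exits the method that defined the proc. The following sketch contrasts the two behaviors; the method names proc_return and lambda_return are hypothetical.

```ruby
def proc_return
  callable = proc { return :from_proc }
  callable.call
  :after_call   # never reached: the proc's return exits proc_return itself
end

def lambda_return
  callable = lambda { return :from_lambda }
  callable.call
  :after_call   # reached: the lambda's return only exits the lambda
end

p proc_return    # => :from_proc
p lambda_return  # => :after_call
```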
Note: In Ruby 1.8, proc created procs that checked the number of arguments like lambda, while Proc.new created what are now called procs. In Ruby 1.9+, proc was fixed to behave identically to Proc.new. Keep this in mind when using code that was written for 1.8.
Local Variable Scope
Ruby has two local scope barriers – points where local variables and arguments cannot pass:
- Module definitions
- Method definitions
Since classes are modules, there are three keywords to look for: module, class, and def.
lvar = 'x'
def print_local
puts lvar
end
print_local #=> NameError: undefined local variable or method...
So how do you work around scope barriers? It turns out that blocks inherit the scope that they are defined in. We can take advantage of this to pass local variables past these barriers by using alternatives that take blocks.
lvar = 'x'
MyClass = Class.new do
define_method :print_local do
puts lvar
end
end
MyClass.new.print_local #=> 'x'
Since each nested block contains the scopes from the higher blocks, the deepest block contains the scope with the local variable. Using nested blocks to provide access to local variables in this way is often referred to as flattening the scope.
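Because the block shares the surrounding scope rather than copying it, a flattened scope can also modify the local variable. The sketch below illustrates this with a hypothetical Counter class and a count variable, and contrasts it with a def-based method that cannot see count at all.

```ruby
count = 0

# Hypothetical class built with blocks, so count stays visible
Counter = Class.new do
  define_method(:increment) do
    count += 1   # updates the local variable defined outside the class
  end
end

Counter.new.increment
Counter.new.increment
p count  # => 2

# With def, count is a brand-new local inside the method: nil + 1 fails
def broken_increment
  count += 1
end

begin
  broken_increment
rescue NoMethodError => e
  puts "def is a scope barrier: #{e.class}"  # => def is a scope barrier: NoMethodError
end
```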
Summary
- Instance variables are references prefixed with ‘@’ and belong to self – the current instance.
- Instance variables can be created anywhere, including modules.
- The value of unassigned instance variables is nil.
- Ruby also has class variables, but they are almost global variables and shouldn’t be used.
- Modules are containers for methods.
- Classes are modules that also happen to be instance factories and have ancestors.
- Ruby does not have multiple inheritance, but multiple modules can be “mixed-in” to classes.
- All methods are instance methods in Ruby. Class methods are instance methods on class objects.
- A block is a chunk of code attached to a method call that inherits the scope that it is defined in.
- The arguments in a block (ex: |i|) come from calls to yield inside the method.
- Blocks can be stored in procs or lambdas.
- Procs are like portable blocks. Lambdas are like portable methods.
- Symbols are Ruby’s answer to enumerated types.
- Before Ruby 2.2, symbols were not garbage collected, so they could be a security issue when generated dynamically.
- Local variables can’t cross the scope barriers of module and method definitions.
- Blocks can be used to carry local variables past scope barriers.
Next, in Part II we’ll explore more of the inner workings of the Ruby language.