
Symbol GC in Ruby 2.2

Richard Schneeman

There is a Japanese translation of this post here!


What is symbol GC and why should you care? Ruby 2.2 was just released and, in addition to incremental GC, one of its other big features is Symbol GC. If you’ve been around the Ruby landscape, you’ve probably heard the term “symbol DoS”. A symbol denial of service attack occurs when a system creates so many symbols that it runs out of memory. This happens because, prior to Ruby 2.2, symbols lived forever. For example, in Ruby 2.1:

# Ruby 2.1

before = Symbol.all_symbols.size
100_000.times do |i|
  "sym#{i}".to_sym
end
GC.start
after = Symbol.all_symbols.size
puts after - before
# => 100001

Here we create 100,000 symbols and they’re still around, even though we’ve run GC and no variables reference those objects. This could easily be a problem if you wrote some code that accepted a user parameter and called to_sym on it:

def show
  step = params[:step].to_sym
end

In this case, someone could make many requests to example.com/step= with a different value each time and, since your application never clears out symbols, your program would eventually run out of memory and crash. This may sound like a fabricated example, but it was similar to code I actually had committed in my Wicked gem (don’t worry, it’s fixed now). It’s not an isolated case either:

The list goes on and on, but you get the point. Creating symbols from user input is dangerous only if symbols aren’t garbage collected, which was the case prior to Ruby 2.2.
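On Rubies without Symbol GC, one common defence is to never call to_sym on raw input and instead match the input against a fixed allowlist of known values. The sketch below assumes a hypothetical ALLOWED_STEPS list and a hypothetical safe_step helper; they are illustrations, not code from the Wicked gem.

```ruby
# Hypothetical list of valid steps. Symbol literals like these are created once,
# when the code is parsed, no matter how many requests arrive.
ALLOWED_STEPS = %i[confirm_password confirm_profile finish].freeze

# Map a user-supplied string onto a known step without calling to_sym on it.
# Unknown input returns nil instead of minting a new symbol.
def safe_step(param)
  ALLOWED_STEPS.find { |step| step.to_s == param.to_s }
end

p safe_step("finish")     # => :finish
p safe_step("evil_12345") # => nil
```

Because the comparison is done on Strings, an attacker can send as many distinct values as they like without growing the symbol table.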

Symbol GC in Ruby 2.2

Starting with Ruby 2.2 symbols are now garbage collected.

# Ruby 2.2
before = Symbol.all_symbols.size
100_000.times do |i|
  "sym#{i}".to_sym
end
GC.start
after = Symbol.all_symbols.size
puts after - before
# => 1

Since the symbols we create aren’t referenced by any other object or variable, they can be safely collected. This helps in preventing us from accidentally creating a scenario where a program creates and retains so many objects that it crashes. However, Ruby doesn’t garbage collect ALL symbols.

WAT?

#not_all_symbols

Prior to Ruby 2.2, we couldn’t collect symbols because they were used internally by the Ruby interpreter. Basically, each symbol has a unique object ID. For example, :foo.object_id always needed to be the same value for the duration of the program’s execution. This is due to the way rb_intern works.

In CRuby, when you define a method, Ruby stores a unique ID for the method’s name in a method table.

Slide from Nari’s talk on Symbol GC

Later, when you call the method, Ruby will look up the symbol of the method name, and then get the ID of that symbol. The ID of the symbol is used to point at the static memory of the function in C. The function in C is then called and that’s how Ruby executes methods.

If we garbage collected a symbol and that symbol was used to reference a method, then that method is no longer callable. That would be bad.
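To make that failure mode concrete, here is a toy model in plain Ruby. It is only an analogy: ToyInterpreter and its methods are hypothetical, and CRuby’s real symbol and method tables are C data structures, not Ruby hashes. The point it shows is that dispatch goes name → ID → method body, so if the name-to-ID mapping disappears, the method body becomes unreachable even though it still exists.

```ruby
# Toy model only: names are interned to integer IDs, and methods are stored by ID.
class ToyInterpreter
  def initialize
    @symbol_table = {} # name (String) => ID (Integer)
    @method_table = {} # ID (Integer) => callable
    @next_id = 0
  end

  # Return the existing ID for a name, or allocate a new one.
  def intern(name)
    @symbol_table[name] ||= (@next_id += 1)
  end

  def define(name, &body)
    @method_table[intern(name)] = body
  end

  # Method dispatch: name -> ID -> body.
  def call(name)
    id = @symbol_table.fetch(name) { raise NoMethodError, "undefined method '#{name}'" }
    @method_table.fetch(id).call
  end

  # Simulate "garbage collecting" a symbol: the name -> ID mapping disappears,
  # but the method body is still sitting in the method table.
  def collect_symbol(name)
    @symbol_table.delete(name)
  end
end

vm = ToyInterpreter.new
vm.define("greet") { "hello" }
puts vm.call("greet") # prints "hello"

vm.collect_symbol("greet")
begin
  vm.call("greet")
rescue NoMethodError => e
  puts "after collection: #{e.message}" # prints "after collection: undefined method 'greet'"
end
```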

To get around this problem, Narihiro Nakamura introduced the idea of an “immortal symbol” in the C world and a “mortal symbol” in the Ruby world.

Basically, all symbols created dynamically while Ruby is running (via to_sym, etc.) can be garbage collected because they are not being used behind the scenes inside the Ruby interpreter. However, symbols created as a result of defining a new method, and symbol literals written directly in your code, will not be garbage collected. For example, the symbols created by :foo and by def foo; end will never be garbage collected, whereas "foo".to_sym would be eligible for garbage collection.

There are gotchas with this approach: it’s still possible to have a DoS if you’re accidentally creating methods based on user input.

define_method(params[:step].to_sym) do
  # ...
end

Because define_method calls rb_intern behind the scenes, even though we are passing in a dynamically defined (i.e. to_sym) symbol, it will be converted to an immortal symbol so it can be used for method lookup. Hopefully, you wouldn’t be doing that anyway, but it’s still good to point out dangerous bits in Ruby.
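You can observe part of this from Ruby: define a few methods from dynamically built names, force a GC, and check whether those names are still in Symbol.all_symbols. The Wizard class and the dyn_step_* names below are hypothetical. Note that this check only shows the names stay in the symbol table while the methods exist; it cannot by itself tell an immortal symbol from a mortal one.

```ruby
# Hypothetical class whose methods are named from "untrusted" strings.
class Wizard; end

untrusted = ["dyn_step_a", "dyn_step_b", "dyn_step_c"]
untrusted.each do |name|
  # define_method is private before Ruby 2.5, so call it via send.
  Wizard.send(:define_method, name.to_sym) { name }
end

GC.start

# Compare by String so this check does not itself create the symbols.
table = Symbol.all_symbols.map(&:to_s)
p untrusted.all? { |name| table.include?(name) } # => true
```

Every name that reaches define_method ends up as a permanent entry in the class’s method table, which is exactly why method names built from user input are still a DoS risk on Ruby 2.2.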

Variable and constant names also use symbols behind the scenes.

before = Symbol.all_symbols.size
eval %Q{
  BAR = nil
  FOO = nil
}
GC.start
after = Symbol.all_symbols.size
puts after - before
# => 2

Even though both constants are nil, each name uses a symbol behind the scenes that will never get garbage collected. In addition to avoiding defining methods based on user input, also watch out for creating variables based on user input:

self.instance_variable_set( "@step_#{ params[:step] }".to_sym, nil )
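If you need to keep one value per user-supplied key, a Hash with String keys avoids creating a new instance variable name (and therefore a new symbol) per distinct input. The StepStore class below is a hypothetical sketch of that pattern, not an API from any library.

```ruby
# Hypothetical object that keeps per-step data in a Hash with String keys
# instead of creating one instance variable per user-supplied step.
class StepStore
  def initialize
    @steps = {}
  end

  def set(step, value)
    @steps[step.to_s] = value # String key: no new instance variable name is created
  end

  def get(step)
    @steps[step.to_s]
  end
end

store = StepStore.new
store.set("profile", nil)
store.set("payment", 42)
p store.get("payment")     # => 42
p store.instance_variables # => [:@steps]
```

However many distinct steps arrive, the object only ever has the single @steps instance variable.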

To be truly safe, you should periodically check Symbol.all_symbols.size after running GC.start to ensure that the symbol table isn’t growing. Going forward, hopefully good standards around what is and isn’t safe to do with symbols will become more widely known. If you find another really common gotcha, reach out to me on Twitter and I’ll try to keep this section updated.
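That check can be wrapped in a small helper. The sketch below is hypothetical (symbol_table_growth is not part of Ruby) and uses only Symbol.all_symbols and GC.start. Treat its result as a signal rather than an exact count, since unrelated symbols can be created or collected while the block runs.

```ruby
# Hypothetical helper: run a block between two full GCs and report how much
# the symbol table grew.
def symbol_table_growth
  GC.start
  before = Symbol.all_symbols.size
  yield
  GC.start
  Symbol.all_symbols.size - before
end

# Example workload: dynamic symbols that nothing keeps a reference to.
growth = symbol_table_growth do
  10_000.times { |i| "request_param_#{i}".to_sym }
end
puts "symbol table grew by #{growth}"
```

On Ruby 2.1 this workload would grow the table by roughly 10,000 entries; on Ruby 2.2 and later you would expect a much smaller number, because these dynamic symbols are collectable.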

Thanks to @nari3 for reviewing this section and providing feedback. For more information about the internals and implementation, read Nari’s slides or listen to the presentation at RubyKaigi.

I Feel the Need for Speed

In addition to security, the biggest reason you should care about this feature is speed. There’s a ton of code written around turning symbols into strings to avoid accidentally allocating symbols from user input. Generally when you put the words “ton” and “code” together, the results aren’t fast.

The most common example of avoiding Symbol allocations is Rails’ (ActiveSupport’s) HashWithIndifferentAccess. Since I’ve written about subclasses of Hash like Hashie being slow, you may not be surprised to find that this behavior in Rails comes with a huge performance penalty.

require 'benchmark/ips'

require 'active_support'
require 'active_support/hash_with_indifferent_access'

hash = { foo: "bar" }
INDIFFERENT = ActiveSupport::HashWithIndifferentAccess.new(hash)
REGULAR     = hash

Benchmark.ips do |x|
  x.report("indifferent-string") { INDIFFERENT["foo"] }
  x.report("indifferent-symbol") { INDIFFERENT[:foo] }
  x.report("regular-symbol")     { REGULAR[:foo] }
end

When we run this:

Calculating -------------------------------------
  indifferent-string   115.962k i/100ms
  indifferent-symbol    82.702k i/100ms
      regular-symbol   150.856k i/100ms
-------------------------------------------------
  indifferent-string      4.144M (± 4.4%) i/s -     20.757M
  indifferent-symbol      1.578M (± 3.7%) i/s -      7.939M
      regular-symbol      8.685M (± 2.4%) i/s -     43.447M

We see that the indifferent access hash with a string key runs at about half the speed of a regular hash with symbol keys. We also see that using a symbol to access the value in an indifferent access hash is a whopping 5 times slower than using a regular hash with symbol keys. I wrote about how string key performance in Ruby 2.2 is getting a big improvement; however, accessing a hash with a symbol is still the fastest and, some might argue, the most aesthetically pleasing way to access a hash. Now, with Ruby 2.2, we could use symbol keys in parameters in Rails. If we made that switch, we wouldn’t have to worry about security, and we wouldn’t have to pay the HashWithIndifferentAccess tax.
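As a rough illustration of what “symbol keys in parameters” could look like, here is a hypothetical helper that copies a string-keyed hash into one with Symbol keys. It is only a sketch, not how Rails builds params; it does not recurse into nested hashes, and it is only reasonable on Ruby 2.2 or later, where the resulting dynamic symbols can be collected once nothing references them.

```ruby
# Hypothetical string-keyed params, shaped like what a web framework parses
# from a query string.
raw_params = { "step" => "confirm", "page" => "2" }

# Hypothetical helper: return a copy of the hash with Symbol keys (top level only).
def symbolize_param_keys(hash)
  hash.each_with_object({}) { |(key, value), out| out[key.to_sym] = value }
end

params = symbolize_param_keys(raw_params)
p params[:step] # => "confirm"
p params.keys   # => [:step, :page]
```

On Ruby 2.5 and later, raw_params.transform_keys(&:to_sym) does the same top-level conversion.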

Note: You should do benchmarking at the application level before making any big performance changes, especially whenever it requires an API deprecation. Don’t ever submit a performance patch with the justification that “some blog said it was faster” even if that blog is mine. Always verify claims with a case by case benchmark.
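In that spirit, if you want to see the general shape of the cost without pulling in Rails, here is a dependency-free sketch. TinyIndifferentHash and time_it are hypothetical; TinyIndifferentHash is not ActiveSupport’s implementation and only models the extra work of normalizing every key to a String on each read, so its timings should not be read as HashWithIndifferentAccess numbers.

```ruby
# A deliberately tiny, hypothetical stand-in for indifferent access:
# keys are normalized to Strings on write and on every read.
class TinyIndifferentHash
  def initialize(hash)
    @store = {}
    hash.each { |key, value| @store[key.to_s] = value }
  end

  def [](key)
    @store[key.to_s] # converts a Symbol key to a String on every read
  end
end

# Time a block with a monotonic clock; crude, but needs no gems.
def time_it(label, iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  puts format("%-18s %.3fs", label, elapsed)
end

regular     = { foo: "bar" }
indifferent = TinyIndifferentHash.new(regular)
iterations  = 500_000

time_it("regular-symbol", iterations)   { regular[:foo] }
time_it("tiny-indifferent", iterations) { indifferent[:foo] }
```

For real decisions, prefer benchmark-ips and your application’s own code paths, as the note above says.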

Recap

Symbol GC saves your butt from DoS attacks and gives you the flexibility to use symbols wherever you want. Coupled with Ruby 2.2’s host of other performance features, including incremental GC and string de-duplication for Hash keys, there’s no reason not to upgrade right away. Install locally:

$ ruby-install ruby 2.2.0

Run in production (if you’re using Heroku):

$ echo "ruby '2.2.0'" >> Gemfile
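If your code starts relying on Symbol GC (for example, by calling to_sym on request parameters), a boot-time guard can make sure it never runs on an older Ruby by accident. This is a hypothetical sketch; MIN_RUBY_FOR_SYMBOL_GC and symbol_gc_available? are illustrative names, and it uses RubyGems’ Gem::Version for the comparison.

```ruby
# Hypothetical boot-time guard for an app that deliberately relies on Symbol GC.
MIN_RUBY_FOR_SYMBOL_GC = Gem::Version.new("2.2.0")

def symbol_gc_available?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MIN_RUBY_FOR_SYMBOL_GC
end

abort "This app relies on Symbol GC; please run it on Ruby 2.2.0 or newer." unless symbol_gc_available?
```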

Don’t wait, the future of Ruby is now!


@schneems writes on Ruby, performance, and symbols, follow him for all that Jazz.
