License to SIGKILL

By Richard Schneeman

Schneeman, Richard Schneeman.

Every program wants to live forever. What happens when a program is forced to exit before it’s done running, and why would we want to do that?

Unix Signals

Feel free to skip this section if you are familiar with signals.

In Unix, processes can communicate with each other using predefined signals. You can see a list of Unix signals here. This ability to communicate is extremely important in a process-oriented program. For example, the Puma web server can add concurrency by spawning child “worker” processes. A master process manages the workers, and incoming requests are handled by the next available child. If the system running the Puma master process needs to shut down or restart, we don’t want all current requests to be stopped in their tracks. Instead, we want the child workers to finish processing their requests if they can, clean up any external connections or temporary files they may have generated, and then exit. The system can do this safely by sending a signal to the parent “master” process, which in turn sends it to the child processes.

You may have seen the movie Tron Legacy. The movie opens with a hacker breaking into a corporate network. The CEO sees it happening and deftly responds by typing a $ kill -9 command into the terminal. The kill command on Linux (and Mac OS X) sends signal number 9, which is SIGKILL, to a process. SIGKILL means “end now without cleanup”. This is similar to force-quitting a program with CTRL+ALT+DELETE on Windows (though Windows is not POSIX compliant and doesn’t support Unix signals in the same way).

When we need our long-running processes to exit gracefully, SIGKILL is too strong. It forces a process to exit immediately, with no chance to clean up, and can leave your system in a bad state. What should you use instead? SIGTERM (signal number 15) is the “termination signal”. It tells a program that it needs to stop what it’s doing and clean up before exiting.
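Ruby exposes the name-to-number mapping through Signal.list, so the numbers above can be checked from a script. This is a minimal sketch, assuming a Linux or Mac OS X system where SIGKILL is 9 and SIGTERM is 15:

```ruby
# Look up signal numbers by name. Signal.list omits the "SIG" prefix.
term_number = Signal.list["TERM"]
kill_number = Signal.list["KILL"]

puts "SIGTERM is signal #{term_number}"
puts "SIGKILL is signal #{kill_number}"

# Signal.signame goes the other way, from number to name.
kill_name = Signal.signame(kill_number)
puts "Signal #{kill_number} is #{kill_name}"
```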

Live and Let Die

When Ruby receives a SIGTERM signal, it raises a SignalException in the main thread. In Ruby, an exception can happen at any point while a process is running, so critical cleanup code should always go in an ensure block:

begin
  # do something
ensure
  # clean up something
end

There are notable caveats here. For example, an exception can be raised while an ensure block is already running, so we can’t always rely on the block running to completion. For an in-depth look into errors in Ruby, I recommend Avdi’s Exceptional Ruby. That being said, it’s still a best practice to use ensure blocks to safeguard your code.
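To look at the exception itself, here is a minimal sketch that sends SIGTERM to the current process and rescues the resulting SignalException. It assumes the script runs in the main thread with no TERM trap installed. Rescuing the exception like this swallows the signal, so treat it as an illustration rather than a pattern to copy:

```ruby
caught = nil

begin
  # Send SIGTERM to this very process, then give it a moment to arrive.
  Process.kill("TERM", Process.pid)
  sleep 1
rescue SignalException => e
  # For illustration only: rescuing here stops the process from exiting.
  caught = e
end

puts caught.class  # SignalException
puts caught.signm  # SIGTERM
puts caught.signo  # 15
```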

Since we already have this failsafe behavior, Ruby uses it when a SignalException is raised. To verify this we can write a trivial script:

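# A background thread that loops forever. Its ensure block
# runs when the process is asked to shut down.
Thread.new do
  begin
    while true
      sleep 1
    end
  ensure
    puts "ensure called"
  end
end

# Send SIGTERM to this very process.
current_pid = Process.pid
signal      = "SIGTERM"
Process.kill(signal, current_pid)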

When you run this you’ll see:

ensure called
Terminated: 15

You’ll notice that, in addition to “ensure called”, we also get the number of the signal that ended the process (15, which corresponds to SIGTERM); that last line is printed by the shell, not by Ruby. Neat. This behavior is really convenient, since any program that already has error handling is equipped to exit gracefully. By putting sensitive operations in an ensure block, we’re making it more likely that the program will do the right thing: the ensure blocks run first, and then the program exits. Note that if you re-run the program with SIGKILL instead, it exits with a different number (9) and we don’t get any output from the ensure block.
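The difference between the two signals can also be observed from a parent process. The sketch below is one way to do it, assuming a platform that supports fork; `run_child_and_signal` is a hypothetical helper name, and the pipe is only there so the parent waits until the child is inside its begin block before signaling:

```ruby
# Fork a child that sleeps inside begin/ensure, send it `signal`, and
# return what its ensure block wrote plus the signal number that ended it.
def run_child_and_signal(signal)
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    begin
      writer.syswrite("ready\n")  # tell the parent we are inside begin
      sleep                       # wait for a signal
    ensure
      writer.syswrite("ensure called\n")
    end
  end

  writer.close
  reader.gets                     # block until the child reports "ready"
  Process.kill(signal, pid)
  _, status = Process.wait2(pid)

  output = reader.read            # whatever the child wrote after "ready"
  reader.close
  [output, status.termsig]
end

term_output, term_signal = run_child_and_signal("TERM")
kill_output, kill_signal = run_child_and_signal("KILL")

puts "TERM: output=#{term_output.inspect} signal=#{term_signal}"
puts "KILL: output=#{kill_output.inspect} signal=#{kill_signal}"
```

With SIGTERM the child gets to run its ensure block before it dies; with SIGKILL the output is empty because the block never runs.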

Tomorrow Never Dies

I’m sure you’ve had a frustrating app on your computer that was frozen and wouldn’t die no matter how many times you clicked the “close” button. Some stubborn programs will never exit, no matter how many times you send SIGTERM to them. This can happen when the program gets stuck trying to clean itself up. We can reproduce this easily:

thread = Thread.new do
  begin
    while true
      sleep 1
    end
  ensure
    # Cleanup that never finishes keeps the process alive.
    while true
      puts "ensure called"
      sleep 1
    end
  end
end

# Send SIGTERM to this very process.
current_pid = Process.pid
signal      = "SIGTERM"
Process.kill(signal, current_pid)

The output will look like this:

ensure called
ensure called
ensure called
ensure called
ensure called
# ...
ensure called

It will never end until the machine is restarted or SIGKILL is sent. Instead of this trivial example, it’s easy to imagine your Ruby program waiting on a database query or network call to finish. If the call hangs and your program never gets a response, it will never exit. That’s why it’s critical to put timeouts on sensitive code, though be careful with timeout.rb.
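One way to bound cleanup without reaching for timeout.rb is to poll against a deadline. The sketch below is only an illustration: `wait_with_deadline` is a hypothetical helper, and it only helps when the work can be checked in small steps. A blocking database or network call needs the timeout options of its own client library instead.

```ruby
# Hypothetical helper: poll the block until it returns true or the
# deadline passes. Returns true on success, false on timeout.
def wait_with_deadline(seconds, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  until yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
  true
end

begin
  # do something
ensure
  # Pretend the resource never finishes flushing: the block is always false,
  # so the helper gives up after roughly 0.05 seconds.
  flushed = wait_with_deadline(0.05) { false }
  puts flushed ? "cleanup finished" : "gave up waiting, exiting anyway"
end
```

The monotonic clock is used so that changes to the system time cannot stretch or shorten the deadline.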

It’s important to note that all ensure blocks in scope will be called when a SignalException is raised. This means that, in addition to your own code, the ensure blocks in any of your dependencies will run too. If your system is hanging on exit and you can’t find an errant ensure block in your own code, it may be coming from a library you’re using.

Say (Dr.) No to Signal Trapping

Another way that you can prevent a program from exiting is to use Signal.trap. When you run this code, the signal will get captured and the program will not exit.

# Replace the default SIGTERM behavior with our own handler.
Signal.trap('TERM') do
  puts "Die Another Day"
end

# Send SIGTERM to this very process.
current_pid = Process.pid
signal      = "SIGTERM"
Process.kill(signal, current_pid)

When you execute the program, you get the output "Die Another Day", but the program continues to execute instead of exiting. It is possible to trap and re-raise the same signal; however, this is a very large hammer. We can’t depend on a signal being sent to the program, so we can’t rely on the code in the trap block ever being run. Worse yet, when we do get a signal, the system needs us to clean up and exit as quickly as possible. The best practice is to use ensure blocks whenever possible and only resort to signal trapping when it’s really necessary.
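For completeness, here is a rough sketch of what “trap and re-raise” can look like. It runs in a forked child (so it assumes a platform with fork) and uses a pipe to report back to the parent. The handler does a small amount of work, restores Ruby’s default SIGTERM behavior, then sends the signal to itself again so the process still exits:

```ruby
reader, writer = IO.pipe

pid = fork do
  reader.close

  Signal.trap("TERM") do
    writer.syswrite("trapped\n")       # keep the work in a handler minimal
    Signal.trap("TERM", "DEFAULT")     # restore the default behavior
    Process.kill("TERM", Process.pid)  # re-send so the process still exits
  end

  writer.syswrite("ready\n")           # the handler is installed
  sleep
end

writer.close
reader.gets                            # wait until the child is ready
Process.kill("TERM", pid)
_, status = Process.wait2(pid)
trap_output = reader.read
reader.close

puts trap_output
puts "child ended by signal #{status.termsig}"
```

Because the signal is re-sent rather than swallowed, the parent still sees a child that was terminated by SIGTERM.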

From Russia with Love and Signals

So far, we’ve looked at how your Ruby code handles signals, but how would you know what signals to send? Before any restart or shutdown, you would send a SIGTERM to the process to let it clean up, then monitor it to see if it shuts down in a reasonable time frame. If it doesn’t, send a SIGKILL to force the process to exit, ending any infinite ensure blocks. You should make a note whenever a process does not exit after a SIGTERM, since force killing it may have interrupted some important work or cleanup. The company I work for, Heroku, goes through these steps every time you deploy or restart your application. If, for some reason, your application won’t exit in time, the system emits an R12 – Exit Timeout error and records it on your dashboard so you can investigate later.
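The “SIGTERM, wait, then SIGKILL” sequence can be sketched in a few lines. This is not how Heroku implements it; `stop_process`, its `grace_period` default, and the two demo children are all hypothetical, and the helper only works for children of the current process because it relies on Process.wait:

```ruby
# Hypothetical helper: ask a child process to stop, and force it if it
# has not exited within `grace_period` seconds.
def stop_process(pid, grace_period: 10, poll_interval: 0.05)
  Process.kill("TERM", pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace_period

  loop do
    # WNOHANG makes wait return nil right away while the child is running.
    return :graceful if Process.wait(pid, Process::WNOHANG)
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep poll_interval
  end

  # The grace period is over. SIGKILL cannot be trapped or ignored.
  Process.kill("KILL", pid)
  Process.wait(pid)
  :killed
end

# A child with default signal handling exits on SIGTERM.
polite = fork { sleep }
polite_result = stop_process(polite, grace_period: 2)

# A child that ignores SIGTERM has to be killed. The pipe makes sure the
# handler is installed before the parent sends anything.
reader, writer = IO.pipe
stubborn = fork do
  reader.close
  Signal.trap("TERM", "IGNORE")
  writer.syswrite("ready\n")
  sleep
end
writer.close
reader.gets
stubborn_result = stop_process(stubborn, grace_period: 0.3)
reader.close

puts polite_result    # graceful
puts stubborn_result  # killed
```

A `:killed` result is the case worth logging, since it means the process was interrupted before it finished cleaning up.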

While it’s difficult to conceptualize, an exception might stop your entire program at any time, so it’s nice to know that adding ensure to places that should already have them is all you need to do to be safe. Whether you’re working for Her Majesty’s Secret Service or in an IT department at a Casino Royale, you can take a Quantum of Solace knowing that your programs can exit gracefully.

  • http://mikeritteroline.com Mike Ritter

    This was insightful… but I have to commend the 007 references.

    Fleming would be proud.

  • Clifford Heath

    Windows does have processes, and various IPC mechanisms that can be used to implement (something like) signals. It also has a POSIX layer, but that’s a bit of an island done to earn a tick in a checkbox in some US Gov dept, not something to use.

  • Clifford Heath

    Also, why does Ruby so comprehensively mess up the Readline behaviour in response to the job control signals SIGTSTP, SIGCONT? It’s incredibly annoying to a habitual user of ^Z to do that in the Rails console or IRB, and to have to kill the entire terminal session to recover it.

  • http://www.convalesco.org Panagiotis Atmatzidis

    Hello, Richard nice article.

    After digging around signals, I came up with SIGINT for a script I had to write a while back; I opted for “SIGINT” as a way of interrupting a stream recording.

    I don’t use that script anymore since the show is published online now but out of curiosity, do you think that I should be using ‘SIGTERM’ instead?

    Thanks

    ps. Code: https://github.com/atmosx/Myscripts/blob/argos/real-recording.rb
