License to SIGKILL — SitePoint

Every program wants to live forever. What happens when a program is forced to exit before it’s done running, and why would we want to do that?

Unix Signals

Feel free to skip this section if you are already familiar with signals.

In Unix, processes can communicate with each other using predefined signals. Running kill -l in a shell prints the list of signals your system supports. This ability to communicate is extremely important in a process-oriented program. For example, the Puma webserver can add concurrency by spawning child “worker” processes. It accepts requests into a master process and then hands them off to the next available child. If the system that is running the Puma master process needs to shut down or restart, we don’t want all in-flight requests to be stopped in their tracks. Instead, we want the child workers to finish processing their current requests if they can, clean up any external connections or temporary files they may have generated, then exit. The system can do this safely by sending a signal to the parent “master” process, which in turn forwards it to the child processes.

You may have seen the movie Tron Legacy. The movie opens with a hacker breaking into a corporate network. The CEO sees it happening and deftly responds by typing a kill -9 command into the terminal. On Linux (and Mac OS X), this kill command sends signal number 9, which is SIGKILL, to a process. SIGKILL means “end now without cleanup”. This is similar to pressing CTRL+ALT+DELETE on Windows and ending a task (though Windows is not POSIX compliant and does not implement POSIX signals).

When we need our long-running processes to exit gracefully, SIGKILL is too strong. That signal forces a process to exit immediately and can leave your system in a bad state. What should you use instead? The signal SIGTERM (signal number 15) is the “termination signal”. It asks a program to stop what it’s doing and clean up before exiting.
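If you want to check these numbers on your own machine, Ruby exposes the platform’s signal table through Signal.list. The following is a minimal sketch; the helper name signal_number is made up for this example.

```ruby
# Signal.list maps signal names (without the "SIG" prefix) to the numbers
# used on the current platform.
# NOTE: signal_number is a hypothetical helper, not part of Ruby.
def signal_number(name)
  Signal.list.fetch(name.sub(/\ASIG/, ""))
end

puts signal_number("SIGTERM") # 15 on Linux and macOS
puts signal_number("SIGKILL") # 9 on Linux and macOS
```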

Live and Let Die

When Ruby receives a SIGTERM signal, it raises a SignalException in the main thread. In Ruby, an exception can happen at any point while a process is running, so critical cleanup code should always go in an ensure block.

begin
  # do something
ensure
  # clean up something
end

There are notable caveats here. For example, an exception can be raised while an ensure block is already running, so we can’t always rely on the block running to completion. For an in-depth look into errors in Ruby, I recommend Avdi’s Exceptional Ruby. That being said, it’s still a best practice to use ensure blocks to safeguard your code.
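To look at the exception itself, the sketch below rescues it and reports its class, name, and number. It runs in a forked child so the signal never touches the calling process, which means it assumes a platform with fork (Linux or macOS). The method name describe_sigterm is hypothetical.

```ruby
# Forks a child that sends SIGTERM to itself, rescues the resulting
# exception, and reports what it saw back to the parent through a pipe.
def describe_sigterm
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    begin
      Process.kill("TERM", Process.pid) # signal ourselves
      sleep 5                           # fallback: the signal interrupts this
    rescue SignalException => e
      writer.puts "#{e.class}: #{e.signm} (#{e.signo})"
    end
    writer.close
    exit!(0) # leave without running at_exit hooks inherited from the parent
  end

  writer.close
  Process.wait(pid)
  description = reader.read.chomp
  reader.close
  description
end

puts describe_sigterm
```

Rescuing SignalException like this is only for inspection. In real code, an ensure block is usually the better place for cleanup because it also runs for every other kind of exception.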

Since we already have this failsafe behavior, Ruby uses it when a SignalException is raised. To verify this we can write a trivial script:

Thread.new do
  begin
    while true
      sleep 1
    end
  ensure
    puts "ensure called"
  end
end

current_pid = Process.pid
signal      = "SIGTERM"
Process.kill(signal, current_pid)

When you run this you’ll see:

ensure called
Terminated: 15

You’ll notice that, in addition to “ensure called”, we also get the number of the signal that was used to exit the process (15, which corresponds to SIGTERM). Neat. This behavior is really convenient, since any program that has error handling is already equipped to exit gracefully. By putting sensitive operations in an ensure block, we’re making it more likely that the program will do the right thing: the ensure blocks run first, and only then does the program exit. Note that if you re-run the program with SIGKILL instead, it exits with a different number and we don’t get any output from the ensure block.
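The difference between the two signals can be checked side by side. This is a rough sketch rather than the script above: it forks a child instead of using a thread, so it assumes a platform with fork, and the method name ensure_output_after is invented for the example.

```ruby
# Forks a child that idles inside begin/ensure, sends it `signal`, and
# returns [text written by the child's ensure block, child's exit status].
def ensure_output_after(signal)
  ready_r, ready_w = IO.pipe # child tells the parent it is inside `begin`
  out_r, out_w     = IO.pipe # carries whatever the ensure block writes

  pid = fork do
    ready_r.close
    out_r.close
    begin
      ready_w.puts "ready"
      sleep # idle until a signal arrives
    ensure
      out_w.puts "ensure called"
    end
  end

  ready_w.close
  out_w.close
  ready_r.gets # block until the child is inside the begin block
  Process.kill(signal, pid)
  _, status = Process.wait2(pid)
  output = out_r.read
  [ready_r, out_r].each(&:close)
  [output, status]
end

%w[TERM KILL].each do |signal|
  output, status = ensure_output_after(signal)
  puts "#{signal}: ensure wrote #{output.inspect}, ended by signal #{status.termsig.inspect}"
end
```

With TERM the child’s ensure block gets to write its line; with KILL the pipe comes back empty because the child never had a chance to run it.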

Tomorrow Never Dies

I’m sure you’ve had a frustrating app on your computer that was frozen and wouldn’t die no matter how many times you clicked the “close” button. Some stubborn programs will never exit, no matter how many times you send SIGTERM to them. This can happen when the program gets stuck trying to clean itself up. We can reproduce this easily:

thread = Thread.new do
  begin
    while true
      sleep 1
    end
  ensure
    while true
      puts "ensure called"
      sleep 1
    end
  end
end

current_pid = Process.pid
signal      = "SIGTERM"
Process.kill(signal, current_pid)

The output will look like this:

ensure called
ensure called
ensure called
ensure called
ensure called
# ...
ensure called

It will never end until the machine is restarted or SIGKILL is sent. Instead of this trivial example, it’s easy to imagine your Ruby program waiting on a database query or network call to finish. If that call hangs and your program never gets a response, it will never exit. That’s why it’s critical to put a timeout on sensitive code, though be careful with timeout.rb.
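One way to bound cleanup without timeout.rb is to poll against a deadline. The sketch below assumes the cleanup step can be checked repeatedly (for example, “has the buffer drained yet?”); attempt_until and the flushed variable are hypothetical names, not part of any library.

```ruby
# Calls the block until it returns a truthy value or `limit` seconds have
# passed. Returns true if the work finished, false if the deadline won.
def attempt_until(limit, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + limit
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

begin
  # do something
ensure
  flushed = false # stand-in for a real check that never succeeds
  finished = attempt_until(0.2) { flushed }
  puts "cleanup finished: #{finished}" # prints "cleanup finished: false"
end
```

This only helps when the work can be polled. A single blocking call, such as a network read, needs a timeout supplied by whatever client is making the call.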

It’s important to note that all ensure blocks in scope will be called when a SignalException is raised. This means that, in addition to your own code, the ensure blocks in any of your dependencies will run as well. If your system is hanging on exit and you can’t find an errant ensure block in your own code, it may be coming from a library you’re using.

Say (Dr.) No to Signal Trapping

Another way that you can prevent a program from exiting is to use Signal.trap. When you run this code, the signal will get captured and the program will not exit.

Signal.trap('TERM') do
  puts "Die Another Day"
end

current_pid = Process.pid
signal      = "SIGTERM"
Process.kill(signal, current_pid)

When you execute the program, you get the output “Die Another Day”, but the program continues to run. It is possible to trap a signal and then re-raise it; however, this is a very large hammer. A process is not guaranteed to receive a signal before it dies, so we can’t rely on the code in the trap block ever being run. Worse yet, when we do get a signal, the system needs us to clean up and exit as quickly as possible. The best practice is to use ensure blocks whenever possible and only resort to signal trapping when it’s really necessary.
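For the cases where trapping really is necessary, one possible shape for “trap and re-raise” is sketched below. It is an illustration under assumptions: it runs in a forked child (so it needs a platform with fork), the method name trap_and_reraise is invented, and restoring the handler with "DEFAULT" before re-sending the signal is just one way to let the process exit afterwards.

```ruby
# Forks a child that traps TERM, does a small amount of work in the handler,
# then restores Ruby's default handler and re-sends the signal to itself.
# Returns [text written by the handler, child's exit status].
def trap_and_reraise
  ready_r, ready_w = IO.pipe
  out_r, out_w     = IO.pipe

  pid = fork do
    ready_r.close
    out_r.close
    Signal.trap("TERM") do
      out_w.syswrite("Die Another Day\n") # keep the work in a handler small
      Signal.trap("TERM", "DEFAULT")      # put Ruby's default handler back
      Process.kill("TERM", Process.pid)   # re-raise so the process still exits
    end
    ready_w.puts "ready"
    sleep 10 # stand-in for real work; the re-raised TERM should cut it short
  end

  ready_w.close
  out_w.close
  ready_r.gets # wait until the trap is installed
  Process.kill("TERM", pid)
  _, status = Process.wait2(pid)
  output = out_r.read
  [ready_r, out_r].each(&:close)
  [output, status]
end

output, status = trap_and_reraise
puts output
puts "exited cleanly: #{status.success?.inspect}"
```

Because the default handler is back in place when the signal is re-sent, any ensure blocks in the child still get their chance to run on the way out.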

From Russia with Love and Signals

So far, we’ve looked at how your Ruby code handles signals, but how do you know which signals to send? Before any restart or shutdown, send SIGTERM to the process so it can clean up, then monitor it to see whether it shuts down within a reasonable time frame. If it doesn’t, send SIGKILL to stop the process, ending any infinite ensure blocks. Make a note whenever a process does not exit after SIGTERM, as it could mean that, when you force kill it, you’re interrupting some important work or cleanup. The company I work for, Heroku, goes through these steps every time you deploy or restart your application. If, for some reason, your application won’t exit on time, the system emits an R12 – Exit Timeout and records the error on your dashboard view so you can investigate later.
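The TERM, wait, then KILL sequence can be sketched in a few lines of Ruby. This is not how Heroku implements it; it is a simplified illustration that only works for child processes of the caller (it relies on Process.wait) and assumes a platform with fork. The names stop_process and spawn_stubborn_child, and the grace period values, are invented for the example.

```ruby
# Asks a child process to exit with TERM and waits up to `grace` seconds.
# Returns :term if it exited in time, otherwise sends KILL and returns :kill.
def stop_process(pid, grace: 2.0, poll: 0.05)
  Process.kill("TERM", pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace

  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    # With WNOHANG, wait returns nil right away if the child is still running.
    return :term if Process.wait(pid, Process::WNOHANG)
    sleep poll
  end

  Process.kill("KILL", pid) # the grace period is over
  _, status = Process.wait2(pid)
  status.termsig == Signal.list["KILL"] ? :kill : :term
end

# A deliberately stubborn child: it swallows TERM and keeps looping.
def spawn_stubborn_child
  ready_r, ready_w = IO.pipe
  pid = fork do
    ready_r.close
    Signal.trap("TERM") { } # ignore the polite request
    ready_w.puts "ready"
    loop { sleep 1 }
  end
  ready_w.close
  ready_r.gets # wait until the trap is installed
  ready_r.close
  pid
end

puts stop_process(spawn_stubborn_child, grace: 0.3)
```

A real supervisor would also log the fallback to KILL, since that is exactly the event worth investigating later.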

While it’s difficult to conceptualize, an exception might stop your entire program at any time, so it’s nice to know that adding ensure blocks to the places that should already have them is all you need to do to be safe. Whether you’re working for Her Majesty’s Secret Service or in an IT department at a Casino Royale, you can take a Quantum of Solace knowing that your programs can exit gracefully.

Frequently Asked Questions (FAQs) about SIGKILL

What is the difference between SIGKILL and other termination signals?

SIGKILL is a type of signal that can be sent to a process in Unix and Linux operating systems. Unlike other signals such as SIGTERM or SIGHUP, SIGKILL cannot be caught, blocked, or ignored by the process. This means that when a process receives a SIGKILL signal, it is immediately terminated by the operating system. Other signals, on the other hand, can be handled by the process in different ways, allowing it to perform cleanup operations before exiting.

When should I use SIGKILL?

SIGKILL should be used as a last resort when a process cannot be terminated using other signals. Because SIGKILL immediately terminates the process without allowing it to perform any cleanup operations, it can potentially lead to data loss or other unintended consequences. Therefore, it is generally recommended to try other signals such as SIGTERM first, which allow the process to gracefully exit.

Can SIGKILL be blocked or ignored?

No, SIGKILL cannot be blocked or ignored by a process. This is what makes it a powerful tool for terminating unresponsive processes. However, this also means that it should be used with caution, as it does not allow the process to perform any cleanup operations before exiting.

How can I send a SIGKILL signal to a process?

You can send a SIGKILL signal to a process using the ‘kill’ command followed by the ‘-9’ option and the process ID. For example, ‘kill -9 1234’ would send a SIGKILL signal to the process with ID 1234.

What happens if a process does not respond to SIGKILL?

If a process does not respond to a SIGKILL signal, it is likely in a state known as ‘uninterruptible sleep’, usually waiting for I/O operations to complete. In such cases, even SIGKILL cannot terminate the process. The process will only terminate once the I/O operation is complete.

Can SIGKILL cause data loss?

Yes, because SIGKILL does not allow a process to perform any cleanup operations before exiting, it can potentially lead to data loss. This is why it is generally recommended to try other signals such as SIGTERM first, which allow the process to gracefully exit.

Is SIGKILL specific to Unix and Linux?

Largely, yes. SIGKILL is defined by POSIX, so it is available on Unix-like operating systems such as Linux, macOS, and the BSDs. Other operating systems have their own mechanisms for terminating processes, but they may not work in exactly the same way.

Can I catch a SIGKILL signal in my program?

No, SIGKILL signals cannot be caught by a program. This means that you cannot write a signal handler for SIGKILL in your program to perform specific actions when a SIGKILL signal is received.

What is the numerical value of SIGKILL?

The numerical value of SIGKILL is 9. This is why you often see the ‘-9’ option used with the ‘kill’ command to send a SIGKILL signal to a process.

Can I send a SIGKILL signal to any process?

Not quite. You can send a SIGKILL signal to any process that you own. Sending signals to processes owned by other users, including those running as root, requires elevated privileges, for example by running the kill command as root.