Ruby Application Restart Behavior
Last updated March 20, 2023
When Heroku is going to shut down a dyno (for a restart or a new deployment, etc.), it first sends a SIGTERM signal to the processes in the dyno. The full process is documented in Dynos and the Dyno Manager.
This Unix signal provides an opportunity for a process to “shut down gracefully”. The Ruby VM receives this signal and raises a SignalException in the process, which interrupts whatever your process is currently doing so that it can clean itself up.
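To make the exception visible, here is a minimal sketch that sends SIGTERM to its own process and rescues the resulting SignalException. The method name observe_sigterm is hypothetical, and rescuing the exception is for illustration only; a real program would normally let it propagate so that the process exits.

```ruby
# Sends SIGTERM to the current process and reports what Ruby raised.
# `observe_sigterm` is a hypothetical helper, not part of any library.
def observe_sigterm
  Process.kill("TERM", Process.pid)
  sleep 5 # stand-in for real work; interrupted once the signal arrives
  nil     # only reached if no signal was delivered
rescue SignalException => e
  { class: e.class, name: e.signm, number: e.signo }
end

info = observe_sigterm
puts "#{info[:class]} #{info[:name]} (#{info[:number]})"
```

The exception is raised in the main thread, which is why the sleep call in this sketch is what gets interrupted.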
Ensuring processes clean up
How exactly does your process clean itself up? Any code in an ensure block that is in scope runs at this time. An example:
thread = Thread.new do
  begin
    while true
      sleep 1
    end
  ensure
    puts "ensure called"
  end
end
current_pid = Process.pid
signal = "SIGTERM"
Process.kill(signal, current_pid)
When you run this you’ll see:
ensure called
Terminated: 15
Developers should wrap sensitive operations, such as deleting temporary files or closing connections, in an ensure block. Putting these operations in an ensure block makes it more likely that the program will do the right thing when it exits.
Once all ensure blocks (that are in scope) have been called, the program exits.
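As a concrete illustration of that advice, the sketch below removes a scratch file from an ensure block when SIGTERM arrives. The method name, the file name, and the self-sent signal are all assumptions made for the demo; on Heroku the signal would come from the platform rather than from the program itself.

```ruby
require "tmpdir"

# Hypothetical worker: writes a scratch file, then "works" until interrupted.
def work_with_scratch_file(path)
  File.write(path, "scratch data")
  Process.kill("TERM", Process.pid) # stand-in for the SIGTERM sent at shutdown
  sleep 5                           # stand-in for long-running work
ensure
  # Runs whether the method returns normally or is interrupted by the signal.
  File.delete(path) if File.exist?(path)
end

scratch = File.join(Dir.tmpdir, "scratch-#{Process.pid}.tmp")
begin
  work_with_scratch_file(scratch)
rescue SignalException
  # Rescued only so this demo can inspect the result afterwards;
  # a real worker would let the exception propagate and exit.
end

cleaned_up = !File.exist?(scratch)
puts "scratch file removed: #{cleaned_up}"
```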
After Heroku sends SIGTERM to your application, it will wait up to 30 seconds before sending a SIGKILL to force it to shut down, even if it has not finished cleaning up. A process cannot handle SIGKILL: in the following example, the ensure block does not get called at all and the program simply exits:
thread = Thread.new do
  begin
    while true
      sleep 1
    end
  ensure
    puts "ensure called"
  end
end
current_pid = Process.pid
signal = "SIGKILL"
Process.kill(signal, current_pid)
The output is simply:
Killed: 9
This is the equivalent of running the famous $ kill -9 command. It is comparable to ending a task via CTRL+ALT+DELETE on Windows or a “force quit” on macOS.
Why some programs won’t die
Some programs will never terminate on their own after a SIGTERM. Here’s an example:
thread = Thread.new do
  begin
    while true
      sleep 1
    end
  ensure
    while true
      puts "ensure called"
      sleep 1
    end
  end
end
current_pid = Process.pid
signal = "SIGTERM"
Process.kill(signal, current_pid)
The output will look like this:
ensure called
ensure called
ensure called
ensure called
ensure called
ensure called
ensure called
ensure called
ensure called
This continues forever. Now imagine that, instead of a while loop, the ensure block is trying to close a connection and the remote server won’t respond. In this case the program would never end; it would be stuck, unable to respond to requests or to die.
This is one way that a program can prevent itself from dying: by doing complicated, long, or impossible-to-complete tasks in the ensure block.
Even if you’re not using ensure blocks, you’re certainly using gems that do. One of them may be preventing your program from exiting. The way to fix this problem is by sending a SIGKILL
to the process, which terminates it immediately.
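For cleanup code that you control, one possible hedge against an ensure block that never finishes is to give the cleanup a time budget. The sketch below uses Ruby’s standard Timeout module; the helper name cleanup_within and the CLEANUP_BUDGET value are hypothetical, and the sleep call stands in for a remote server that never answers. Timeout can interrupt the wrapped code at any point, so treat this as an illustration under those assumptions rather than a recommended pattern.

```ruby
require "timeout"

# Hypothetical budget in seconds, kept tiny so the demo finishes quickly.
# A real value would need to fit inside the 30-second shutdown window.
CLEANUP_BUDGET = 0.2

# Runs the given cleanup block, abandoning it if it exceeds the budget.
def cleanup_within(seconds)
  Timeout.timeout(seconds) { yield }
  :finished
rescue Timeout::Error
  :abandoned
end

outcome = nil
begin
  # ... main work would go here ...
ensure
  # sleep stands in for closing a connection to a server that never responds
  outcome = cleanup_within(CLEANUP_BUDGET) { sleep 10 }
end

puts outcome
```

A cleanup step that completes within the budget makes the helper return :finished instead, so the caller can log which path was taken.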
The other way that your program may prevent itself from halting is if it catches the signal.
Signal.trap('TERM') do
  puts "Never going to die"
end
current_pid = Process.pid
signal = "SIGTERM"
Process.kill(signal, current_pid)
This Signal.trap block catches the signal and prevents it from doing its original work. I don’t know of a good reason for using this functionality, and your program probably shouldn’t use it. If you must trap a signal, it is possible to call the previous behavior; see “Run code when Signal is sent, but do not trap the signal in Ruby”. Again, this is still not recommended: when your program receives a signal, it should exit quickly. It’s also not guaranteed that the handler will get called. A safer approach is using an ensure block where appropriate.
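If a trap truly cannot be avoided, calling the previous behavior might look something like the sketch below. It relies on Signal.trap returning the handler that was installed before; the helper name trap_and_chain is hypothetical, and the demo uses SIGUSR1 with a recording handler (rather than SIGTERM) purely so that it can run without terminating itself.

```ruby
# Hypothetical helper: runs the given block when `signal` arrives, then
# defers to whatever handler was installed before it.
def trap_and_chain(signal, &block)
  previous = Signal.trap(signal) do |signo|
    block.call(signo)
    if previous.respond_to?(:call)
      previous.call(signo) # the earlier handler was a block or proc
    else
      # The earlier handler was a string such as "DEFAULT" or "IGNORE":
      # restore it and re-send the signal so the original behavior applies.
      Signal.trap(signal, previous || "DEFAULT")
      Process.kill(signal, Process.pid)
    end
  end
end

events = []
Signal.trap("USR1") { events << :original } # pre-existing handler
trap_and_chain("USR1") { events << :added } # our handler, chained in front

Process.kill("USR1", Process.pid)
sleep 0.2 # give the handlers time to run
puts events.inspect
```

Applied to TERM with the default handler in place, the same helper would run the block and then let the process terminate as usual.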
at_exit
You can also use Ruby’s at_exit to call code when your application is exiting:
at_exit { puts "done" }
current_pid = Process.pid
signal = "SIGTERM"
Process.kill(signal, current_pid)
# => done
# => Terminated: 15
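When more than one at_exit handler is registered, Ruby runs them in reverse order of registration. The sketch below shows this by running a small script in a child Ruby process and capturing its output; the script text and variable names are made up for the demo.

```ruby
require "rbconfig"

# Child script: registers two handlers, then sends itself SIGTERM.
script = <<~RUBY
  $stdout.sync = true
  at_exit { puts "registered first" }
  at_exit { puts "registered second" }
  Process.kill("TERM", Process.pid)
  sleep 5
RUBY

# Run the script with the same Ruby interpreter and capture what it prints.
output = IO.popen([RbConfig.ruby, "-e", script], &:read)
lines = output.lines.map(&:chomp)
puts lines
```

Here the handler registered second prints its line first, even though the child process is terminated by SIGTERM.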