Managing Heroku Processes
Heroku runs and manages your application’s web, worker, and other types of processes, distributed across the dyno manifold.
Listing Processes
Get a list of processes on a given app:
$ heroku ps
Process       State               Command
------------  ------------------  ------------------------------
web.1         up for 10h          bundle exec thin start -p $PORT -e..
web.2         up for 10h          bundle exec thin start -p $PORT -e..
worker.1      up for 10h          bundle exec rake jobs:work
The Unix watch utility is handy in combination with the ps command. Run watch heroku ps in one terminal while you add or remove dynos and workers, deploy, or restart your app in another.
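If you would rather summarize the listing than watch it, the output can be tallied by process type. The following is a minimal sketch, assuming the column layout shown above (other CLI versions may print something different); count_by_type is a hypothetical helper name, not part of the Heroku CLI.

```ruby
# Sketch: tally processes by type from `heroku ps` output.
# Assumes each process row starts with "<type>.<number>", as in this article.
def count_by_type(ps_output)
  counts = Hash.new(0)
  ps_output.each_line do |line|
    # Header and separator rows do not match and are skipped.
    match = line.match(/\A(\w+)\.\d+\s/)
    counts[match[1]] += 1 if match
  end
  counts
end

# Sample text copied from the listing above.
sample = <<~'OUTPUT'
  Process       State               Command
  ------------  ------------------  ------------------------------
  web.1         up for 10h          bundle exec thin start -p $PORT -e..
  web.2         up for 10h          bundle exec thin start -p $PORT -e..
  worker.1      up for 10h          bundle exec rake jobs:work
OUTPUT

p count_by_type(sample)
```

For the sample listing this counts two web processes and one worker. Against a live app you could pass in the captured output of heroku ps instead of the sample string.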
Process Restarts
In normal operation, dynos never exit on their own; they are restarted when you deploy new code, change config vars, or run heroku restart. The cases when a process can exit are as follows:
- Defect in startup code - If your app is missing a critical dependency, or has any other problem during startup, it will exit immediately with a stack trace.
- Transient error on a resource used during startup - If your app accesses a resource during startup, and that resource is offline, it may exit. For example, if you’re using Amazon RDS as your database and didn’t create a security group ingress for your Heroku app, your app will generate an error or time out trying to boot.
- Segfault in a binary library - If your app uses a binary library (for example, an XML parser), and that library crashes, then it may take your entire application with it. Exception handling can’t trap it, so your process will die.
- Interpreter or compiler bug - The rare case of a bug in an interpreter (Ruby, Python) or in the results of compilation (Java, Scala) can take down your process.
As app developers, we tend to see the first two errors as “boot crashes” and the last two as “runtime crashes.” However, Heroku has no way to distinguish these. From the platform’s perspective, all process crashes are alike.
Heroku’s process restart policy is to restart a crashed process immediately the first time, and to wait ten minutes before retrying if it crashes again right away. In the normal case of a long-running web or worker process that crashes occasionally, it will be restarted instantly without any intervention on your part. If you push bad code that prevents your app from booting, your app will be started once, restarted once, and then get a cool-off of ten minutes before the system retries.
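The policy described above can be expressed as a toy model. This is only an illustration of the wording in this article, not Heroku's actual implementation; restart_delay and COOL_OFF_SECONDS are hypothetical names.

```ruby
# Toy model of the restart policy as described in this article.
# Not Heroku's code; the names here are made up for illustration.
COOL_OFF_SECONDS = 10 * 60

# consecutive_crashes: how many times in a row the process has crashed.
# Returns the number of seconds to wait before the next restart attempt.
def restart_delay(consecutive_crashes)
  # A single crash is restarted instantly; repeated crashes trigger the cool-off.
  consecutive_crashes <= 1 ? 0 : COOL_OFF_SECONDS
end

[1, 2, 3].each do |n|
  puts "after #{n} consecutive crash(es): wait #{restart_delay(n)} seconds"
end
```

An occasional crash maps to a delay of zero, while a process that fails to boot hits the ten-minute cool-off on its second crash.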
Graceful Shutdown with SIGTERM
The dyno manifold needs to stop or restart processes frequently - when you create a new release, when the dyno manifold needs to relocate your dyno, or when you request a manual restart with heroku restart. In all cases, the dyno manifold will request that your process shut down gracefully by sending it a SIGTERM.
The process has ten seconds to shut down cleanly (ideally, it will do so more quickly than that). During this time it should stop accepting new requests or jobs and attempt to finish its current requests, or put jobs back on the queue for another worker to handle. If the process hasn’t exited on its own in ten seconds, the dyno manifold will terminate it forcefully.
We can see how this works in practice with a sample worker process. We’ll use Ruby here as an illustrative language - the mechanism is identical in other languages. Imagine a process that does nothing but loop and print out a message periodically:
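STDOUT.sync = true  # flush output immediately so it shows up in the logs
puts "Starting up"

# Exit cleanly when the dyno manifold sends SIGTERM.
trap('TERM') do
  puts "Graceful shutdown"
  exit
end

loop do
  puts "Pretending to do work"
  sleep 3
end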
If we deploy this (along with the appropriate Gemfile and Procfile) and run heroku scale worker=1, we’ll see the process in its loop:
$ heroku logs
2011-05-31T23:31:16+00:00 heroku[worker.1]: Starting process with command: `bundle exec ruby worker.rb`
2011-05-31T23:31:17+00:00 heroku[worker.1]: State changed from starting to up
2011-05-31T23:31:17+00:00 app[worker.1]: Starting up
2011-05-31T23:31:17+00:00 app[worker.1]: Pretending to do work
2011-05-31T23:31:20+00:00 app[worker.1]: Pretending to do work
2011-05-31T23:31:23+00:00 app[worker.1]: Pretending to do work
Restart the process, which causes the dyno manifold to send it a SIGTERM:
$ heroku restart worker.1
Restarting worker.1 process... done
$ heroku logs
2011-05-31T23:31:26+00:00 app[worker.1]: Pretending to do work
2011-05-31T23:31:27+00:00 heroku[worker.1]: State changed from up to bouncing
2011-05-31T23:31:28+00:00 heroku[worker.1]: State changed from bouncing to down
2011-05-31T23:31:28+00:00 heroku[worker.1]: State changed from down to starting
2011-05-31T23:31:29+00:00 heroku[worker.1]: Stopping process with SIGTERM
2011-05-31T23:31:29+00:00 app[worker.1]: Graceful shutdown
2011-05-31T23:31:29+00:00 heroku[worker.1]: Process exited
Note that app[worker.1] logged “Graceful shutdown” (as we expect from our code); all the dyno manifold messages log as heroku[worker.1].
If we modify worker.rb to ignore the TERM signal, like so:
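STDOUT.sync = true  # flush output immediately so it shows up in the logs
puts "Starting up"

# Log the signal but keep running (the handler never calls exit).
trap('TERM') do
  puts "Ignoring TERM signal - not a good idea"
end

loop do
  puts "Pretending to do work"
  sleep 3
end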
With this change, the behavior is different:
$ heroku restart worker.1
Restarting worker.1 process... done
$ heroku logs
2011-05-31T23:40:57+00:00 heroku[worker.1]: Stopping process with SIGTERM
2011-05-31T23:40:57+00:00 app[worker.1]: Ignoring TERM signal - not a good idea
2011-05-31T23:40:58+00:00 app[worker.1]: Pretending to do work
2011-05-31T23:41:01+00:00 app[worker.1]: Pretending to do work
2011-05-31T23:41:04+00:00 app[worker.1]: Pretending to do work
2011-05-31T23:41:07+00:00 heroku[worker.1]: Error R12 (Exit timeout) -> Process failed to exit within 10 seconds of SIGTERM
2011-05-31T23:41:07+00:00 heroku[worker.1]: Stopping process with SIGKILL
2011-05-31T23:41:08+00:00 heroku[worker.1]: Process exited
Our process ignores SIGTERM and blindly continues processing. After ten seconds, the dyno manifold gives up waiting for the process to shut down gracefully and kills it with SIGKILL. It registers Error R12 to indicate that the process is not behaving correctly.
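The first sample exits straight from the trap handler, which is fine for a loop that does no real work. A worker that should finish its current job and put the rest back on the queue might instead set a flag in the handler and check it between jobs. The following is a minimal sketch of that idea; GracefulWorker and its in-memory job list are hypothetical stand-ins for a real queue, and the self-sent signal only simulates what the dyno manifold would do.

```ruby
# Sketch only: GracefulWorker and its in-memory queue are hypothetical,
# standing in for whatever job system your app actually uses.
class GracefulWorker
  attr_reader :completed, :requeued

  def initialize(jobs)
    @jobs = jobs.dup
    @completed = []
    @requeued = []
    @shutting_down = false
  end

  # Intended to be called from a signal handler: it only flips a flag.
  def request_shutdown
    @shutting_down = true
  end

  # Runs jobs one at a time, yielding each job to the given block.
  def run
    until @jobs.empty?
      if @shutting_down
        # Stop taking new work; hand unstarted jobs back for another worker.
        @requeued.concat(@jobs)
        @jobs.clear
        break
      end
      job = @jobs.shift
      yield job          # the current job is always allowed to finish
      @completed << job
    end
  end
end

worker = GracefulWorker.new(%w[a b c d])
trap('TERM') { worker.request_shutdown }

worker.run do |job|
  # Simulate SIGTERM arriving while job "b" is in progress.
  Process.kill('TERM', Process.pid) if job == 'b'
  sleep 0.1
end

puts "completed: #{worker.completed.join(', ')}"
puts "requeued:  #{worker.requeued.join(', ')}"
```

Because the flag is only checked between jobs, each job needs to be short enough to finish inside the ten-second window; anything longer would still be cut off by SIGKILL.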