Deploying Rails Applications with the Puma Web Server

Last Updated: 18 June 2015

Web applications that process concurrent requests make more efficient use of dyno resources than those that process only one request at a time. Puma is a web server that competes with Unicorn and allows you to process concurrent requests.

Puma uses threads, in addition to worker processes, to make better use of available CPU. You can only use threads in Puma if your entire code base is thread safe. Otherwise, you can still use Puma, but you must scale out only through worker processes.

This guide will walk you through deploying a new Rails application to Heroku using the Puma web server. For basic Rails setup, see Getting Started with Rails.

Always test your new deployments in a staging environment before you deploy to your production environment.

Adding Puma to your application


First, add Puma to your app’s Gemfile:

gem 'puma'


Set Puma as the server for your web process in the Procfile of your application. You can set most values inline:

web: bundle exec puma -t 5:5 -p ${PORT:-3000} -e ${RACK_ENV:-development}

However, we recommend generating a config file:

web: bundle exec puma -C config/puma.rb

Make sure the Procfile is properly capitalized and checked into git.


Create a configuration file for Puma at config/puma.rb or at a path of your choosing. For a simple Rails application, we recommend the following basic configuration:

workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['MAX_THREADS'] || 5)
threads threads_count, threads_count

preload_app!

rackup      DefaultRackup
port        ENV['PORT']     || 3000
environment ENV['RACK_ENV'] || 'development'

on_worker_boot do
  # Worker specific setup for Rails 4.1+
  # (see the "On worker boot" section below)
  ActiveRecord::Base.establish_connection
end

You must also ensure that your Rails application has enough database connections available in the pool for all threads and workers. (This will be covered later).

If your app is not thread safe, you will only be able to use workers. Set your min and max threads to 1:

$ heroku config:set MAX_THREADS=1

See the section below on thread safety for more information.


workers Integer(ENV['WEB_CONCURRENCY'] || 2)

The environment variable WEB_CONCURRENCY may be set to a default value based on dyno size. To configure this value manually, use `heroku config:set WEB_CONCURRENCY`.

Puma forks multiple OS processes within each dyno to allow a Rails app to support multiple concurrent requests. In Puma terminology these are referred to as worker processes (not to be confused with Heroku worker processes, which run in their own dynos). Worker processes are isolated from one another at the OS level, and therefore do not need to be thread safe.

Multi-process mode does not work if you are using JRuby or Windows, because neither the JVM nor Windows supports forking processes. Omit this line from your config if you are using JRuby or Windows.

Each worker process consumes additional memory, which limits how many processes you can run on a single dyno. With a typical Rails memory footprint, you can expect to run 2-4 Puma worker processes on a free, hobby, or standard-1x dyno. Your application may allow for more or fewer depending on your specific memory footprint. We recommend specifying this number in a config var to allow for faster application tuning. Monitor your application logs for R14 errors (memory quota exceeded) via one of our logging add-ons or heroku logs.


threads_count = Integer(ENV['MAX_THREADS'] || 5)
threads threads_count, threads_count

Puma can serve each request in a thread from an internal thread pool. This allows Puma to provide additional concurrency for your web application. Loosely speaking, workers consume more RAM and threads consume more CPU, and both provide more concurrency.

On MRI, the Global Interpreter Lock (GIL) ensures that only one thread can run Ruby code at a time. IO operations such as database calls, interacting with the file system, or making external HTTP calls release the GIL while they wait. Most Rails applications heavily use IO, so adding threads allows Puma to process multiple requests concurrently, gaining you more throughput. JRuby and Rubinius also benefit from using Puma: these Ruby implementations do not have a GIL and will run all threads in parallel regardless of what is happening in them.
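As a rough illustration of why IO-bound work benefits from threads on MRI, the sketch below simulates ten IO-bound "requests" with `sleep`, which releases the GIL just as a database or HTTP call would; the waits overlap instead of running serially (the thread count and sleep duration are arbitrary):

```ruby
require 'benchmark'

# Ten threads each simulate an IO-bound request with a 0.1s wait.
# On MRI, sleep releases the GIL, so the waits overlap.
elapsed = Benchmark.realtime do
  threads = 10.times.map do
    Thread.new { sleep 0.1 } # stands in for a database or HTTP call
  end
  threads.each(&:join)
end

# Ten overlapping 0.1s waits finish in roughly 0.1s, not 1s.
puts elapsed < 0.5
```

Run serially, the same work would take about one second; with threads it takes roughly a tenth of that, which is the throughput gain described above.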

Puma allows you to configure your thread pool with a min and max setting, controlling the number of threads each Puma instance uses. The min threads allows your application to spin down resources when not under load. This feature is not needed on Heroku as your application can consume all of the resources on a given dyno. We recommend setting min to equal max.

Each Puma worker will be able to spawn up to the maximum number of threads you specify.

Preload app


Preloading your application reduces the startup time of individual Puma worker processes and allows you to manage the external connections of each individual worker using the on_worker_boot calls. In the config above, these calls are used to correctly establish Postgres connections for each worker process.

On worker boot

The on_worker_boot block is run after a worker is spawned, but before it begins to accept requests. This block is especially useful for connecting to different services as connections cannot be shared between multiple processes. This is similar to Unicorn’s after_fork block. It is only needed if you are using multi process mode (i.e. have specified workers).

If you are using Rails 4.1+, you can use `config/database.yml` to set your connection pool size, and this is all you need to do:

on_worker_boot do
  # Valid on Rails 4.1+ using the `config/database.yml` method of setting `pool` size
  ActiveRecord::Base.establish_connection
end

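For reference, a sketch of the corresponding `pool` entry in `config/database.yml` (assuming Postgres on Heroku and the `MAX_THREADS` config var used in the Puma config above):

```yaml
production:
  url:  <%= ENV["DATABASE_URL"] %>
  pool: <%= ENV["MAX_THREADS"] || 5 %>
```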
Otherwise you must be very specific with the reconnection code:

on_worker_boot do
  # Valid on Rails versions before 4.1 using the initializer method of setting `pool` size
  ActiveSupport.on_load(:active_record) do
    config = ActiveRecord::Base.configurations[Rails.env] ||
                Rails.application.config.database_configuration[Rails.env]
    config['pool'] = ENV['MAX_THREADS'] || 5
    ActiveRecord::Base.establish_connection(config)
  end
end

If you are already using an initializer, you should switch over to the database.yml method as soon as possible. Using an initializer requires duplicating code when using hybrid mode in Puma, can cause confusion over what is happening, and is the source of numerous support tickets.

In the default configuration we are setting the database pool size. For more information please read Concurrency and Database Connections in Ruby with ActiveRecord. We also make sure to create a new connection to the database here.

You will need to re-connect to any datastore, such as Postgres, Redis, or Memcached. The examples above show how to reconnect Active Record. If you are using Resque, which connects to Redis, you would need to reconnect it as well:

on_worker_boot do
  # ...
  if defined?(Resque)
    Resque.redis = ENV["<redis-uri>"] || "redis://localhost:6379"
  end
end

If you get connection errors while booting up your application, consult the gem documentation for the service you are attempting to communicate with to see how you can re-connect in this block.


rackup      DefaultRackup

Use the rackup command to tell Puma how to start your Rack app. This should point at your application's config.ru, which is automatically generated by Rails when you create a new project.

This line may not be needed on newer versions of Puma.


port        ENV['PORT']     || 3000

The port that Puma will bind to. Heroku sets ENV['PORT'] when the web process boots. Locally, we default this to 3000 to match the normal Rails default.


environment ENV['RACK_ENV'] || 'development'

Set the environment of Puma. On Heroku ENV['RACK_ENV'] will be set to 'production' by default.


There is no request timeout mechanism inside of Puma. The Heroku router will time out all requests that exceed 30 seconds. Although an error is returned to the client, Puma will continue to work on the request, as there is no way for the router to notify Puma that the request was terminated early. To avoid clogging your processing capacity, we recommend using Rack::Timeout to terminate long-running requests and locate their source.

Add the Rack Timeout gem to your project, then in an initializer set the value to something lower than 30:

# config/initializers/timeout.rb
Rack::Timeout.timeout = 20  # seconds

Now any request that runs for 20 seconds will be terminated, and a stack trace will be output to your logs. The stack trace should help you determine what part of your application is causing the timeout so you can fix it.

Sample code

This open source project uses Puma, and you can see its Puma config file in the repo.

Thread safety

Thread safe code can be run across multiple threads without error. Not all Ruby code is thread safe, and it can be difficult to determine whether your code, and all of the libraries you are using, can be run safely across multiple threads.

Until Rails 4, there was a thread safe compatibility mode that could be toggled. But just because Rails itself is thread safe doesn't guarantee your code will be. If you haven't run your application in a threaded environment, we recommend deploying and setting MIN_THREADS and MAX_THREADS both to 1:

$ heroku config:set MIN_THREADS=1 MAX_THREADS=1

You can still gain concurrency by adding workers. Since a worker runs in a different process and does not share memory, code that is not thread safe can be run across multiple worker processes.

Once you have your application running on workers, you can try increasing the number of threads on staging and in development to 2:

$ heroku config:set MIN_THREADS=2 MAX_THREADS=2

You need to monitor exceptions and look for errors such as `deadlock detected (fatal)`, race conditions, and locations where you're modifying global or shared variables.
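To make the shared-variable hazard concrete, here is a toy sketch of the classic unsynchronized read-modify-write bug and the `Mutex` that fixes it (the counter and thread counts are arbitrary):

```ruby
# Ten threads each increment a shared counter 1,000 times.
# `counter += 1` is a read-modify-write: without the mutex, threads can
# interleave between the read and the write and lose updates (especially
# on JRuby or Rubinius, where threads truly run in parallel).
counter = 0
mutex   = Mutex.new

threads = 10.times.map do
  Thread.new do
    1_000.times { mutex.synchronize { counter += 1 } }
  end
end
threads.each(&:join)

puts counter # => 10000 every time; drop the mutex and the result may vary
```

Code like this, hidden inside your app or its gems, is exactly what surfaces as race conditions when you raise the thread count.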

Concurrency bugs can be difficult to detect and fix, so make sure to test your application thoroughly before deploying to production. If you can make your application thread safe, the benefit is well worth it: scaling out with Puma threads and workers provides significantly more throughput than using workers alone.

Once you are confident that your application behaves as expected, you can increase your thread count.

To optimize thread count, we recommend looking at request latency. If your application is under load, additional threads will decrease request latency, up to a point. Once adding threads no longer gives your application measurable request-time improvements, there is no need to add more.

Database connections

As you add more concurrency to your application, it will need more connections to your database. A good formula for the number of connections each dyno will consume is MAX_THREADS multiplied by WEB_CONCURRENCY.
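The formula can be sketched as a quick calculation, using the example values from the Puma config above (2 workers, 5 threads each):

```ruby
# Connections consumed per dyno = worker processes * threads per worker.
web_concurrency = Integer(ENV['WEB_CONCURRENCY'] || 2)
max_threads     = Integer(ENV['MAX_THREADS'] || 5)

connections_per_dyno = web_concurrency * max_threads
puts connections_per_dyno # => 10 with the example defaults
```

Since each worker process gets its own Rails connection pool, setting the pool size to MAX_THREADS in each worker yields this per-dyno total.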

Rails maintains its own database connection pool, with a new pool created for each worker process. Threads within a worker share the same pool. Make sure your Rails database connection pool is large enough that MAX_THREADS connections can be used. If you see this error:

ActiveRecord::ConnectionTimeoutError - could not obtain a database connection within 5 seconds

This is an indication that your Rails connection pool is too small. For an in-depth look at these topics, please read the Dev Center article Concurrency and Database Connections.

Slow clients

A slow client is one that sends and receives data slowly. For example, an app that receives images uploaded by users from mobile phones that are not on WiFi, 4G, or other fast networks. This type of connection can cause a denial of service for some servers, such as Unicorn, as workers must sit idle while they wait for the request to finish.

Puma can allow multiple slow clients to connect without requiring a worker to be blocked on the request transaction. Because of this, Puma handles slow clients gracefully. Heroku recommends Puma for use in scenarios where you expect slow clients.


It is possible to set a "backlog" value for Puma. This is the number of requests that will be queued at the socket before Puma begins rejecting HTTP requests. The default value is 1024, and we recommend against modifying it, and especially against decreasing it.

It may seem like a good idea to decrease this value so that when a dyno is busy, a request can be sent to a less busy dyno. However, when Heroku re-routes a bounced request, it assumes your entire app is saturated: each connection gets delayed by 5 seconds, so you are automatically penalized 5 seconds per request. You can read more about this routing behavior. In addition, when one of your dynos starts bouncing requests, it is likely due to an increase in load, and all of your dynos will be bouncing requests. Repeatedly bouncing the same request will result in higher error rates for your customers.

An arbitrarily high backlog value allows your dyno to handle a spike in requests. Lowering this value does little to speed up your app and will actively cause more failed requests for your customers. Heroku recommends not setting the backlog value and instead using the default.