Concurrency and Database Connections in Ruby with ActiveRecord
Last updated April 26, 2024
When increasing concurrency by using a multi-threaded web server like Puma, or multi-process web server like Unicorn, you must be aware of the number of connections your app holds to the database and how many connections the database can accept. Each thread or process requires a different connection to the database. To accommodate this, Active Record provides a connection pool that can hold several connections at a time.
Connection pool
By default, Rails (Active Record) only creates a connection when a new thread or process attempts to talk to the database through a SQL query. Active Record limits the total number of connections per application process through the database setting pool; this is the maximum number of connections that process can hold open to the database. The default maximum size of the database connection pool is 5. If you try to use more connections than are available, Active Record blocks and waits for a connection from the pool. If it cannot get a connection in time, it raises a timeout error. It may look something like this:
ActiveRecord::ConnectionTimeoutError - could not obtain a database connection within 5 seconds. The max pool size is currently 5; consider increasing it
To avoid this error, you can change the size of your connection pool manually by customizing your connection settings. While the means are similar, the location of your connection setup can vary for threaded vs. multi-process web servers.
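The blocking-then-timeout behavior described above can be illustrated with a minimal sketch. TinyPool, its size and checkout_timeout arguments, and its error class are hypothetical names used for illustration only; this is not Active Record's implementation, and the "connections" are plain placeholder objects.

```ruby
# A toy pool: connections are created lazily up to `size`; once the pool is
# exhausted, callers wait up to `checkout_timeout` seconds and then get an error.
# All names here are hypothetical, not Active Record's API.
class TinyPool
  class TimeoutError < StandardError; end

  def initialize(size:, checkout_timeout:)
    @size = size
    @checkout_timeout = checkout_timeout
    @idle = []          # connections that were checked in and can be reused
    @created = 0        # how many connections exist in total
    @mutex = Mutex.new
    @returned = ConditionVariable.new
  end

  def checkout
    deadline = monotonic_now + @checkout_timeout
    @mutex.synchronize do
      loop do
        return @idle.pop unless @idle.empty?
        if @created < @size
          @created += 1
          return Object.new # stands in for a real database connection
        end
        remaining = deadline - monotonic_now
        if remaining <= 0
          raise TimeoutError,
                "could not obtain a connection within #{@checkout_timeout} seconds; " \
                "pool size is #{@size}"
        end
        # Sleep until a connection is checked in or the deadline passes.
        @returned.wait(@mutex, remaining)
      end
    end
  end

  def checkin(connection)
    @mutex.synchronize do
      @idle.push(connection)
      @returned.signal
    end
  end

  private

  def monotonic_now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

pool = TinyPool.new(size: 2, checkout_timeout: 0.1)
first = pool.checkout
second = pool.checkout
begin
  pool.checkout # a third checkout has to wait, then times out
rescue TinyPool::TimeoutError => e
  puts e.message
end
```

Raising the pool size or returning connections sooner both avoid the timeout in this sketch, which mirrors the two levers available in a real application.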
Threaded servers
For servers that achieve concurrency via threads we recommend using an initializer to configure your database pool. When your Rails application boots, it will execute the code in your initializer and establish the connection with your customizations.
For Rails 4.1+, you can set these values directly in your config/database.yml:
production:
  url: <%= ENV["DATABASE_URL"] %>
  pool: <%= ENV["DB_POOL"] || ENV['RAILS_MAX_THREADS'] || 5 %>
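To see how the pool expression falls back from DB_POOL to RAILS_MAX_THREADS to 5, the following sketch renders an equivalent template with Ruby's standard ERB and YAML libraries. The render_database_config helper and the local env hash (used in place of the real ENV so the example is repeatable) are assumptions made for illustration; Rails performs its own loading of config/database.yml.

```ruby
require "erb"
require "yaml"

# Same shape as the production entry above, but reading from a local `env`
# hash instead of the process-wide ENV (an assumption for this sketch).
TEMPLATE = <<~YAML
  production:
    url: <%= env["DATABASE_URL"] %>
    pool: <%= env["DB_POOL"] || env["RAILS_MAX_THREADS"] || 5 %>
YAML

# Hypothetical helper: render the ERB, then parse the resulting YAML.
def render_database_config(env)
  YAML.safe_load(ERB.new(TEMPLATE).result_with_hash(env: env))
end

base = { "DATABASE_URL" => "postgres://localhost/app_production" }

p render_database_config(base)["production"]["pool"]
p render_database_config(base.merge("RAILS_MAX_THREADS" => "16"))["production"]["pool"]
p render_database_config(base.merge("RAILS_MAX_THREADS" => "16", "DB_POOL" => "10"))["production"]["pool"]
```

Because the environment values are written into the YAML text before it is parsed, the pool value comes out as an integer even though environment variables are strings.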
Otherwise, if you are using an older version of Rails, you will need to use an initializer:
# config/initializers/database_connection.rb
# Use config/database.yml method if you are using Rails 4.1+
Rails.application.config.after_initialize do
  ActiveRecord::Base.connection_pool.disconnect!

  ActiveSupport.on_load(:active_record) do
    config = ActiveRecord::Base.configurations[Rails.env] ||
             Rails.application.config.database_configuration[Rails.env]
    config['pool'] = ENV['DB_POOL'] || ENV['RAILS_MAX_THREADS'] || 5
    # Establish connection is not needed for Rails 5.2+ https://github.com/rails/rails/pull/31241
    ActiveRecord::Base.establish_connection(config)
  end
end
If you are already using an initializer, you should switch over to the database.yml method as soon as possible. Using an initializer requires duplicating code if you are using a forking web server such as Unicorn or Puma (in hybrid mode). The initializer method can cause confusion over what is happening and is the source of numerous support tickets.
If you are using the Puma web server, we recommend setting the pool value equal to ENV['RAILS_MAX_THREADS']. When using multiple processes, each process contains its own pool, so as long as no worker process runs more than ENV['RAILS_MAX_THREADS'] threads, this setting should be adequate.
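Because each process has its own pool, the worst-case connection count grows with both the number of processes and the pool size. A rough sketch of that arithmetic follows; the method names and the example figures (2 processes, 3 dynos) are hypothetical and chosen only for illustration.

```ruby
# Upper bound on connections a single dyno can open:
# one pool per process, each pool capped at pool_size.
def max_connections_per_dyno(processes:, pool_size:)
  processes * pool_size
end

# Upper bound across all dynos of one process type.
def max_connections(dynos:, processes:, pool_size:)
  dynos * max_connections_per_dyno(processes: processes, pool_size: pool_size)
end

# Hypothetical setup: 2 Puma worker processes, pool of 5 (RAILS_MAX_THREADS=5).
puts max_connections_per_dyno(processes: 2, pool_size: 5)
puts max_connections(dynos: 3, processes: 2, pool_size: 5)
```

These are upper bounds: connections are opened lazily, so an idle or lightly loaded process may hold fewer.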
Multi-process servers
For a forking server such as Unicorn, the master process boots your Rails application (and executes any initializers) and then forks workers. For this reason, it’s necessary to disconnect in your master process in a before_fork block and then re-establish the connection in an after_fork block:
# config/unicorn.rb
before_fork do |server, worker|
  # other settings
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end
end

after_fork do |server, worker|
  # other settings
  if defined?(ActiveRecord::Base)
    # Establish connection is not needed for Rails 5.2+ https://github.com/rails/rails/pull/31241
    ActiveRecord::Base.establish_connection
  end
end
For Unicorn, this connection setup should be in addition to the normal recommended configuration as described in the Deploying Rails Applications With Unicorn guide.
If you are using Rails 4.1+, then ActiveRecord::Base.establish_connection will use the connection information stored in config/database.yml. Otherwise, you will need to duplicate the behavior in your initializer to ensure consistent connection information:
# config/unicorn.rb
# Use config/database.yml method if you are using Rails 4.1+
after_fork do |server, worker|
  # other settings
  if defined?(ActiveRecord::Base)
    config = ActiveRecord::Base.configurations[Rails.env] ||
             Rails.application.config.database_configuration[Rails.env]
    config['pool'] = ENV['DB_POOL'] || 5
    ActiveRecord::Base.establish_connection(config)
  end
end
Note that the pool is set to the value specified in the DB_POOL env var, or to 5 connections if that variable is not set. Now you can set the connection pool size by setting a config var on Heroku. For instance, if you wanted to set it to 10, you could run:
$ heroku config:set DB_POOL=10
This doesn’t mean that each dyno will now have 10 open connections, but only that if a new connection is needed it will be created until a maximum of 10 have been used per Rails process.
Even if you have enough connections in your pool, your database may have a maximum number of connections that it will allow.
Maximum database connections
Heroku provides managed Postgres databases. Different tiered databases have different connection limits. The Essential-tier databases are limited to 20 connections. Databases in the Standard tier or larger have higher limits. After your database reaches the maximum number of active connections, it no longer accepts new connections. Reaching the limit results in connection timeouts from your application and is likely to cause exceptions.
When scaling out, it is important to keep in mind how many active connections your application needs. If each dyno allows 5 database connections and your database accepts 20, you can only scale out to four dynos (4 × 5 = 20) before you need to provision a more robust database.
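The scaling limit above is an integer division of the database's connection limit by the connections each dyno may open. A small sketch follows; the max_dynos helper and its optional reserved parameter (headroom for things like a console session or a migration) are hypothetical additions for illustration.

```ruby
# How many dynos fit under a database connection limit.
# `reserved` is a hypothetical allowance for connections used outside dynos.
def max_dynos(connection_limit:, connections_per_dyno:, reserved: 0)
  unless connections_per_dyno.positive?
    raise ArgumentError, "connections_per_dyno must be positive"
  end

  usable = [connection_limit - reserved, 0].max
  usable / connections_per_dyno # integer division rounds down
end

# 20-connection database, 5 connections per dyno.
puts max_dynos(connection_limit: 20, connections_per_dyno: 5)
# Same database, keeping 2 connections in reserve.
puts max_dynos(connection_limit: 20, connections_per_dyno: 5, reserved: 2)
```

Keeping some connections in reserve is a design choice rather than a requirement; without it, running at the exact limit leaves no room for ad hoc connections.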
Now that you know how to configure your connection pool and how to determine how many connections your database can handle, you need to calculate the number of connections that each dyno requires.
Calculating required connections
Assuming that you are not manually creating threads in your application code, you can use your web server settings to guide the number of connections that you need. The Unicorn web server scales out using multiple processes. If you aren’t opening any new threads in your application, each process will take up 1 connection. So if your Unicorn config file has worker_processes set to 3 like this:
worker_processes 3
Then your app will use 3 connections for workers. This means each dyno will require 3 connections. With a database limited to 20 connections, you can scale out to 6 dynos, which means 18 active database connections out of a maximum of 20. However, it is possible for a connection to get into a bad or unknown state. Because of this, we recommend setting the pool of your application to either 1 or 2 to avoid zombie connections from saturating your database. See the “Bad connections” section below.
Another web server, Puma, gets concurrency using threads (16 by default). This means it would require 16 connections in the pool to operate without exception. It’s likely that your dyno isn’t taking full advantage of all 16 of these threads, so with tuning you could figure out an optimal number and specify it in your Procfile. If you wanted Puma to use only 5 threads, and therefore at most 5 connections, you can set its thread range to 0:5 like this:
web: bundle exec puma -t 0:5 -p $PORT -e ${RACK_ENV:-development}
Every application will have different performance characteristics and different requirements. To properly tune the number of threads for your app you will need to load test your app in a production-like or staging environment.
Number of active connections
In development you can see the number of connections taken up by your application by checking the database.
$ bundle exec rails dbconsole
This will open a connection to your development database. You can then see the number of connections to your postgres database by running:
select count(*) from pg_stat_activity where pid <> pg_backend_pid() and usename = current_user;
This returns the number of connections on that database:
count
-------
5
(1 row)
Since connections are opened lazily, you’ll need to hit your running application at localhost several times until the count stops going up. To get an accurate count, you should run that database query against a production database, since your development setup may not allow you to generate the load required for your app to create new connections.
Background Workers
If you are using a worker process type with a background worker library like Sidekiq, you may want different settings for different dyno types. By default Sidekiq uses 10 threads, which means that either the database connection pool on your worker must allow 10 or more connections, or you will need to configure your Sidekiq process to use fewer threads.
If you have your Rails app configured as described in the previous sections, you can set RAILS_MAX_THREADS to a different value. For example:
worker: RAILS_MAX_THREADS=${SIDEKIQ_RAILS_MAX_THREADS:-10} bundle exec sidekiq
In this example, you would run heroku config:set SIDEKIQ_RAILS_MAX_THREADS=5 to change the value; if the variable is not set, the default of 10 applies.
Alternatively, if you want to change the number of threads Sidekiq uses, you can boot it with the -c flag and use a different configuration variable such as SIDEKIQ_CONCURRENCY:
worker: bundle exec sidekiq -c ${SIDEKIQ_CONCURRENCY:-5}
In this example we are telling Sidekiq to only use 5 threads to process background jobs.
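Whichever variable you use, the worker's thread count should not exceed its connection pool. The sketch below shows one way to sanity-check that relationship from a set of environment values. The env_int and worker_pool_report helpers are hypothetical, and the fallbacks of 5 are assumptions that mirror the Procfile and database.yml examples in this article rather than anything Sidekiq or Rails reports.

```ruby
# Mimics the shell's ${VAR:-default}: fall back when unset or empty.
def env_int(env, name, default)
  raw = env[name]
  raw.nil? || raw.empty? ? default : Integer(raw)
end

# Hypothetical check: compare the worker's thread count with its pool size.
def worker_pool_report(env)
  threads = env_int(env, "SIDEKIQ_CONCURRENCY", 5)
  pool = env_int(env, "DB_POOL", env_int(env, "RAILS_MAX_THREADS", 5))
  { threads: threads, pool: pool, sufficient: pool >= threads }
end

p worker_pool_report({})
p worker_pool_report("SIDEKIQ_CONCURRENCY" => "10")
p worker_pool_report("SIDEKIQ_CONCURRENCY" => "10", "DB_POOL" => "10")
```

A report with sufficient set to false suggests that some threads may have to wait for a connection and could hit the timeout error described earlier.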
Bad connections
It is possible for connections to hang, or be placed in a “bad” state. This means that the connection will be unusable, but remain open. If you are running a multi-process web server such as Unicorn, this could mean that over time a 3 worker dyno which normally consumes 3 database connections could be holding as many as 15 connections (5 default connections per pool times 3 workers). To limit this threat, lower the connection pool to 1 or 2 and enable connection reaping, which is available in Rails 4, though it was turned off by default after a bug report. The 'reaping_frequency' setting tells Active Record to check every N seconds whether connections are hung or dead and terminate them. While it is likely that over time your application may have a few connections that hang, if something in your code is causing hung connections, the reaper will not be a permanent fix to the problem.
Limit connections with PgBouncer
You can continue to scale out your applications with additional dynos until you have reached your database connection limits. Before you reach this point it is recommended to limit the number of connections required by each dyno by using the PgBouncer buildpack.
PgBouncer maintains a pool of connections that your database transactions share. This keeps connections to Postgres that are otherwise open and idle to a minimum. However, transaction pooling prevents you from using named prepared statements, session advisory locks, listen/notify, or other features that operate on a session level. See the PgBouncer buildpack FAQ for the full list of limitations.
For many frameworks, you must disable prepared statements in order to use PgBouncer. Then add the PgBouncer buildpack to your app.
Do not continue before disabling prepared statements, or verifying that your framework is not using them. Rails 3+ uses prepared statements.
$ heroku buildpacks:add heroku/pgbouncer
Ensure that you’ve also got your primary language buildpack listed:
$ heroku buildpacks
1. heroku/ruby
2. heroku/pgbouncer
If you’re using a different language than Ruby expect your first line to be different.
Now you must modify your Procfile to start PgBouncer. In your Procfile, add the command bin/start-pgbouncer-stunnel to the beginning of your web entry. So if your Procfile was:
web: bundle exec puma -C config/puma.rb
It will now be:
web: bin/start-pgbouncer-stunnel bundle exec puma -C config/puma.rb
Commit the results to git, test on a staging app, and then deploy to production.
When deploying you should see this in the output:
=====> Detected Framework: pgbouncer-stunnel