Database Connection Pooling with Scala
Last updated May 03, 2024
Connection pooling is a pattern used by software applications to connect to databases using a pre-created set of reusable connection objects. When a new connection is required, an existing connection is retrieved from the pool. When the thread using the connection has finished with it, the connection is placed back in the pool for use by another thread. This pattern reduces the overhead of connecting to a database by decreasing network traffic, limiting the cost of creating new connections, and reducing the load on the garbage collector.
Many Scala and Java application frameworks include their own connection pooling APIs, but the principles used to configure them are generally the same. In this article, you’ll learn how to create a database connection pool using the Java Database Connectivity (JDBC) API and the Apache DBCP pooling library.
If you already have a Scala application, you may use it for this example. Otherwise, create a simple application from the Getting Started with Scala on Heroku article before proceeding. You should also be familiar with Connecting to Relational Databases on Heroku with Java.
Using Apache DBCP
Open your application’s build.sbt file and add the following libraries to it. If you’re using the sample Getting Started application, then you might need to upgrade your postgresql dependency.
Apache DBCP 2 requires Java 7 or later and JDBC 4.1. If you’re using Java 6 or JDBC 4, then you’ll need to use DBCP 1.4 instead.
libraryDependencies ++= Seq(
"org.postgresql" % "postgresql" % "9.3-1102-jdbc41",
"org.apache.commons" % "commons-dbcp2" % "2.0.1"
)
Now run sbt to download and install the new jar files:
$ sbt clean stage
...
[info] downloading http://repo1.maven.org/maven2/org/postgresql/postgresql/9.3-1102-jdbc41/postgresql-9.3-1102-jdbc41.jar ...
[info] [SUCCESSFUL ] org.postgresql#postgresql;9.3-1102-jdbc41!postgresql.jar (436ms)
[info] downloading http://repo1.maven.org/maven2/org/apache/commons/commons-dbcp2/2.0.1/commons-dbcp2-2.0.1.jar ...
[info] [SUCCESSFUL ] org.apache.commons#commons-dbcp2;2.0.1!commons-dbcp2.jar (332ms)
[info] downloading http://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.2/commons-pool2-2.2.jar ...
[info] [SUCCESSFUL ] org.apache.commons#commons-pool2;2.2!commons-pool2.jar (238ms)
...
[info] Done packaging.
[success] Total time: 5 s, completed Aug 18, 2014 10:35:38 AM
Next, open the source code for the class that will create connections. If you’re working from the Getting Started application, open the Server.scala file. Add the following statements at the top of the file:
import java.net.URI
import java.sql.Connection
import java.sql.Statement
import org.apache.commons.dbcp2._
Then add a new singleton object to contain and configure our database connection pool. Add the following code to the end of the file:
object Datasource {
  // DATABASE_URL has the form postgres://user:password@host:port/dbname
  val dbUri = new URI(System.getenv("DATABASE_URL"))
  val dbUrl = s"jdbc:postgresql://${dbUri.getHost}:${dbUri.getPort}${dbUri.getPath}"
  val connectionPool = new BasicDataSource()

  // Credentials are optional, for example on a local development database
  if (dbUri.getUserInfo != null) {
    connectionPool.setUsername(dbUri.getUserInfo.split(":")(0))
    connectionPool.setPassword(dbUri.getUserInfo.split(":")(1))
  }
  connectionPool.setDriverClassName("org.postgresql.Driver")
  connectionPool.setUrl(dbUrl)
  connectionPool.setInitialSize(3)
}
In this object, we’re retrieving the username, password and dbUrl as we would with a non-pooled connection. Then we’re initializing the connectionPool with those parameters and calling connectionPool.setInitialSize(3) to set the initial size of the pool. The BasicDataSource object creates these connections when the pool is first initialized, which happens on the first call to getConnection, so they are ready to be reused as our application handles further traffic.
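The URL handling inside the Datasource object can be pulled out into a small pure function, which makes it easy to check without a database. The following is a minimal sketch using only the standard library. The names DbSettings and DatabaseUrl.parse are hypothetical and not part of the original example. Unlike the code above, it omits the port when DATABASE_URL has none and keeps passwords that contain a colon intact.

```scala
import java.net.URI

// Hypothetical container for the three values the pool needs.
final case class DbSettings(jdbcUrl: String, username: Option[String], password: Option[String])

object DatabaseUrl {
  // Converts a URL of the form postgres://user:pass@host:port/db
  // into a JDBC URL plus optional credentials.
  def parse(databaseUrl: String): DbSettings = {
    val uri = new URI(databaseUrl)
    // URI.getPort returns -1 when the URL has no explicit port.
    val port = if (uri.getPort == -1) "" else ":" + uri.getPort
    val jdbcUrl = s"jdbc:postgresql://${uri.getHost}${port}${uri.getPath}"
    // Split on the first ':' only, so a password containing ':' stays whole.
    val credentials = Option(uri.getUserInfo).map(_.split(":", 2))
    DbSettings(
      jdbcUrl,
      credentials.map(_.head),
      credentials.collect { case Array(_, pass) => pass }
    )
  }
}
```

With a helper like this, the Datasource object could call DatabaseUrl.parse(System.getenv("DATABASE_URL")) and pass the resulting fields to setUrl, setUsername and setPassword.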
We can retrieve a connection from the pool like so:
val connection = Datasource.connectionPool.getConnection
try {
  val stmt = connection.createStatement()
  stmt.executeUpdate("CREATE TABLE IF NOT EXISTS ticks (tick timestamp)")
  stmt.executeUpdate("INSERT INTO ticks VALUES (now())")
  val rs = stmt.executeQuery("SELECT tick FROM ticks")
  while (rs.next()) {
    println("Read from DB: " + rs.getTimestamp("tick") + "\n")
  }
} finally {
  // Closing a pooled connection returns it to the pool for reuse
  connection.close()
}
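Once we have a connection object, we use it exactly as we would any other JDBC Connection. Calling close() on a pooled connection returns it to the pool rather than closing the underlying database connection, so always close connections when you are done with them.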
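One way to make sure a borrowed connection always goes back to the pool, even when a query throws, is a small loan-style helper. This is a sketch under the assumption that you want something version-independent. The object name Loan and the method using are hypothetical and not part of DBCP or the original example.

```scala
object Loan {
  // Runs `body` with the resource and always closes it afterwards,
  // whether `body` returns normally or throws.
  def using[R <: AutoCloseable, A](resource: R)(body: R => A): A =
    try body(resource)
    finally resource.close()
}
```

Because java.sql.Connection is AutoCloseable, a call such as Loan.using(Datasource.connectionPool.getConnection) { connection => ... } would hand the connection back to the pool as soon as the block finishes.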
In addition to configuring the initial size of the connection pool, we might also want to set its maximum size, the maximum lifetime of the connections (before they are discarded and new ones replace them), or the maximum and minimum number of idle connections to keep before adjusting the size of the pool. All of these can be set with methods on the BasicDataSource class.
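As a rough illustration of those settings, the fragment below uses the setter names as they appear in the DBCP 2.0.x BasicDataSource API. The numeric values are placeholders chosen for the example, not recommendations, and the fragment needs the commons-dbcp2 dependency from build.sbt on the classpath.

```scala
import org.apache.commons.dbcp2.BasicDataSource

val pool = new BasicDataSource()
pool.setInitialSize(3)                          // connections created when the pool starts
pool.setMaxTotal(10)                            // upper bound on connections in the pool
pool.setMinIdle(3)                              // keep at least this many idle connections
pool.setMaxIdle(5)                              // release idle connections above this count
pool.setMaxConnLifetimeMillis(30 * 60 * 1000L)  // retire connections after 30 minutes
```

In the article's example, the same calls would be made on Datasource.connectionPool next to the existing setInitialSize(3) call.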
You’ve learned how to configure the connection pool, but knowing what values to use when configuring it is a different topic.
Configuring the connection pool
The number of idle connections to keep warm in your pool depends on the size and nature of your application. Many users find one connection per thread handling HTTP requests is sufficient (assuming threads handling HTTP requests are the only threads using connections). Your application may need more if it experiences very high throughput such that it can’t turn connections over to new threads quickly enough. Or you may need fewer if not every HTTP request needs to access the database. Ultimately, profiling your application under production loads is the best way to determine the appropriate pool parameters.
In development you can see the number of connections used by your application by checking the database.
$ psql -h localhost
psql (9.3.2)
Type "help" for help.
jkutner=# \q
This will open a connection to your development database. You can then see the number of connections to your Postgres database by running:
select count(*) from pg_stat_activity where pid <> pg_backend_pid() and usename = current_user;
This returns the number of connections that your user has open on that database, not counting the psql session itself:
count
-------
5
(1 row)
Under simulated production loads, this will give you a good indication of what size pool you need. There are, however, some constraints.
Maximum database connections
Heroku provides managed Postgres databases. Different tiered databases have different connection limits, which you can find listed on the Heroku Postgres add-on documentation. Databases in the lower tiers permit fewer connections than databases in the higher tiers. Once your database has the maximum number of active connections, it will no longer accept new connections. This will result in connection timeouts from your application and will likely cause exceptions.
When scaling out, it is important to keep in mind how many active connections your application needs. For example, if your database plan allows 20 connections and each dyno holds 5 database connections in its pool, you can only scale out to four dynos before you need to provision a more robust database.
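The arithmetic behind that limit is a simple division, sketched below. The four-dyno figure above implies a plan limit of 20 connections, which is an assumption here. Check your plan's actual limit in the Heroku Postgres documentation. The names ConnectionBudget, maxDynos and the reserved parameter are hypothetical.

```scala
object ConnectionBudget {
  // Largest number of dynos whose pools fit within the database connection limit.
  // `reserved` sets aside connections for other clients such as psql sessions.
  def maxDynos(connectionLimit: Int, poolSizePerDyno: Int, reserved: Int = 0): Int = {
    require(poolSizePerDyno > 0, "poolSizePerDyno must be positive")
    math.max(0, (connectionLimit - reserved) / poolSizePerDyno)
  }
}
```

For instance, ConnectionBudget.maxDynos(20, 5) evaluates to 4, and reserving two connections for administrative sessions lowers the result to 3.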
Now that you know how to configure your connection pool and how to find out how many connections your database can handle, you need to calculate the number of connections that each dyno requires.
Limit connections with PgBouncer
You can continue to scale out your applications with additional dynos until you have reached your database connection limits. Before you reach this point it is recommended to limit the number of connections required by each dyno by using the PgBouncer buildpack.
PgBouncer maintains a pool of connections that your database transactions share. This keeps connections to Postgres, which are otherwise open and idle, to a minimum. However, transaction pooling prevents you from using named prepared statements, session advisory locks, listen/notify, or other features that operate on a session level. See the PgBouncer buildpack FAQ for the full list of limitations.
For many frameworks, you must disable prepared statements in order to use PgBouncer. Then configure your app to use the PgBouncer buildpack together with your language buildpack.
For JDBC, this requires adding prepareThreshold=0 to the connection string. But it may also be necessary to patch your JDBC driver.
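A minimal sketch of adding that parameter to a JDBC URL built in code follows. The helper name PgBouncerUrl.withoutServerPrepare is hypothetical. It assumes the URL uses the usual ?key=value&key=value query syntax and leaves the URL alone if prepareThreshold is already set to any value.

```scala
object PgBouncerUrl {
  // Appends prepareThreshold=0 to a JDBC URL unless the parameter is already present.
  def withoutServerPrepare(jdbcUrl: String): String =
    if (jdbcUrl.contains("prepareThreshold=")) jdbcUrl
    else {
      // Use '&' when the URL already has query parameters, '?' otherwise.
      val separator = if (jdbcUrl.contains("?")) "&" else "?"
      jdbcUrl + separator + "prepareThreshold=0"
    }
}
```

In the Datasource object, the result of such a helper would be passed to connectionPool.setUrl in place of the plain dbUrl.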
Do not continue before disabling prepared statements, or verifying that your framework is not using them.
Once that is done, add the PgBouncer buildpack to your app:
$ heroku buildpacks:add heroku/pgbouncer
Your application also needs its language-specific buildpack in order to run. Since you are using Scala, add:
$ heroku buildpacks:add heroku/scala
Now you must modify your Procfile to start PgBouncer. In your Procfile, add the command bin/start-pgbouncer-stunnel to the beginning of your web entry. So if your Procfile was:
web: target/universal/stage/bin/scala-getting-started
it will now be:
web: bin/start-pgbouncer-stunnel target/universal/stage/bin/scala-getting-started
Commit the results to git, test on a staging app, and then deploy to production.
When deploying you should see this in the output:
=====> Detected Framework: pgbouncer-stunnel
For more information on connection pooling with Scala, Java, JDBC and Apache DBCP, see the Apache Commons Website.
You can find the source code for the examples used in this article on GitHub.