Heroku Postgres Logical Backups
Last updated December 28, 2021
Table of Contents
- How Logical Backups Work
- The Performance Impact of Logical Backups
- Capturing Logical Backups on Small Databases with Heroku PGBackups
- Capturing Logical Backups on Larger Databases
- Capturing Logical Backups for Storage in Another Region
- Restoring Logical Backups
- Direct Transfer of Logical Backups Between Heroku Postgres Databases
Capturing a logical backup creates a single snapshot file from your Heroku Postgres database. While all Heroku Postgres databases are protected through continuous physical backups, logical backups are more flexible for data portability.
Unlike physical backups, logical backups aren’t captured automatically because they share database resources with application code. This article covers how to perform logical backups with minimal effect on your database performance.
How Logical Backups Work
pg_dump uses a Postgres connection to run a series of SQL
COPY statements in a single transaction to produce a consistent snapshot across the database. The file that
pg_dump produces is much smaller than the size of the database. It contains only the live dataset and instructions on how to remake indexes, but not the indexes themselves.
Dumps in script format are plain-text files of SQL commands used to reconstruct the database at the time of capture by restoring through
Archive file formats are more flexible and performant than script formats. The custom format allows for compression as well as opportunistic reordering of all archived items. You can use
pg_restore to restore archive file formats. This article focuses on backups that are in an archived, compressed format.
The Performance Impact of Logical Backups
The effects of
pg_dump grow worse with the size of your database. Balancing your database’s performance with your backup requirements is necessary to avoid unpleasant surprises.
pg_dump causes many effects on your database:
- It consumes as much filesystem I/O as available and contends for resources with concurrently running queries.
- It impedes other operations that require an exclusive lock, such as schema changes.
VACUUMand other automatic maintenance processes are paused since long-running transactions prevent autovacuum from freezing rows visible to those transactions.
- It evicts cached data from OS and filesystem caches.
Capturing Logical Backups on Small Databases with Heroku PGBackups
Heroku Postgres comes with a
pg:backups command that allows you to manually capture or schedule logical backups via the CLI. It performs a
pg_dump behind the scenes resulting in an archive file that can be restored with
pg_restore. All backups captured via
pg:backups are stored in the U.S. regardless of where your database is located.
Use Heroku PGBackups only on databases up to 20 GB. Contention for the I/O, memory, and CPU needed for backing up a larger database can be prohibitive and cause your backup capture to end prematurely.
For more info, see Heroku PGBackups.
Capturing Logical Backups on Larger Databases
Frequent logical backups of large databases under moderate CPU load can be slow, degrade the performance of other queries, and prevent necessary maintenance operations from running.
For databases over 20 GB, take logical backups against a short-lived fork of your Heroku Postgres database. Capturing a logical backup from a fork preserves the original database’s performance and allows the backup operation to consume as many resources as necessary for the dump to succeed.
Create a short-lived fork of your Heroku Postgres database.
Create a script to run the
pg_dumpcommand and transfer the backup file to the destination of your choice. The following
pg_dumpcommand creates a backup file that is the same format as the Heroku PGBackups output.
pg_dump -F d --no-acl --no-owner --quote-all-identifiers $FORKED_DATABASE_URL
Ensure that the script includes the transfer of your backup file after the dump is complete and that the
pg_dump command points to your forked database. You can optionally use the
--jobs flag to parallelize the dump operation. You can also adjust the amount of compression with the
--compress flag. See more configuration options in the official Postgres docs.
Upload the file to your app so that it’s accessible to your one-off dynos.
Run the script in a detached one-off dyno in the same region as the database. A Performance-L, Private-L , or Shield-L dyno is recommended to provide sufficient compute power. For example
heroku run:detached --app example-app-name --size=performance-L script.sh
Some databases are too large for the execution and transfer of your backup file to complete within the 24-hour period limit of a one-off dyno. For these databases, you can retrieve your Postgres connection string to run the script outside of Heroku. If you’re using a Shield or Private database, you must also use PrivateLink or mTLS to facilitate the external connection.
Capturing Logical Backups for Storage in Another Region
All backups captured by Heroku PGBackups are stored in the U.S. You can perform
pg_dump yourself to capture and store backups in another location. Follow the instructions in the Capturing Logical Backups on Larger Databases section.
Restoring Logical Backups
Be aware that the target database is deleted before the restore process occurs.
heroku pg:backups:restore command runs the
pg_restore application and takes any dump file that is accessible to a dyno running on Heroku and restores it to your Heroku Postgres database.
pg_restore uses a Postgres connection to load a dump into a Postgres database. The restore creates the necessary schema, loads data through
COPY commands, adds constraints, and creates indexes and triggers dumped from the source database. These operations can be slow as each requires disk I/O and computation in order to write, process, and alter the restored data for normal operation.
Certain parts of the restore can run over parallel connections. If you run
pg_restore on a one-off dyno using the
pg_restore command instead of
heroku pg:backups:restore, you can run parallel connections by including the
--jobs flag. Because each job is one process, it’s not recommended to run more than one job per vCPU of your database instance. For example, a Standard-4 database has 4 vCPUs, so run
pg_restore with a maximum of 4 jobs to avoid resource contention and decreased performance.
Direct Transfer of Logical Backups Between Heroku Postgres Databases
heroku pg:copy uses a dedicated dyno to pipe the output of
pg_dump directly to
pg_restore, removing the need to transfer the dump file to a location external to Heroku. The data transfer is also piped through
pv (pipeviewer), to provide more visibility into the transfer. See more info on
pg:copy in the Upgrading Heroku Postgres Databases article.
pg:copy method requires approximately 3 minutes of app downtime per GB of your database. For large databases, it’s recommended that you follow the instructions in Capturing Logical Backups on Larger Databases instead.
Creating indexes, adding constraints, and other checks at the end of the
pg:copy run aren’t be reflected in
pv‘s logging. Although a transfer can appear to hang, it’s working in the background.