Heroku Postgres Data Safety and Continuous Protection
Last updated January 28, 2021
Table of Contents
Postgres offers a number of ways to replicate, backup and export your data. Some are extremely lightweight and have little performance impact on a running database, while others are rather expensive and may adversely impact a database under load.
This article explores how Heroku Postgres uses these methods to provide continuous protection, as well as data portability and how you can choose what is appropriate for your data.
The different types of backups
The types of backups available for Postgres are broadly divided into physical and logical backups. Physical backups may be snapshots of the file-system, a binary copy of the database cluster files or a replicated external system, while logical backups are a SQL-like dump of the schema and data of certain objects within the database. Physical backups offer some of least computationally intensive methods of data durability available while being very limited in how they may be restored. Logical backups are much more flexible, but can be very slow and require substantial computational resources during backup and restore. How these methods are used within Heroku Postgres is explored below.
Physical backups on Heroku Postgres
Heroku Postgres uses physical backups for continuous protection by persisting incremental snapshots or base backups of the file system, and write ahead log (WAL) files to external, reliable storage.
Snapshots are taken on most databases while the database is fully available and make a verbatim copy of the instance’s disk. This includes dead tuples, bloat, indexes and all structural characteristics of the currently running database. The rate at which we capture snapshots is dynamic. For average or low change databases, we try to capture a snapshot at least every 24 hours. For databases that change more frequently, we capture them more often.
Base backups are still used in some cases, for example with Postgres version 9.5 or if a database has exceeded capacity. They are taken while the database is fully available and make a verbatim copy of Postgres’ data files. This includes dead tuples, bloat, indexes and all structural characteristics of the currently running database. On Heroku Postgres, a base backup capture is rate limited to about 10 MB/s and imposes a minimal load on the running database.
Committed transactions are recorded as WAL files, which are able to be replayed on top of the snapshots or base backups, providing a method of completely reconstructing the state of a database. Snapshots are stored directly in AWS’s S3 object store. Base backups and WAL files are pushed to’ S3 through an application called WAL-E as soon as they are made available by Postgres.
All databases managed by Heroku Postgres provide continuous protection by persisting snapshots, base backups and WAL files to S3. Also, fork and follower databases are implemented by fetching snapshots or persistent base backups and WAL files and replaying them on a fresh Postgres installation. Storing these physical backups in a highly available object store also enables us to recover entire databases in the event of hardware failure, data corruption or a large scale service interruption.
All Heroku Postgres databases are protected through continuous physical backups. These backups can be retrieved through Heroku Postgres Rollbacks on Standard and Premium tier databases. However, Hobby tier databases do not offer rollbacks, forks or followers.
Due to the nature of these snapshots, binary base backups and WAL files, they are only able to be restored to Postgres installations with the same architecture, major version and build options as the source database. This means that upgrades across architectures and major versions of Postgres require a logical backup to complete.
Logical backups on Heroku Postgres
Logical backups are an extremely flexible method of exporting data from Heroku Postgres and restoring it to almost any database installation, but can be quite painful for moving large data sets.
Frequent logical backups of large databases under load can be slow, degrade the performance of other queries and prevent necessary maintenance operations from running.
Logical backups are captured using pg_dump and can be restored with pg_restore or psql, depending on the dump file format. Script format dumps are plain-text files of SQL commands that would reconstruct the database at the time of capture and are restored by piping them through psql. Archive file formats are more flexible and performant than script formats and the “custom” format allows for compression as well as opportunistic reordering of all archived items. Archive file formats are restored through pg_restore.
pg_dump application uses a regular Postgres connection to run a series of SQL
COPY statements in a single transaction to produce a consistent snapshot across the database. While
pg_dump is running it can prevent other operations which require an exclusive lock (such as schema changes), and may interfere with VACUUM and other automatic maintenance processes. This is because long running transactions will prevent autovacuum from freezing rows that are visible to those transactions.
pg_dump uses a single backend to read all live tuples in the database through a circular buffer that preserves Postgres’ cache but will displace and potentially ruin the OS’s and filesystem’s caches. The
pg_dump backend will consume as much filesystem I/O as available and will contend for resources with concurrently running queries. The file that is produced by
pg_dump will be much smaller than the size of the database as reported by
pg_database_size as it will only contain the live dataset of the database and instructions on how to re-make indexes, but not the indexes themselves. Also, if a ‘custom’ archive file format is used, it will be compressed by default.
pg_restore application similarly uses a regular Postgres connection to load a dump into a Postgres database. The restore will create the necessary schema, load data through COPY commands, create indexes, add constraints and create any triggers that were dumped from the source database. The loading data, creating indexes and creating constraints phases may be extremely slow as each requires disk I/O and computation in order to write, process and alter the restored data for normal operation. Certain parts of the restore may be run over several parallel connections, which is accomplished through the
--jobs=number-of-jobs command line flag to
pg_restore. As each job is one process, more than one job per CPU core in the database instance will lead to resource contention and may lead to decreased performance.
pg_restore running on Heroku are available through Heroku Postgres. During a scheduled run of PGBackups or manual run of
heroku pg:backups:capture, PGBackups will launch a dedicated dyno to take a dump of your database and upload it to S3. A
pg:backups capture run may be canceled through a normal
$ heroku pg:psql => SELECT pid, query FROM pg_stat_activity WHERE query LIKE 'COPY%'; pid | query -------+---------------------------------------------------- 21374 | COPY public.users (id, created_at, name) TO stdout; (1 row) => SELECT pg_cancel_backend(21374); pg_cancel_backend ------------------- t (1 row)
As for restoring data to your database,
heroku pg:backups:restore will take any dump file that is accessible to a dyno running on Heroku, such as one available on S3, and restore it to your Heroku Postgres database. Be aware that the target database will be expunged before the restore process occurs.
For directly transferring a logical backup between Heroku Postgres databases,
heroku pg:copy will use a dedicated dyno to pipe the output of
pg_dump directly to
pg_restore, removing the need to transfer the dump file to a location external to Heroku. The data transfer is also piped through
pv, an application better known as pipeviewer, to provide more visibility into the transfer. However, creating indexes, adding constraints and doing basic sanity checking at the end of the
pg:copy run will not be reflected in
pv‘s logging, so the transfer may appear to hang, but is working in the background.
The performance impact of logical backups
Capturing a logical backup from a database will subject it to increased load. As all data is read, a logical backup capture will evict well cached data from non-Postgres caches, consume finite disk I/O capacity, pause auto-maintenance tasks and degrade the performance of other queries. As the size of a database and/or the load the database is under grows, this period of degraded performance and neglected auto-maintenance will also grow. Balancing your database’s performance with your backup requirements is necessary in order to avoid unpleasant surprises.
In general, PGBackups are intended for moderately loaded databases up to 20 GB. Contention for the I/O, memory and CPU needed for backing up a larger database becomes prohibitive at a moderate load and the longer run time increases the chance of an error that will end your backup capture prematurely.
Combining physical and logical backups
Fork and followers may be employed to reduce the load of a logical backup. Capturing a logical backup from a follower will preserve the leader’s performance and allow the backup operation to consume as many resources as necessary for the dump to succeed. Launching a short lived fork will also allow for logical backups to be taken without affecting your primary database’s performance, although the initial base backup of a new fork will compete with the logical backup for disk I/O. More information on capturing logical backups can be found above, and in Heroku Postgres Backups.
PGBackups are able to capture automatic backups of the database at any of the defined database config vars like
DATABASE_URL. This allows you to tune where and when your logical backup should be taken.