Heroku Postgres Data Safety and Continuous Protection
Last updated December 15, 2022
Heroku Postgres uses physical backups for continuous protection. It persists incremental snapshots or base backups of the file system, along with write-ahead log (WAL) files, to external, reliable storage. This article explores how Heroku Postgres performs physical backups.
Physical Backups on Heroku Postgres
Snapshots are taken on most databases while the database is fully available, and each one is a verbatim copy of the instance’s disk. This includes dead tuples, bloat, indexes, and all structural characteristics of the currently running database. The rate at which we capture snapshots is dynamic. For average or low change databases, we try to capture a snapshot at least every 24 hours. For databases that change more frequently, we capture snapshots more often.
Base backups are still used in some cases, for example with Postgres version 9.5 or if a database has exceeded capacity. They’re taken while the database is fully available, and each one is a verbatim copy of Postgres’ data files. This includes dead tuples, bloat, indexes, and all structural characteristics of the currently running database. On Heroku Postgres, a base backup capture is rate limited to about 10 MB/s and imposes a minimal load on the running database.
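As a rough illustration of what the 10 MB/s rate limit means in practice, the sketch below estimates how long a base backup capture could take. The function name and the 100 GB example size are hypothetical, and the calculation assumes the rate stays constant at the stated limit, which real captures may not do.

```python
def estimate_base_backup_seconds(data_size_mb: float, rate_mb_per_s: float = 10.0) -> float:
    """Rough capture time for a base backup at a fixed transfer rate.

    Assumption: the rate is constant; 10 MB/s is the approximate limit
    stated in this article, not a guaranteed throughput.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    return data_size_mb / rate_mb_per_s


# Hypothetical 100 GB database, using 1 GB = 1000 MB for simplicity.
seconds = estimate_base_backup_seconds(100 * 1000)
print(f"{seconds:.0f} seconds, about {seconds / 3600:.1f} hours")
```

At this rate, capture time grows linearly with the size of the data files, so larger databases should expect proportionally longer captures.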
Committed transactions are recorded as WAL files, which can be replayed on top of the snapshots or base backups to completely reconstruct the state of a database. Snapshots are stored directly in AWS’s S3 object store. Base backups and WAL files are pushed to S3 through an application called WAL-E as soon as they’re made available by Postgres.
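The idea of replaying WAL on top of a base backup can be sketched with a toy model. This is not how Postgres or WAL-E is implemented: the `BaseBackup`, `WalRecord`, and `restore` names are hypothetical, timestamps are plain integers, and the "database" is a dictionary. The sketch only shows the general technique of picking the latest backup at or before a target time and then applying later changes in commit order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BaseBackup:
    taken_at: int            # toy timestamp of when the copy was taken
    state: Dict[str, str]    # toy stand-in for the copied data files


@dataclass
class WalRecord:
    committed_at: int        # toy commit timestamp
    key: str
    value: Optional[str]     # None models a delete


def restore(backups: List[BaseBackup], wal: List[WalRecord], target_time: int) -> Dict[str, str]:
    """Rebuild the state as of target_time from a backup plus WAL replay."""
    candidates = [b for b in backups if b.taken_at <= target_time]
    if not candidates:
        raise ValueError("no base backup at or before the target time")
    base = max(candidates, key=lambda b: b.taken_at)

    state = dict(base.state)  # start from a copy of the backup
    for rec in sorted(wal, key=lambda r: r.committed_at):
        # Replay only changes committed after the backup and up to the target.
        if base.taken_at < rec.committed_at <= target_time:
            if rec.value is None:
                state.pop(rec.key, None)
            else:
                state[rec.key] = rec.value
    return state


backups = [BaseBackup(0, {}), BaseBackup(10, {"a": "1"})]
wal = [
    WalRecord(5, "a", "1"),
    WalRecord(12, "b", "2"),
    WalRecord(15, "a", None),
    WalRecord(20, "c", "3"),
]
print(restore(backups, wal, 13))  # backup from t=10 plus the change at t=12
```

Choosing a different `target_time` yields the state at that moment, which is the same principle that makes rollbacks, forks, and followers possible from stored backups and WAL.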
All databases managed by Heroku Postgres provide continuous protection by persisting snapshots, base backups, and WAL files to S3. Also, fork and follower databases are implemented by fetching snapshots or persistent base backups and WAL files and replaying them on a fresh Postgres installation. Storing these physical backups in a highly available object store also enables us to recover entire databases in the event of hardware failure, data corruption, or a large-scale service interruption.
All Heroku Postgres databases are protected through continuous physical backups. These backups are stored in the same region as the database and can be retrieved through Heroku Postgres Rollbacks on Standard-tier or higher databases. Essential-tier databases, however, don’t offer rollbacks, forks, or followers.
Because snapshots, base backups, and WAL files are binary, they can only be restored to Postgres installations with the same architecture, major version, and build options as the source database. This means that upgrades across architectures and major versions of Postgres require a logical backup to complete.
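The compatibility rule above can be expressed as a small check. This is only a sketch: `PostgresInstall`, `major_version`, and `can_restore_physical` are hypothetical names, and build options are modeled as a plain set of strings. The one Postgres-specific detail it relies on is that the major version is the first two components for releases before 10 (for example 9.5) and the first component from 10 onward.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PostgresInstall:
    architecture: str              # e.g. "x86_64"
    version: str                   # e.g. "14.5" or "9.5.3"
    build_options: FrozenSet[str]  # hypothetical representation of build flags


def major_version(version: str) -> str:
    """Return the major version: '9.5' for 9.5.3, '14' for 14.5."""
    parts = version.split(".")
    if int(parts[0]) < 10:
        return ".".join(parts[:2])
    return parts[0]


def can_restore_physical(source: PostgresInstall, target: PostgresInstall) -> bool:
    """True only if architecture, major version, and build options all match."""
    return (
        source.architecture == target.architecture
        and major_version(source.version) == major_version(target.version)
        and source.build_options == target.build_options
    )


opts = frozenset({"--with-openssl"})
source = PostgresInstall("x86_64", "14.5", opts)
same_major = PostgresInstall("x86_64", "14.9", opts)
next_major = PostgresInstall("x86_64", "15.1", opts)

print(can_restore_physical(source, same_major))  # minor version differs only
print(can_restore_physical(source, next_major))  # major version differs
```

When the check fails, as in the major-version upgrade case, a logical backup is the path the article describes.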
Physical vs. Logical Backups
The types of backups available for Postgres are broadly divided into physical and logical backups. Physical backups on Heroku Postgres are a verbatim copy, while logical backups are a SQL-like dump of the schema and data of certain objects within the database.
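To make the distinction concrete, here is a toy contrast between the two approaches. It is a simplified sketch, not how Postgres stores data or how any dump tool formats its output: `RowVersion`, `physical_copy`, and `logical_dump` are hypothetical names, and the table has a fixed two-column shape. The point is that a verbatim copy carries everything, dead tuples included, while a logical dump emits statements that recreate only the live data.

```python
import copy
from dataclasses import dataclass
from typing import List


@dataclass
class RowVersion:
    id: int
    name: str
    dead: bool  # True for an old row version no longer visible to queries


def physical_copy(heap: List[RowVersion]) -> List[RowVersion]:
    """Verbatim copy: every stored row version, dead or alive."""
    return copy.deepcopy(heap)


def logical_dump(table: str, heap: List[RowVersion]) -> List[str]:
    """SQL-like dump: schema plus statements for live rows only."""
    lines = [f"CREATE TABLE {table} (id integer, name text);"]
    for row in heap:
        if not row.dead:
            escaped = row.name.replace("'", "''")  # double single quotes for SQL
            lines.append(
                f"INSERT INTO {table} (id, name) VALUES ({row.id}, '{escaped}');"
            )
    return lines


heap = [
    RowVersion(1, "old", dead=True),    # superseded by an update
    RowVersion(1, "new", dead=False),
    RowVersion(2, "O'Brien", dead=False),
]

print(len(physical_copy(heap)))  # all three row versions are copied
for statement in logical_dump("users", heap):
    print(statement)
```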
While physical backups are useful for full disaster recovery and offer some of the least computationally intensive methods of data durability available, they’re limited in how they can be restored. Logical backups are more flexible, but can be very slow and require substantial computational resources during backup and restore.
While all Heroku Postgres databases are protected through continuous physical backups, you can optionally choose to capture logical backups as well for greater data portability. Logical backups are more flexible for testing, setting up staging environments, and migrating your data. See Heroku Postgres Logical Backups for more info.