Postgres Write-Ahead Log Usage
Last updated May 16, 2022
Heroku Postgres utilises write-ahead logging as part of Continuous Protection: the WAL is continuously archived to external, reliable storage.
This article covers how WAL is used, issues that can arise when the rate of WAL generation is greater than the rate of WAL archival, and strategies to avoid generating too much WAL.
What is Write-ahead Logging?
Write-ahead logging (WAL) is a core part of enabling Postgres’ durability and data consistency guarantees. All changes are written to this append-only log first, then propagated to the data files on disk.
There are exceptions to this, such as temporary and unlogged tables, where changes are not written to the WAL first, making them non-crash safe and non-replicatable.
Heroku Postgres persists WAL files to local disk first. If the WAL capacity fills completely, the database shuts down and is at risk of data loss.
Generally speaking, Postgres generates WAL when performing write operations (e.g. INSERT, UPDATE, etc.). WAL capacity runs low when the rate of WAL generation exceeds the rate of WAL archival off-disk, or when very high database load results in lower throughput for the archiver.
What can I do?
Monitor WAL Capacity
In the Heroku Postgres Metrics Logs, Heroku emits a sample#wal-percentage-used metric. A healthy database keeps this at 0.75 or lower. Above 0.75, Heroku automatically limits connections to the Heroku Postgres instance.
When connections are limited, Heroku emits an additional log line, with the following structure:
source=DATABASE_URL addon=postgresql-rugged-12345 sample#wal-percentage-used=0.88 sample#max-connections=120 message=Database WAL usage is too high, throttling connections.
Some Heroku Add-on partners who provide monitoring can help with graphing and alerting on the sample#wal-percentage-used metric and the appearance of this additional log line.
Reduce WAL Generation
WAL generation can be reduced by limiting the rate of writes to Heroku Postgres. Some strategies are listed below. Which strategies apply depends on your specific workload, so Heroku is unable to provide detailed instructions for every case.
Use Postgres partitioning to split up larger tables
For any database with larger tables, using Postgres native partitioning to split up large tables into smaller ones can help significantly. This allows you to remove and archive data by manipulating whole partitions rather than subsets of tables, which vastly decreases the amount of WAL produced.
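As a sketch of what this looks like, a table can be declared with Postgres declarative range partitioning, with one partition per month (the table and column names here are illustrative, not from this article):

```sql
-- Parent table is partitioned by a timestamp column.
CREATE TABLE events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

-- One partition per month; rows are routed automatically on insert.
CREATE TABLE events_2022_04 PARTITION OF events
    FOR VALUES FROM ('2022-04-01') TO ('2022-05-01');
CREATE TABLE events_2022_05 PARTITION OF events
    FOR VALUES FROM ('2022-05-01') TO ('2022-06-01');
```

With this layout, retiring a month of data is a single partition-level operation rather than a large DELETE over the parent table.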
Use DROP and TRUNCATE for deleting data in bulk
For datasets where a rolling range, such as “the last N days/weeks/months”, is kept and periodically deleted, partition the table by time range, then use Postgres DROP or TRUNCATE to remove partitions. DROP TABLE and TRUNCATE generate a very small amount of WAL that is not proportional to the amount of data removed, whereas DELETE generates WAL proportional to the amount of data removed.
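Continuing the illustrative monthly-partition example, retiring old data looks like this (partition names are hypothetical):

```sql
-- Drop an entire month's partition: minimal WAL, regardless of row count.
DROP TABLE events_2022_04;

-- Or keep the partition in place but empty it, also with minimal WAL.
TRUNCATE events_2022_05;

-- By contrast, this writes WAL for every row it removes:
-- DELETE FROM events WHERE created_at < '2022-05-01';
```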
Do not delete and then reinsert the same data
A pattern Heroku has seen in Heroku Postgres customers who regularly exhaust WAL capacity is continually deleting a large proportion of their data, then reimporting it from an external source. This generates a very large amount of WAL, both for the delete and the re-insert. Instead prefer only updating or inserting rows that have actually changed.
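One way to update only the rows that changed is a conditional upsert. This sketch assumes a hypothetical products table with a unique sku column; the WHERE clause on the DO UPDATE skips rows whose values already match, so unchanged rows generate no row-update WAL:

```sql
-- Insert new rows; for existing rows, update only when the price differs.
INSERT INTO products (sku, price)
VALUES ('ABC-1', 9.99), ('ABC-2', 19.99)
ON CONFLICT (sku) DO UPDATE
    SET price = EXCLUDED.price
    WHERE products.price IS DISTINCT FROM EXCLUDED.price;
```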
Batch large volume writes
Heroku strongly recommends using COPY and other bulk-insert mechanisms over using INSERT directly. Manipulating one row per query amplifies the amount of WAL produced.
The Postgres documentation on populating a database has more information on using COPY for bulk inserts, as well as some other best practices for bulk data imports. Not all of the advice is applicable to Heroku Postgres.
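As a minimal sketch (table and file names are illustrative), a CSV file can be loaded in a single operation from psql. Note that server-side COPY ... FROM reads a file on the database host, so from a client machine the psql \copy meta-command is the usual choice:

```sql
-- From psql: stream a local CSV into the table in one bulk operation,
-- instead of issuing one INSERT statement per row.
\copy products (sku, price) FROM 'products.csv' WITH (FORMAT csv, HEADER)
```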
Disable triggers causing write amplification
Postgres triggers are a powerful tool, but they can lead to write amplification, where a write to one table causes many writes to others. This is a poor pattern when updating a large amount of data, so it may be preferable to disable triggers while performing data loading or large volumes of writes. This can be done via ALTER TABLE <table> DISABLE TRIGGER <trigger name> for a specific trigger, or ALTER TABLE <table> DISABLE TRIGGER ALL for all triggers.
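A bulk-load sequence using this might look like the following sketch (the table and trigger names are made up; you need sufficient privileges on the table to disable its triggers):

```sql
BEGIN;
-- Temporarily disable a trigger that would fire once per loaded row.
ALTER TABLE orders DISABLE TRIGGER audit_orders;

COPY orders FROM STDIN WITH (FORMAT csv);

-- Re-enable it before committing so normal writes are audited again.
ALTER TABLE orders ENABLE TRIGGER audit_orders;
COMMIT;
```

Remember that any work the trigger would have done (e.g. populating an audit table) is skipped for the loaded rows and must be handled separately if it is still required.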
Do not use Postgres as an object store
Postgres is a transactional relational database, designed for on-line transactional processing of data – it is not designed to be an object store. As a result, storing binary data and very large amounts of text/JSON/JSONB in a single column is an anti-pattern.
Use temporary or unlogged tables for data loading
If loading a very large amount of data, it may be helpful to “stage” the data in tables that are not written to WAL. This has some risks, as these are not crash safe tables, but can be a useful tool as part of an Extract-Transform-Load process.
You can create temporary tables using CREATE TEMPORARY TABLE, and unlogged tables with CREATE UNLOGGED TABLE.
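A sketch of an Extract-Transform-Load staging step using an unlogged table (all names are illustrative):

```sql
-- Stage raw data in an unlogged table: loads here generate no WAL,
-- at the cost of the table not being crash safe or replicated.
CREATE UNLOGGED TABLE staging_imports (
    sku   text,
    price numeric
);

COPY staging_imports FROM STDIN WITH (FORMAT csv);

-- Transform, then move only the cleaned rows into the real, logged table.
INSERT INTO products (sku, price)
SELECT sku, price FROM staging_imports
WHERE price IS NOT NULL;

DROP TABLE staging_imports;
```

Only the final INSERT into the logged table generates row-level WAL, rather than every intermediate load and transformation step.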
Ensure follower parity
Heroku Postgres followers can be up to two plans lower than the leader. A mismatch in plan sizes increases the chance of the follower being unable to keep up with WAL playback.