Understanding Heroku Postgres Data Caching
Last updated August 01, 2023
Data caching in Postgres can’t be preallocated or guaranteed. Instead, it can only be estimated, and it can vary widely depending on your workload. Heroku Postgres plans have a certain amount of System RAM, much of which is used for caching, but users can see slightly better or worse caching in their databases. A well-designed application serves more than 99% of queries from cache. This article provides an overview of how Postgres caches data.
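The 99% figure above can be made concrete with a small calculation. The following is a minimal sketch, assuming you have already read block counters such as heap_blks_hit and heap_blks_read from the pg_statio_user_tables statistics view. The function name and the sample numbers are hypothetical. Note that these counters only track hits in Postgres's own shared buffers, so a block counted as "read" may still have been served from the operating system page cache rather than from disk.

```python
def cache_hit_ratio(blocks_hit: int, blocks_read: int) -> float:
    """Return the fraction of block requests served from shared buffers.

    blocks_hit:  blocks found in Postgres's shared buffer cache
    blocks_read: blocks Postgres had to request from the operating system
    """
    total = blocks_hit + blocks_read
    if total == 0:
        # No activity recorded yet, so there is nothing to measure.
        return 0.0
    return blocks_hit / total


# Hypothetical counter values, for illustration only.
ratio = cache_hit_ratio(blocks_hit=995_000, blocks_read=5_000)
print(f"{ratio:.2%}")  # prints 99.50%
```

A ratio that stays well below 0.99 over a representative period may suggest that the working set doesn't fit in memory on the current plan.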
How Does PostgreSQL Cache Data?
While Postgres does have a few settings that directly affect memory allocation for the purposes of caching, most of the cache that Postgres uses is provided by the underlying operating system. Postgres, unlike most other database systems, makes aggressive use of the operating system’s page cache for a large number of operations.
As an example, say we provision a server with 7.5 GB of total system memory. Of these 7.5 GB, a small portion is used by the operating system kernel, and a smaller part is used by other programs, including Postgres. The rest, measured to be between 80% and 95% of system memory, is used by the operating system to cache data.
We’ve observed that the memory footprint of a Heroku Postgres instance’s operating system and other running programs is 500 MB on average, and this cost is mostly fixed regardless of plan size.
There are a few distinct ways in which Postgres allocates this bulk of memory, and the majority of it is typically left for the operating system to manage. Postgres manages a “Shared Buffer Cache”, which it allocates and uses internally to keep data and indexes in memory. This allocation is configured to be about 25% of total system memory for a server running a dedicated Postgres instance, such as all Heroku Postgres instances. The rest of available memory is used by Postgres for two purposes: to cache your data and indexes on disk via the operating system page cache, and for internal operations or data structures.
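As a rough illustration of the figures in this section, the sketch below splits the 7.5 GB example server into the three parts described: the mostly fixed operating system footprint (about 500 MB), the Shared Buffer Cache (about 25% of total memory), and the remainder shared by the page cache and internal operations. The function and key names are hypothetical, and the result is an estimate, not a guarantee of how a given instance behaves.

```python
def estimate_memory_split(total_mb: float,
                          os_overhead_mb: float = 500.0,
                          shared_buffers_fraction: float = 0.25) -> dict:
    """Estimate how system memory is divided on a dedicated Postgres server.

    The defaults mirror the figures quoted in this article; they are
    approximations, not exact settings.
    """
    shared_buffers_mb = total_mb * shared_buffers_fraction
    # Whatever is left is shared by the OS page cache and by Postgres
    # internal operations such as sorts and hash tables.
    remaining_mb = max(0.0, total_mb - os_overhead_mb - shared_buffers_mb)
    return {
        "os_overhead_mb": os_overhead_mb,
        "shared_buffers_mb": shared_buffers_mb,
        "page_cache_and_work_mb": remaining_mb,
    }


# 7.5 GB expressed in MB (7.5 * 1024).
split = estimate_memory_split(total_mb=7680)
for name, value in split.items():
    print(f"{name}: {value:.0f}")
```

For the 7.5 GB example this yields roughly 1920 MB of shared buffers and about 5260 MB left over for the page cache and internal operations.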
Data that has been recently written to or read from disk passes through the operating system page cache and is therefore cached in memory. As a result, reads are served from cache, which reduces block device I/O operations and increases throughput. However, a few Postgres operations also use this memory and displace cached data when they do.
The most noteworthy are internal operations done to fulfill queries, such as in-memory quicksorts, hash tables, and GROUP BY operations. Furthermore, every join in a query can use a certain amount of memory dictated by a Postgres configuration setting called work_mem. Each of these operations has the potential to use memory that would otherwise be used for data and index caching. However, this is also a form of caching in the sense that it avoids having to read the same information from disk to do the work.
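To see how quickly work_mem can compete with the cache, the following minimal sketch computes a rough upper bound on the memory that query operations might claim at once. It assumes a simplified model in which every memory-using operation (a sort, hash table, or join) takes its full work_mem allowance at the same time. In practice Postgres allocates only what an operation needs, so real usage is usually lower. The function name and all input values are hypothetical.

```python
def worst_case_work_memory_mb(work_mem_mb: float,
                              operations_per_query: int,
                              concurrent_queries: int) -> float:
    """Rough upper bound on memory used by query operations.

    Simplifying assumption: every operation in every concurrent query
    uses its full work_mem allowance simultaneously.
    """
    return work_mem_mb * operations_per_query * concurrent_queries


# Hypothetical workload: 4 MB work_mem, 3 memory-using operations per
# query, 50 queries running at the same time.
bound = worst_case_work_memory_mb(work_mem_mb=4,
                                  operations_per_query=3,
                                  concurrent_queries=50)
print(f"{bound:.0f} MB")  # prints 600 MB
```

Under these assumed numbers, up to 600 MB that could otherwise hold cached data and indexes might be claimed by query operations.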
Beyond this, other operations require memory, such as VACUUM, whether run by the autovacuum daemon or by you, as well as all DDL operations. In the DDL category, it’s worth mentioning index creation, which tends to consume large amounts of memory and therefore also uses up memory available for data and index caches, albeit temporarily.
Heroku Postgres plans vary primarily by the amount of System RAM available. The best way to understand which plan is best for your workload is to try them. More information can be found in this article.
What Does Having a Cold Cache Mean?
If you experience a service disruption on your production tier Heroku Postgres database, you can receive a message that your database has a “cold cache” when it comes back online. This message means that the underlying hardware was affected and, as a result, your database comes back online on a new host where no data has been cached. See this Knowledge Base article to mitigate cold cache issues on a new Postgres instance.
If you periodically send reads to your follower, its cache can already be warm, thereby reducing the time it takes for the cache to perform at normal levels.
If you have a follower, you can promote it when you see this message instead of waiting for your database to become available on the new host.