Increasing Performance of Large Tables on Heroku Postgres Using Partitioning

Table of Contents [expand]

Overview
Setup
Limitations and Gotchas

Last updated February 15, 2023

As applications and their associated data volumes grow, a few tables within a Postgres database grow at a rate that’s much faster than the other tables. This can cause a variety of problems for applications. Query times within the application start to rise, bulk loads take longer, and creating indexes can take a long time. One way to solve this problem is by partitioning what is logically one large table into smaller physical pieces.

Use table partitioning as a last resort. See the article, Heroku Postgres Database Tuning before attempting to partition a table.

Overview

In Postgres, partitioning tables require that a master table exists in which all child tables inherit. You must create each child table manually and don’t create any extra columns outside of what was inherited from the master table. The only caveat is that indexes must be created on each of the child tables. A series of check constraints are created to make sure that data that is inserted goes to the correct partition.

The main use case for partitioning a table typically involves taking a table that tracks data that is created over a time series. While other use cases do exist, the process articulated in the document can be followed for other types of tables.

pg_partman

pg_partman is an extension to create and manage both time-based and serial-based table partition sets. Child table & trigger function creation is all managed by the extension itself. Tables with existing data can also have their data partitioned in easily managed smaller batches. An optional retention policy can automatically drop partitions no longer needed. While partition management is handled by pg_partman, an external scheduling service to Postgres is needed to invoke the management of the partitions on an appropriate interval.

The example works through the partitioning of an events table.

Setup

Load the extension
Before a master table can be partitioned, the extension must be added to the database:
```
$ heroku pg:psql -a sushi
sushi::DATABASE=> CREATE EXTENSION pg_partman;
```

Define your master table
In most cases, the master table has already been defined because data growth for that table has been tremendous. For example, the events could be defined as:

$ heroku pg:psql -a sushi
sushi::DATABASE=> \d events
Column     |  Type                    | Modifiers
------------+--------------------------+-----------
id         | integer                  |
name       | text                     |
type       | text                     |
created_at | timestamp with time zone | not null

Determine the time interval to partition over
This interval depends on the velocity of the data being created. As a rule of thumb, partitions aren’t smaller than an hour. Typical configurations for the partition window are hourly, daily, or monthly. pg_partman has many different options for defining the partition window.
Create the initial partitions
Within the psql console, you must use the interval you’ve decided for partitioning and the retention period
```
$ heroku pg:psql -a sushi
sushi::DATABASE=> SELECT create_parent('public.events', 'created_at', 'time', 'daily');
```
After invoking this command, pg_partman creates a number of control tables and associated data to manage the child tables.
Add maintenance scripts to your project
Because pg_partman can’t automatically partition tables and manage the triggers associated with them, an external scheduling service like Heroku Scheduler or a clock process is needed to manage the partitions. A regular maintenance task must run on the database to make sure that old partitions are dropped and new ones created. This is an example command that runs against the database using Heroku Scheduler:
```
psql $DATABASE_URL -c 'select run_maintenance();'
```
Depending on the time interval used for table partitioning, Heroku Scheduler is set up to run on an hourly or daily basis.

This setup process is enough to get started with table partitioning in Postgres. pg_partman can do much more in the way of configuration.

For more advanced configuration of partitioning, see the pg_partman documentation on Github.

Limitations and Gotchas

While partitioning tables in Postgres can help manage the velocity of data that’s being generated by the application, some limitations do exist.

The length of the child table name can only be a maximum of 63 characters long. If the child table names that get created are longer than the maximum, they get truncated and could lead to unexpected results.
In the event that the job that maintains the number of child tables stops for any extended period and child tables aren’t getting created, INSERT statements start getting rejected.
The example in this documentation and the code to manage partitions is only for tables that must be partitioned by time series data.
If you get the ERROR: permission denied for schema heroku_ext message after running create_parent, follow these instructions.

Categories

Increasing Performance of Large Tables on Heroku Postgres Using Partitioning

Table of Contents [expand]

Overview

pg_partman

Setup

Limitations and Gotchas

Feedback