Increasing Performance of Large Tables on Heroku Postgres Using Partitioning
Last updated January 06, 2023
As applications and their associated data volumes grow, a few tables within a Postgres database grow at a rate that’s much faster than the other tables. This can cause a variety of problems for applications. Query times within the application start to rise, bulk loads take longer, and creating indexes can take a long time. One way to solve this problem is by partitioning what is logically one large table into smaller physical pieces.
Use table partitioning as a last resort. See the article, Heroku Postgres Database Tuning before attempting to partition a table.
In Postgres, partitioning tables require that a master table exists in which all child tables inherit. You must create each child table manually and don’t create any extra columns outside of what was inherited from the master table. The only caveat is that indexes must be created on each of the child tables. A series of check constraints are created to make sure that data that is inserted goes to the correct partition.
The main use case for partitioning a table typically involves taking a table that tracks data that is created over a time series. While other use cases do exist, the process articulated in the document can be followed for other types of tables.
pg_partman is an extension to create and manage both time-based and serial-based table partition sets. Child table & trigger function creation is all managed by the extension itself. Tables with existing data can also have their data partitioned in easily managed smaller batches. An optional retention policy can automatically drop partitions no longer needed. While partition management is handled by
pg_partman, an external scheduling service to Postgres is needed to invoke the management of the partitions on an appropriate interval.
The example works through the partitioning of an events table.
Load the extension
Before a master table can be partitioned, the extension must be added to the database:
$ heroku pg:psql -a sushi sushi::DATABASE=> CREATE EXTENSION pg_partman;
Define your master table
In most cases, the master table has already been defined because data growth for that table has been tremendous. For example, the events could be defined as:
$ heroku pg:psql -a sushi sushi::DATABASE=> \d events Column | Type | Modifiers ------------+--------------------------+----------- id | integer | name | text | type | text | created_at | timestamp with time zone | not null
Determine the time interval to partition over
This interval depends on the velocity of the data being created. As a rule of thumb, partitions aren’t smaller than an hour. Typical configurations for the partition window are hourly, daily, or monthly.
pg_partmanhas many different options for defining the partition window.
Create the initial partitions
psqlconsole, you must use the interval you’ve decided for partitioning and the retention period
$ heroku pg:psql -a sushi sushi::DATABASE=> SELECT create_parent('public.events', 'created_at', 'time', 'daily');
After invoking this command,
pg_partmancreates a number of control tables and associated data to manage the child tables.
Add maintenance scripts to your project
pg_partmancan’t automatically partition tables and manage the triggers associated with them, an external scheduling service like Heroku Scheduler or a clock process is needed to manage the partitions. A regular maintenance task must run on the database to make sure that old partitions are dropped and new ones created. This is an example command that runs against the database using Heroku Scheduler:
psql $DATABASE_URL -c 'select run_maintenance();'
Depending on the time interval used for table partitioning, Heroku Scheduler is set up to run on an hourly or daily basis.
This setup process is enough to get started with table partitioning in Postgres.
pg_partman can do much more in the way of configuration.
For more advanced configuration of partitioning, see the
pg_partman documentation on Github.
Limitations and Gotchas
While partitioning tables in Postgres can help manage the velocity of data that’s being generated by the application, some limitations do exist.
- The length of the child table name can only be a maximum of 63 characters long. If the child table names that get created are longer than the maximum, they get truncated and could lead to unexpected results.
- In the event that the job that maintains the number of child tables stops for any extended period and child tables aren’t getting created,
INSERTstatements start getting rejected.
- The example in this documentation and the code to manage partitions is only for tables that must be partitioned by time series data.