Increasing Performance of Large Tables on Heroku Postgres Using Partitioning
Last updated 12 September 2017
As applications and their associated data volumes grow, a few tables within a Postgres database will grow at a rate that’s much faster than the other tables. This can cause a variety of problems for applications. Query times within the application will start to rise, bulk loads will take longer, and creating indexes can take a long time. One way to solve this problem is by partitioning what is logically one big table into smaller physical pieces.
Partitioning a table should be used as a last resort. Please see the article, Heroku Postgres Database Tuning before attempting to partition a table.
In Postgres, partitioning tables requires that a master table exist in which all child tables will inherit. Each child table has to be created manually and no extra columns should be created outside of what was inherited from the master table. The only caveat is that indexes have to be created on each of the child tables. A series of check constraints are created to make sure that data that is inserted goes to the correct partition.
The main use case for partitioning a table typically involves taking a table that tracks data that is created over a time series. While other use cases do exist, the process articulated in the document can be followed for other types of tables.
pg_partman is an extension to create and manage both time-based and serial-based table partition sets. Child table & trigger function creation is all managed by the extension itself. Tables with existing data can also have their data partitioned in easily managed smaller batches. Optional retention policy can automatically drop partitions no longer needed. While partition management is handled by pg_partman, an external scheduling service to Postgres is needed to invoke the management of the partitions on an appropriate interval.
The example will work through the partitioning of an events table.
Load the extension
Before a master table can be partitioned, the extension needs to be added to the database:
$ heroku pg:psql -a sushi sushi::DATABASE=> CREATE EXTENSION pg_partman;
Define your master table
In most cases, the master table has already been defined because data growth for that table has been tremendous. For example, an events could be defined as such:
$ heroku pg:psql -a sushi sushi::DATABASE=> \d events Column | Type | Modifiers ------------+--------------------------+----------- id | integer | name | text | type | text | created_at | timestamp with time zone | not null
Determine the time interval to partition over
This depends on the velocity of the data being created. As a rule of thumb, partitions should not be smaller than an hour. Typical configurations for the partition window are hourly, daily or monthly. pg_partman has many different options for defining the partition window.
Create the initial partitions
psqlconsole, you’ll need to use the interval you’ve decided for partitioning and the retention period
$ heroku pg:psql -a sushi sushi::DATABASE=> SELECT create_parent('public.events', 'created_at', 'time', 'daily');
After invoking this command, pg_partman will create a number of control tables and associated data to manage the child tables.
Add maintenance scripts to your project
Because pg_partman cannot automatically partition tables and manage the triggers associated with them, an external scheduling service like Heroku Scheduler or a clock process is needed to manage the partitions. A regular maintenance task needs to run on the database to make sure that old partitions are dropped and new ones created. This is an example command that will run against the database using Heroku Scheduler:
heroku run 'psql $DATABASE_URL -c "select run_maintenance();"'
Depending on the time interval used for table partitioning, Heroku Scheduler should be set up to run on an hourly or daily basis.
This setup process is enough to get started with table partitioning in Postgres. pg_partman is capable of much more in the way of configuration.
For more advanced configuration of partitioning, please see the pg_partman documentation on Github.
Limitations and Gotchas
While partitioning tables in Postgres can help manage the velocity of data that’s being generated by the application, some limitations do exist.
- The length of the child table name can only be a maximum of 63 characters long. Should the child table names that get created be longer than that, they will get truncated and could lead to unexpected results.
- In the event that the job that maintains the number of child tables stops for any extended period of time and child tables are not getting created,
INSERTstatements will start getting rejected.
- The example in this documentation and the code to manage partitions is only for tables that need to be partitioned by time series data.