This article was contributed by the Customer Solutions Architecture team
Part of Heroku Developer Support, CSA helps Enterprise customers optimize and scale their applications on Heroku.
Monitoring Heroku Postgres
Last updated 19 October 2020
Table of Contents
Automated monitoring for your Heroku Postgres database can provide insight into the performance of your app and database. Use monitoring tools to spot problems in advance, respond to incidents quickly, and identify the root cause of issues. This article describes how to effectively set up monitoring for your database, including which tools to use and what metrics to look for.
Set up metrics monitoring with a platform monitoring tool
Heroku Postgres provides database and server metrics that give information on the status of your instance. Understanding and monitoring these metrics will ensure the health of your database as your applications run and scale.
The most effective way to keep track of database metrics is to install a platform monitoring add-on. Librato provides ready-made graphs and the ability to set custom alert thresholds for specific metrics. A full list of monitoring add-ons can be found on the Heroku Elements Marketplace.
Platform monitoring add-ons also provide insights on your apps and dynos. For details, see Monitoring Your App on Heroku.
For a full list of provided metrics, see Heroku Postgres Metrics Logs. The following metrics should have alerts set up to provide meaningful alarms pertaining to database health:
db_size metric is the size of the database, including all table and index data, as well as database bloat.
If your Postgres database grows past the size allotted for your plan, you will receive a warning email with directions on how fix the issue. In some cases, we will set an enforcement date when you will be allowed only a single database connection and access will be restricted to
DELETE , and
TRUNCATE access until the database in back under the plan limit.
Set alerts based on your plans limit to give you some advanced notice if you start approaching it. We recommend setting a warning alert when your database gets to 80% of the allotted size for your plan and a critical alert when it gets to 90% of the allotted size.
If you get close to maximum size, either upgrade your plan or delete data to stay within plan limits.
active-connections metric lists the number of established connections to your database.
Heroku Postgres enforces a connection limit in order to optimize Postgres memory usage and scaling.
For Heroku Postgres plans
tier-3 and higher there is a hard limit of 500 connections. Once that limit is reached no new connections can be created.
There are two strategies for setting up alerts based on database connection counts:
- Set up an alert for sudden, large changes to the current connection count. Big changes from baseline connection numbers could be a sign of increased query and/or transaction run times. Alerting thresholds will be dependent on your application’s connection count range, assessed under normal operating conditions. Consider +50/+100 over your normal daily maximum.
- Set up an alert for when the connection count approaches its hard maximum. For tier-3 plans and higher, the maximum is 500 connections, so 400 and 450 are good warning and critical numbers to start with here.
If you are routinely approaching your connection limit, consider using connection pooling. Refer to the PgBouncer Configuration best practices document for more information.
waiting-connections metric shows how many connections are waiting on database locks before they can proceed.
Occasional lock waits are expected, but sustained lock waits can be a sign of mishandled database concurrency.
Set up an alert for when there are any connections waiting for five consecutive minutes. You can use the pg-extras CLI plugin to help identify queries that are preventing other operations from taking place.
The blocking queries can then be terminated in order to resolve lock contention. Additionally, knowing what statements cause blocks can help to identify application code that can be optimized to reduce locks.
load-avg (in 1m, 5m, and 15m averages)
load-avg metric shows average CPU load over the indicated periods. Heroku’s reported load metrics are normalized by dividing the system load by the number of CPUs.
A load average of 1.0 over a given time window indicates full utilization of all CPUs, a load over 1.0 indicates that processes had to wait for CPU time in the given window (with higher values indicating more time spent by processes waiting), and values under 1.0 indicate that CPUs spent time idle during the given window. If this value is high, you will get less consistent query execution times and longer wait times.
Since values over 1.0 indicate over-utilization, you will want to know before the load gets to that number. Set up alerts for when this
load-avg reaches 0.8 (warning) and 0.9 (critical).
Check current activity with the
pg:ps command for cpu-intensive queries. Additionally check IOPS, as exceeding provisioned IOPS will cause processes to need to wait on I/O to become available before they can process. If you are consistently seeing high values for
load-avg, then it may be time to upgrade to a larger Heroku Postgres plan. Before doing that, or if you’re already on the largest plan, look to tuning expensive queries to reduce the amount of processing work done on the database server and/or data read from disk.
read-iops metric tracks how many read IO requests are made to the main database disk partition in values of IOPS (IO Operations Per Second). Each Heroku Postgres plan has a provided Provisioned IOPS (PIOPS) max, the maximum total reads + writes per second that can be sustained by the provisioned disk volume.
Ideally, you want your database’s reads to come from memory (cache) rather than disk as disk reads are much slower than memory. If provisioned IOPS is exceeded, processes need to wait on I/O to become available before they can process, leading to long transaction times and high
Set up an alert for 90% of your Provisioned IOPS (PIOPS)to help identify activities or statements that require significant I/O so that they can be addressed.
If you are consistently seeing high values for
read-iops, then it may be time to upgrade to a larger Heroku Postgres plan. Before doing that, or if you’re already on the largest plan, look to tuning expensive queries to reduce the amount of data being read directly from disk.
wal-percentage-used metric tracks the space left to store temporary Postgres write-ahead logs. Heroku Postgres utilizes Postgres write-ahead logging as a part of Continuous Protection. Any database change is written to the write-ahead logs prior to the database data files.
If the rate of WAL generation exceeds the rate of WAL archival, there is risk of completely filling up the WAL volume. This will shut down the database and there will be a risk of data loss.
Heroku will automatically begin throttling database connection limits once the volume reaches 75% utilization, eventually terminating all connections once it reaches 95% utilization. Set up an alert for when this number reaches 60% to be notified before connection limits are invoked.
Please refer to Postgres Write-Ahead Log Usage for details and remediation strategies.
Install logging tool
To supplement the database system metrics, Heroku Postgres logs messages to an app’s log stream. Heroku does not store logs for a long period of time, so you’ll need an external tool to keep log history. Set up a logging tool by installing a logging add-on.
During an incident, logs can help investigate issues and identify long-running queries. After an incident, they can be used for root cause analysis.
The Logging category includes several logging tools, with different features and retention periods.
Events from Heroku Postgres logs can be isolated by filtering for the
postgres process. For examples and more information, refer to the article Understanding Heroku Postgres Log Statements and Common Errors. It is important to be familiar with the logs so that you have an understanding of what sorts of messages are logged routinely and what messages are abnormal for your application. Logs are an invaluable tool for finding long running queries or locks.
For more logging use cases and setup tips, see Monitoring Your App on Heroku.
Postgres automatically collects statistical information on queries and tables that can be used to analyze performance issues. The Heroku pg-extras CLI plugin provides shortcuts for interpreting those statistics.
Routinely reviewing database statistics helps identify potential issues before they become problems.
The pg-extras CLI plugin, setup instructions, and command reference can be found on its GitHub repository.
Routinely review opportunities for improvement, such as unused indexes, large tables and indexes, vacuum statistics, bloat statistics, and outlier queries.
Database monitoring is just one part of overall app monitoring. See guidance for Monitoring Your App on Heroku.