Heroku Postgres Monitoring Best Practices
Last updated May 17, 2023
Table of Contents
Automated monitoring for your Heroku Postgres database can provide insight into the performance of your app and database. Use monitoring tools to spot problems in advance, respond to incidents quickly, and identify the root cause of issues. This article describes how to effectively set up monitoring for your database, including which tools to use and what metrics to look for.
Heroku Enterprise customers with Premier or Signature Success Plans can request in-depth guidance on this topic from the Customer Solutions Architecture (CSA) team. Learn more about Expert Coaching Sessions here or contact your Salesforce account executive.
Set Up Metrics Monitoring with a Platform Monitoring Tool
Heroku Postgres provides database and server metrics that give information on the status of your instance. Understanding and monitoring these metrics ensure the health of your database as your applications run and scale.
The most effective way to track database metrics is to install a platform monitoring add-on. A full list of monitoring add-ons can be found on the Heroku Elements Marketplace.
Platform monitoring add-ons also provide insights on your apps and dynos. For details, see Monitoring Your App on Heroku.
For a full list of provided metrics, see Heroku Postgres Metrics Logs. The following metrics have alerts set up to provide meaningful alarms pertaining to database health:
db_size
The db_size
metric is the size of the database, including all table and index data, as well as database bloat.
If your Postgres database grows past the size allotted for your plan, you receive a warning email with directions on how to fix the issue. In some cases, we set an enforcement date when you’re allowed only a single database connection and access is restricted to READ
, DELETE
, and TRUNCATE
access until the database in back under the plan limit.
Set alerts based on your plans limit to give you some advanced notice if you start approaching it. We recommend setting a warning alert when your database gets to 80% of the allotted size for your plan and a critical alert when it gets to 90% of the allotted size.
If you get close to maximum size, either upgrade your plan or delete data to stay within plan limits.
active-connections
The active-connections
metric lists the number of established connections to your database.
Heroku Postgres enforces a connection limit in order to optimize Postgres memory usage and scaling.
For Heroku Postgres plans tier-3
and higher there’s a hard limit of 500 connections. After that limit is reached no new connections can be created.
There are two strategies for setting up alerts based on database connection counts:
- Set up an alert for sudden, large changes to the current connection count. Significant changes from baseline connection numbers could be a sign of increased query and/or transaction run times. Alerting thresholds are dependent on your application’s connection count range, assessed under normal operating conditions. Consider +50/+100 over your normal daily maximum.
- Set up an alert for when the connection count approaches its hard maximum. For tier-3 plans and higher, the maximum is 500 connections, so 400 and 450 are good warning and critical numbers to start with here.
If you’re routinely approaching your connection limit, consider using connection pooling. Refer to the PgBouncer Configuration best practices document for more information.
waiting-connections
The waiting-connections
metric shows how many connections are waiting on database locks before they can proceed.
Occasional lock waits are expected, but sustained lock waits can be a sign of mishandled database concurrency.
Set up an alert for when there are any connections waiting for five consecutive minutes. You can use the pg-extras CLI plugin to help identify queries that are preventing other operations from taking place.
The blocking queries can then be terminated in order to resolve lock contention. Additionally, knowing what statements cause blocks can help to identify application code that can be optimized to reduce locks.
load-avg (in 1m, 5m, and 15m averages)
The load-avg
metric shows average CPU load over the indicated periods. Heroku’s reported load metrics are normalized by dividing the system load by the number of CPUs.
A load average of 1.0 over a given time window indicates full utilization of all CPUs. A load over 1.0 indicates that processes had to wait for CPU time in the given window (with higher values indicating more time spent by processes waiting). Values under 1.0 indicate that CPUs spent time idle during the given window. If this value is high, you can get less consistent query execution times and longer wait times.
Since values over 1.0 indicate over-utilization, you want to know before the load gets to that number. Set up alerts for when this load-avg
reaches 0.8 (warning) and 0.9 (critical).
Check current activity with the pg:ps
command for cpu-intensive queries. Additionally check IOPS, as exceeding provisioned IOPS causes processes to wait on I/O to become available before they can process. If you’re consistently seeing high values for load-avg
, then it’s time to upgrade to a larger Heroku Postgres plan. Before doing that, or if you’re already on the largest plan, look to tuning expensive queries to reduce the amount of processing work done on the database server and/or data read from disk.
read-iops
The read-iops
metric tracks how many read IO requests are made to the main database disk partition in values of IOPS (IO Operations Per Second). Each Heroku Postgres plan has a provided Provisioned IOPS (PIOPS) max, the maximum total reads + writes per second that can be sustained by the provisioned disk volume.
Ideally, you want your database’s reads to come from memory (cache) rather than disk as disk reads are much slower than memory. If provisioned IOPS is exceeded, processes must wait on I/O to become available before they can process, leading to long transaction times and high load-avg
.
Set up an alert for 90% of your Provisioned IOPS (PIOPS) to help identify activities or statements that require significant I/O so that they can be addressed.
If you’re consistently seeing high values for read-iops
, then it’s time to upgrade to a larger Heroku Postgres plan. Before doing that, or if you’re already on the largest plan, look to tuning expensive queries to reduce the amount of data being read directly from disk.
wal-percentage-used
The wal-percentage-used
metric tracks the space left to store temporary Postgres write-ahead logs. Heroku Postgres uses Postgres write-ahead logging as a part of Continuous Protection. Any database change is written to the write-ahead logs before the database data files.
If the rate of WAL generation exceeds the rate of WAL archival, there’s risk of completely filling up the WAL volume. This shuts down the database and there’s a risk of data loss.
Heroku automatically begins throttling database connection limits when the volume reaches 75% utilization, eventually terminating all connections when it reaches 95% utilization. Set up an alert for when this number reaches 60% to be notified before connection limits are invoked.
Refer to Postgres Write-Ahead Log Usage for details and remediation strategies.
Install Logging Tool
To supplement the database system metrics, Heroku Postgres logs messages to an app’s log stream. Heroku doesn’t store logs for a long time, so you need an external tool to keep log history. Set up a logging tool by installing a logging add-on.
During an incident, logs can help investigate issues and identify long-running queries. After an incident, they can be used for root cause analysis.
The Logging category includes several logging tools, with different features and retention periods.
Events from Heroku Postgres logs can be isolated by filtering for the postgres
process. For examples and more information, refer to the article Understanding Heroku Postgres Log Statements and Common Errors. It’s important to be familiar with the logs so that you have an understanding of what sorts of messages are logged routinely and what messages are abnormal for your application. Logs are an invaluable tool for finding long running queries or locks.
If you have an outside logging tool, you can set up a log drain to that tool. If you use Heroku Shield, you can use Private Space Logging to drain all logs from a Space to an outside tool.
For more logging use cases and setup tips, see Monitoring Your App on Heroku.
Install pg-extras
Postgres automatically collects statistical information on queries and tables that can be used to analyze performance issues. The Heroku pg-extras CLI plugin provides shortcuts for interpreting those statistics.
Routinely reviewing database statistics helps identify potential issues before they become problems.
The pg-extras CLI plugin, setup instructions, and command reference can be found on its GitHub repository.
Routinely review opportunities for improvement, such as unused indexes, large tables and indexes, vacuum statistics, bloat statistics, and outlier queries.
Additionally, the pg:diagnose command, included in the Heroku CLI, runs multiple checks and generates a report showing potential issues based on real-time activity and gathered statistics.
Application Monitoring
Database monitoring is just one part of overall app monitoring. See guidance for Monitoring Your App on Heroku.