
Reviewing Your Key Application Performance Metrics

Last updated May 27, 2020

This is a draft article - the text and URL may change in the future. This article is unlisted. Only those with the link can access it.

Table of Contents

  • Introduction
  • Pre-requisite monitoring setup
  • Start a review document
  • Record your resources and configuration
  • Use the Production Check feature
  • Record your errors and events
  • Record your response times and throughput
  • Identify slow transactions
  • Record your dyno load and memory usage
  • Run pg:diagnose
  • Record your Postgres load
  • Record your number of connections
  • Record your cache hit rates and IOPs
  • Record your database size
  • Identify your expensive queries
  • Record language-specific or other key metrics
  • (Optional) Compare to previous review
  • Next steps

Introduction

In this tutorial, you will:

  • Review and identify patterns in your baseline metrics
  • Identify items to investigate for remediation or optimization

These checks will reveal potential performance bottlenecks. The Next Steps section includes guidance for resolving the issues you find.

Pre-requisite monitoring setup

While the Metrics tab of your Heroku Dashboard and data.heroku.com provide a picture of the overall health of your application and database, additional tools are required to get a more complete metrics review.

Install these tools at least 7 days before using this tutorial so you have enough data to identify issues.

  • An app deployed to Heroku on non-free dynos that is receiving traffic. Application metrics are not available for free plans.
  • A non-hobby-tier Heroku Postgres database. Postgres metrics are not available for hobby plans.
  • A logging add-on, to view app and database logs
  • An application performance monitoring (APM) add-on, such as New Relic, Scout, or AppOptics, to identify slow endpoints and services
  • An infrastructure monitoring add-on, such as Librato or AppSignal, to measure load on the app and database
  • log-runtime-metrics enabled (a CLI sketch for enabling it follows this list)
  • (Optional) language runtime metrics enabled
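
If you still need to enable log-runtime-metrics, a minimal CLI sketch is shown below (the app name my-app is a placeholder; the feature only takes effect after the dynos restart):

```
# Enable per-dyno load and memory samples in the log stream (my-app is a placeholder)
heroku labs:enable log-runtime-metrics -a my-app

# Restart the dynos so the new samples start appearing
heroku ps:restart -a my-app
```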

This tutorial includes screenshots from a variety of monitoring tools. Use the tools that work best for your app.

Start a review document

Start a document to capture your observations.

Set your Heroku Dashboard Metrics tab and all monitoring tools to the same timezone and units of measure (e.g., rpm vs. rps) for easier reference.

Set your monitoring tools to look at the same time period, e.g., the last 7 days of history. Confirm that your selected time period is typical for your app. Note your selected time period in your review document.

At the bottom of your review document, add a section called “Items to Investigate.” You will add items to this section as you record observations about your metrics. Later, you will dive deeper into the items you have flagged for investigation.

Record your resources and configuration

In this step, you will record relevant configuration info for your resources. This allows you to interpret your metrics in the context of your app’s current configuration.

Record the following info, adjusting for the specifics of your app:

Resource   | Version/Plan (examples shown) | Config (examples shown)
Stack      | heroku-18                     |
Region     | U.S. (Common Runtime)         |
Web Dynos  | Performance-M                 | 1-4 dynos (autoscaling enabled - p95 = 800 ms threshold)
Web Server | Puma 4.3                      | WEB_CONCURRENCY = 2
Framework  | Rails 5.2.3                   | RAILS_MAX_THREADS = 5, pool (from database.yml): 5
Database   | Postgres 11.6: Standard-0     | Attached as DATABASE_URL, HEROKU_POSTGRESQL_SILVER_URL
Other      | Other resources and add-ons, e.g. monitoring tools, worker dynos, Heroku Scheduler, etc. |
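
Most of this information can be pulled from the Heroku CLI rather than clicking through the Dashboard. A minimal sketch, assuming a placeholder app name of my-app:

```
# Gather the configuration details for your review document (my-app is a placeholder)
heroku apps:info -a my-app   # region, stack, and dyno summary
heroku ps -a my-app          # dyno types and counts currently running
heroku config -a my-app      # config vars such as WEB_CONCURRENCY and RAILS_MAX_THREADS
heroku pg:info -a my-app     # Postgres version, plan, and attachment names
heroku addons -a my-app      # logging, monitoring, scheduler, and other add-ons
```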

Use the Production Check feature

Use the Production Check feature on your Heroku Dashboard. Take a screenshot of it and include it in your review document. Add any warnings to your list of “Items to Investigate.”

Production Check screenshot

Record your errors and events

In this step, you will gather the error and event info concerning your app. These include events such as app deploys, dyno formation changes, etc. This provides additional context as you interpret your metrics.

In your Heroku Dashboard, go to your Metrics tab and scroll to the Events section. Take a screenshot and add that to your review document.

Errors and Events screenshot

If your monitoring tool includes an error analytics feature, also record that info in your review document.

Write a description for your screenshot(s). Take note of the following:

  • When you deployed and a description of what was deployed, e.g., a link to the merged pull request.
  • Changes to your dyno formation so that you know how many dynos you were running throughout your observation timeframe
  • The pattern of your daily dyno restarts. If any daily restarts occur outside of low traffic periods, add an item to your “Items to Investigate” section at the end of your review document.
  • Frequency and type of errors. Add these errors to your “Items to Investigate” list (a log search sketch follows this list).
  • Any incidents that occurred, along with links from status.heroku.com that detail the incidents. You may find that these incidents account for anomalies you may encounter as you review your metrics.
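
To cross-check the dashboard against raw logs, you can scan a recent log sample for releases and common error codes. This is only a spot check of the most recent lines, not a replacement for your logging add-on; my-app is a placeholder app name:

```
# Recent releases (deploys, config changes, rollbacks) with timestamps
heroku releases -a my-app

# Spot-check recent logs for router errors (at=error, e.g. H12) and memory errors (R14/R15)
heroku logs -n 1500 -a my-app | grep -E 'at=error|Error R1[45]'
```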

Record your response times and throughput

From your Heroku Dashboard Metrics tab or your monitoring tool, take screenshots of your throughput and response time and add them to your review document.

Throughput screenshot

Response time screenshot

On the Heroku Dashboard Metrics tab, you can select and de-select what is shown by clicking on the legend to the right of the graph. For example, just the p50 response times for the example application are shown below:

p50 response time screenshot

Response times for your web application should be under 500ms.

The following signs point to potential concurrency issues:

  • Response times increase when throughput increases
  • Response times increase for a sustained period after a deploy or scaling event

If you notice these patterns in the metrics graphs, use your monitoring tool to examine request queuing time. High queue times indicate slow code, insufficient web concurrency, or too few running processes.

High queue time indicates that your app is unable to handle the volume of requests, causing those requests to back up in the router.

An example graph from Scout APM, with queue time shown in pink, is below:

Response time with queue time screenshot in Scout APM

The above screenshot shows elevated queue times. Sometimes the higher queue times are accompanied by higher throughput, indicating insufficient concurrency or dyno count for the amount of traffic.

Make some observations about any patterns you see. Record the following in your review document. Example info is shown for reference:

Metric                | Suggested Baseline | Your Baseline | Comments
Avg p50 response time | < 500 ms           | 135.4 ms      | Response times are higher from Friday through midday Monday. Fewer dynos were run during this time than other times during the week.
Avg p95 response time | 500 ms             | 418.9 ms      |
Avg p99 response time | 1000 ms            | 1087 ms       | There are some large spikes in p99 response times. These appear to match up to timestamps for H12 Request Timeout errors.
Avg throughput        | —                  | 1422 rpm      |
Max throughput        | —                  | 2248 rpm      | Queue time jumps up whenever throughput is above 2000 rpm, though higher queue times do not always match higher throughput.
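
As a rough sanity check of the dashboard numbers, you can estimate percentiles from the service= field of recent router log lines. This is a small, recent sample only and assumes the standard Common Runtime router log format; my-app is a placeholder:

```
# Estimate p50/p95/p99 response times from recent router log lines (rough spot check only)
heroku logs -n 1500 -a my-app \
  | grep -F 'heroku[router]' \
  | sed -n 's/.*service=\([0-9]*\)ms.*/\1/p' \
  | sort -n \
  | awk '{ v[NR] = $1 }
         END {
           if (NR == 0) { print "no router samples found"; exit }
           print "samples:", NR, " p50:", v[idx(0.50)] "ms", " p95:", v[idx(0.95)] "ms", " p99:", v[idx(0.99)] "ms"
         }
         function idx(q,  i) { i = int(NR * q); return (i < 1) ? 1 : i }'
```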

In the next couple of steps, you will identify some slow transactions and look at your web dyno utilization and concurrency.

Identify slow transactions

A monitoring tool with transaction tracing capabilities is invaluable for identifying slow transactions.

If your p95/p99 response times are slow, or you are experiencing H12 errors, you should examine your transactions that have the slowest average response times.

Please consult your monitoring tool’s documentation for steps on identifying these transactions. Here are the transactions with the slowest average response times for the example application, as shown in New Relic:

Slowest average response times in New Relic APM

Add these transactions to the “Items to Investigate” list in your review document.

Record your dyno load and memory usage

In this step, you will take screenshots and make observations about your dyno load and memory.

Determining the correct number and type of dynos and the most effective web concurrency is a non-trivial task. Start by looking at your web dyno load and memory to check whether your dynos are currently over- or underutilized.

In your Heroku Dashboard Metrics tab or your monitoring tool, take screenshots of these two metrics:

Dyno load screenshot

Dyno memory screenshot

The best practices guidance for max dyno load depends on what dyno type you use. Please consult this Knowledge Base article to find the recommended load for your dyno type. If you experience high load issues, you may want to investigate if it is possible to move CPU-heavy tasks into background jobs instead.

The recommendation for memory usage is to keep max memory usage below 85% of your memory quota. You can see memory limits for your dyno type here.

As you review the app’s memory behavior, red flags include any swap usage and R14 or R15 errors. Total memory usage includes RAM and swap memory. It is reasonable to observe some swap (below 50 MB on the Common Runtime), but if swap is significant, you may need to lower your web concurrency settings and add more dynos or increase dyno size. Note that Private Spaces dynos do not use swap; the dyno is restarted instead.
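
With log-runtime-metrics enabled, you can also spot-check the raw load, memory, and swap samples from a recent slice of your logs. The sample#... names below are the ones the feature normally emits; my-app is a placeholder:

```
# Recent per-dyno load, memory, and swap samples (recent logs only; use your monitoring add-on for the full window)
heroku logs -n 1500 -a my-app | grep -E 'sample#(load_avg_1m|memory_total|memory_swap)'

# Any recent R14 (memory quota exceeded) or R15 (memory quota vastly exceeded) errors
heroku logs -n 1500 -a my-app | grep -E 'Error R1[45]'
```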

Record your observations, along with comments on any patterns you see. A table with example info is shown below (Performance-M dynos are used in the example):

Metric                   | Suggested Baseline | Your Baseline | Comments
Avg dyno load            | < 3.0              | 3.17          | Load is frequently above the recommended baseline. Slightly lower load experienced over the weekend through midday Monday.
Max dyno load            | 3.0                | 7.24          | Max load is more than double the recommended baseline.
Avg memory usage         | < 85%              | 27.8%         | Average and max memory usage are virtually the same except for a few spikes in max memory. Memory is generally underutilized.
Max memory usage         | 85%                | 195.1%        | Memory spikes match the timestamps for R14 errors.
Max memory swap          | < 50 MB            | 2443 MB       |
Web concurrency settings |                    | WEB_CONCURRENCY = 2, RAILS_MAX_THREADS = 5 |

If your load or memory appears to be under- or over-utilized, add an item to your “Items to Investigate” section to look into optimizing dyno usage and web concurrency.

High memory usage can also be indicative of other memory issues such as memory leakage or bloat. You may want to add diagnosing memory issues to your “Items to Investigate.”

Different languages, web frameworks, and applications behave quite differently, making it difficult to offer specific advice. When you review your “Items to Investigate” in the last step of this tutorial, links are provided to help you with adjusting web concurrency and troubleshooting memory issues for a variety of languages and frameworks.

Run pg:diagnose

In this step, you will start looking at your database performance.

Your database is an important resource to monitor. pg:diagnose performs a number of useful health and diagnostic checks that help analyze and optimize database performance.

Run pg:diagnose and screenshot the output for your review document. Add anything that is flagged in red or yellow to your “Items to Investigate” list.
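
For reference, a minimal sketch of running the checks from the CLI (my-app is a placeholder; pass an attachment name to target a specific database, such as a follower):

```
# Run the health and diagnostic checks against the primary database
heroku pg:diagnose -a my-app

# Or target a specific attachment, for example the one from the configuration table above
heroku pg:diagnose -a my-app HEROKU_POSTGRESQL_SILVER_URL
```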

pg:diagnose output

For example, from the output above, you would add the indexes listed as yellow as items to investigate further. Your investigation would include:

  • confirming that the Never Used Indexes are also not used on any followers you may have
  • removing the Low Scans, High Writes indexes in your staging environment to determine impact before applying changes to your production database

For more ideas of what items to investigate, please take a look at the pg:diagnose section of this article and review anything related to the red and yellow warnings in your output.

Record your Postgres load

In this step, you will record your Postgres load and make some observations.

A variety of metrics are made available in the logs. Although there are no graphs for these Postgres metrics in the Heroku Dashboard, a small selection of them is available as graphs at data.heroku.com. Some monitoring tools do include additional graphs for this info. If your monitoring tool does not offer you a way to monitor Postgres, you may look into downloading logs from your logging add-on provider and using a tool like pgBadger to generate reports from them.

Take a screenshot of your database load. The following is an example from Librato:

Database load

A load average of 1.0 indicates that, on average, processes were requesting CPU resources for 100% of the timespan. This number includes I/O wait.

If your load is higher than 1, add an item to your “Items to Investigate” list to see whether workloads can be reduced, ensure you are managing your sessions correctly, and look into connection pooling to reduce the overhead associated with connections. You can check the number of vCPUs available for your plan here.
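
If your monitoring tool does not graph database load, you can spot-check the samples Heroku Postgres writes to your log stream roughly once a minute. The metric name below is the one typically emitted; confirm it against your own log output, and note that my-app is a placeholder:

```
# Recent database load samples from the Heroku Postgres log output (recent logs only)
heroku logs -n 1500 -a my-app | grep 'sample#load-avg'
```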

Record your number of connections

Take screenshots of the number of your active and waiting connections. The following screenshots are from Librato:

Active connections screenshot

Waiting connections screenshot

Although max limits for connections are listed on this table, the actual max number of connections that can be made to your database server depends on other factors. Each connection adds overhead, and if your database server is already experiencing high load, it may not be able to reach the hard max connections limit.

A way to keep connection overhead down is to use connection pooling, which will reuse connections. If the number of your connections is approaching your connection limit or your load looks concerning, add an item to look into connection pooling to your “Items to Investigate” list.

Waiting connections are those waiting on database locks to proceed. Some lock waits are to be expected, but high numbers indicate an issue. Since Heroku metrics are only collected once every minute, consistently non-zero values for this metric can indicate a problem. If the number of waiting connections is greater than 0 for more than 5 minutes, add an item for examining database locking contention to your “Items to Investigate” list.
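
A quick way to spot-check connection counts and lock contention from the CLI, assuming the usual Heroku Postgres log metric names and the pg-extras plugin mentioned later in this article (my-app is a placeholder):

```
# Recent active and waiting connection samples from the Heroku Postgres log output
heroku logs -n 1500 -a my-app | grep -E 'sample#(active-connections|waiting-connections)'

# If waiting connections stay above zero, inspect current locks (requires the pg-extras plugin)
heroku pg:locks -a my-app
```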

Record your cache hit rates and IOPs

Take screenshots of your cache hit rates and IOPs from your monitoring tools. The following screenshots are from AppSignal:

Cache hit rates screenshot in AppSignal APM

IOPs screenshot in AppSignal APM

Your cache hit rates should be above 0.99 or 99%. A value of 100% would indicate perfect cache utilization, with anything less than that indicating cache misses. However, cache misses do not necessarily mean that disk I/O was needed as Postgres also relies heavily on the file system’s memory cache.

Heroku Postgres instances allocate up to 8GB to the shared_buffers cache, using the formula minimum_of(0.25 * Available RAM, 8GB). As such, the read-iops metric is a better indicator of whether or not the amount of RAM provided by your current Postgres plan is sufficient to cache all of your regularly accessed data.

The read-iops and write-iops metrics track how many read and write I/O requests are made to the main database disk partition in values of IOPS (I/O Operations Per Second). Each Heroku PostgreSQL plan has a max Provisioned IOPS (PIOPS) configuration. PIOPS includes the total reads + writes per second that can be sustained by the provisioned disk volume. Ideally, you want your database reads to come from memory (cache) rather than disk, which is much slower.

If you have poor cache hit rates and consistently high read-iops, it is time to upgrade to a larger Heroku Postgres plan to increase your cache size.

If you have high IOPS, you can tune your queries and indexes to reduce the amount of data being read by queries. If they are already optimized, and you are consistently going past your PIOPS limits, it may be time to upgrade to a plan with a higher PIOPS limit.
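
If none of your tools graph cache hit rates, you can compute them yourself from the standard Postgres statistics views. A minimal sketch, run inside a psql session (my-app is a placeholder; this assumes the dashboard rates correspond to the usual table and index hit rates):

```
# Open a psql session against your database
heroku pg:psql -a my-app

-- then, at the psql prompt:
SELECT 'index hit rate' AS name,
       sum(idx_blks_hit)::numeric / nullif(sum(idx_blks_hit) + sum(idx_blks_read), 0) AS ratio
  FROM pg_statio_user_indexes
UNION ALL
SELECT 'table hit rate' AS name,
       sum(heap_blks_hit)::numeric / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS ratio
  FROM pg_statio_user_tables;
```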

Record your database size

Your database size includes all table and index data on disk, including database bloat. You will want to monitor this to ensure you are not approaching plan limits. Take a screenshot of your database size. An example from AppSignal is included below:

Database size screenshot in AppSignal APM

If performance starts suffering as the database grows, check for bloat periodically using heroku pg:bloat. Anything with a bloat factor larger than 10 is worth investigating. However, larger tables may have a low bloat factor yet still take up a lot of space in bloat and require their vacuum thresholds to be adjusted from the default. Few monitoring tools offer graphs for bloat, but monitoring database size may at least remind you to check for bloat when necessary. If you have a large database, add checking pg:bloat to your list of “Items to Investigate”.
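
A short sketch of the checks mentioned above, assuming the pg-extras plugin is installed (my-app is a placeholder):

```
# Current data size, table count, and plan details
heroku pg:info -a my-app

# Per-table and per-index bloat estimates (bloat factor and wasted space), from the pg-extras plugin
heroku pg:bloat -a my-app
```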

Identify your expensive queries

In this step, you will identify queries for optimization.

You can view expensive queries for your database at data.heroku.com. Select a database from the list and navigate to its Diagnose tab. List these queries in your “Items to Investigate” section. You will need to look through your logs to find the exact parameters passed to these queries when you are investigating.

The pg:outliers command from the Heroku pg-extras plugin is also helpful for finding slow queries. Run that command to find queries that account for a high proportion of total execution time and record these in your “Items to Investigate” list.
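
For reference, a minimal sketch of running it (my-app is a placeholder; the command relies on the pg_stat_statements extension, which Heroku Postgres typically makes available):

```
# Queries that account for the largest share of total execution time (requires the pg-extras plugin)
heroku pg:outliers -a my-app
```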

Record language-specific or other key metrics

If you have language runtime metrics enabled or other resources to monitor, such as Heroku Redis, also take screenshots of those and add them to your document. This tutorial does not detail the key metrics for other resources. Please refer to each resource’s documentation page in Dev Center or elsewhere for guidance on key things to monitor.

(Optional) Compare to previous review

Note: if this is your first review, you will not have a previous review to compare with.

If you have previously completed a metrics review, you may want to compare your observations, make notes and come up with more action items.

For example, you may notice that although your database size is well within the limits of your plan, it has grown substantially since the review done the previous quarter. Your action items may include investigating if this growth is expected to continue so that you can better plan for future scalability.

Next steps

Interpreting metrics is more of an art than a science. When a metric appears high, you must consider your metrics in concert rather than in isolation. As you completed each of the previous steps, you should have been filling out your “Items to Investigate” section. Take some time to summarize these items and add details about their next steps.

While your next steps will vary depending on what is required for your app, the following links may help you as you start your deeper investigation.

  • Determining the correct number and type of dynos: see Optimizing Dyno Usage
  • Concurrency: Understanding Concurrency plus specific articles on concurrency for Puma, Node, Gunicorn and PHP
  • Memory Issues: Basic methodology for optimizing memory, plus specific articles for Ruby, Node, Java, PHP, Go and Tuning glibc Memory Behavior
  • Slow response times:
    • Request Timeout
    • Check your monitoring provider’s documentation on using transaction tracing in your tool. You can look at transaction traces in your monitoring tool to see if the slowness is caused by rendering, a call to an external API, a slow database query, etc.
    • You may also want to rewrite your computationally intensive tasks as background jobs in order to keep request times short
  • Slow queries: Expensive Queries, Efficient Use of PostgreSQL Indexes
  • Database connection pooling: Client-side Postgres Connection Pooling, Concurrency and Database Connections in Ruby with ActiveRecord and Concurrency and Database Connections in Django.
  • Database locking: pg:locks
  • Database bloat: Managing VACUUM on Postgres

Heroku Enterprise customers can contact the Customer Solutions Architecture (CSA) team for help with interpreting your metrics or determining next steps. Many Heroku Enterprise customers are also eligible for deeper engagements with the CSA team such as App Assessments which include a review of your metrics and a full report of recommendations. An example App Assessment can be found here. To determine eligibility or request assistance with interpreting your metrics or determining next steps, please use the “Ask a CSA” link in our Enterprise Portal.
