Optimizing Dyno Usage
Last updated July 23, 2024
A fundamental aspect to optimizing any application is to ensure it is architected appropriately. For example, it should use background jobs for computationally intensive tasks in order to keep request times short, and use a process model to ensure that separate parts of the application can be scaled independently.
Beyond this, you may reach a point where you need to scale or optimize by making more efficient use of available resources. For example, if your web requests are short and handled efficiently, you may be able to increase throughput on a dyno by increasing the web server's ability to handle more requests concurrently, usually at the expense of using more RAM.
This article provides a bird’s-eye view of how to go about optimizing an application for the various dyno types. It provides some rough estimates of capabilities, and pays particular attention to memory usage and concurrency. The techniques suggested in this article are relevant to any environment that runs your application, not just a dyno. For specific guidance on minimizing your costs for different environments, see Optimizing Resource Costs.
Heroku Enterprise customers with Premier or Signature Success Plans can request in-depth guidance on this topic from the Customer Solutions Architecture (CSA) team. Learn more about Expert Coaching Sessions here or contact your Salesforce account executive.
Considering different dyno types
Heroku offers a range of dyno types. Each type has a different CPU and RAM profile.
Changing the dyno type of an application increases complexity: as a developer you have introduced a new variable, the type of the dyno, in addition to the number of dynos.
However, a well-designed app will quite naturally be able to make use of different dyno types, and thinking about how to optimize your application to make better use of a dyno is a worthwhile endeavor.
Even if your application doesn’t need to make use of different dyno types, consider applying these optimization techniques to your current dyno type anyway.
The different dyno types offer three important axes of optimization: CPU, RAM and the performance profile.
CPU
Most applications are not CPU-bound on the web server.
If you are processing individual requests slowly due to CPU or other shared resource constraints (such as the database), then optimizing concurrency on the dyno may not help your application’s throughput at all. Put another way, if your application is slow when there is little traffic, the techniques in this article may not increase performance.
The different dyno types do offer different CPU performance characteristics and will help a little in high-CPU situations, but ideally you should consider offloading work to a background worker as a first step in optimization, as well as optimizing the code itself.
A final aspect of CPU is the number of cores. The different dyno types, performance in particular, offer multiple cores. With multiple cores, you may be able to execute multiple threads in parallel. This article points out where you need to take action to make use of these cores.
The rest of this article will assume the application is not CPU-bound.
RAM
Depending on language and web framework, there is typically a direct correspondence between RAM and concurrency.
For example, web servers like Unicorn for Ruby, or Gunicorn for Python, pre-fork a number of identical copies of your web server process (called workers). Unicorn then maintains its own connection queue, and as workers finish a web request, they pull a new request off the queue.
Having more RAM in this scenario means that you can have more workers running concurrently - and there is typically a fairly linear correlation between RAM and concurrency. Optimizing concurrency for RAM is something this article addresses.
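As a rough illustration, the sketch below estimates how many pre-forked workers fit into a given amount of RAM. The per-worker footprint and headroom figures are hypothetical placeholders, not measurements; substitute your own numbers from the instrumentation described later in this article.

```python
# Rough estimate of how many pre-forked workers fit in a dyno's RAM.
# The per-worker footprint and headroom below are hypothetical; measure your
# own app's resident memory (see "Measuring" later in this article) first.

def max_workers(dyno_ram_mb, per_worker_mb, headroom_mb=100):
    """Return how many workers fit in RAM while keeping some headroom free."""
    usable_mb = dyno_ram_mb - headroom_mb
    return max(1, usable_mb // per_worker_mb)

# Example: 512MB of RAM with workers that settle at roughly 90MB each.
print(max_workers(512, 90))  # -> 4
```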
Performance profile
The performance profile of each dyno type can have an impact. In particular, eco, basic, standard-1x and standard-2x dynos operate on a CPU-share basis, whereas performance dynos are single tenant.
These performance dynos therefore offer a higher level of resource isolation.
This can have a significant impact on applications, depending on the amount of traffic that they’re receiving and how well they’re optimized. In particular, a more consistent performance profile can lead to reduced tail latencies.
When to try a different dyno size
There are many factors that come into play when considering different dyno types. Some of them are inherent to your application (how much CPU it uses), some come from increased concurrency (made possible by having more RAM), and some from the inherent characteristics of the dyno itself.
This complexity can be difficult to navigate, but for applications that are not CPU-bound, the simple techniques suggested in this article make it much more tractable to optimize for any dyno type.
Once you have optimized for a particular dyno type, say standard-1x dynos, apply the same techniques on a standard-2x, performance-m or performance-l dyno, taking into account the factors that each dyno type introduces.
Here are some rough rules of thumb:
- For most applications that aren’t receiving tremendously high volumes of traffic, consider standard-1x dynos.
- If the application is particularly memory-hungry, as seen in some Java-based frameworks such as Play and JRuby, consider standard-2x dynos, which offer double the memory.
- For very high volume web apps running on more than 20 standard-1x dynos, consider performance-m or performance-l dynos.
Basic methodology for optimizing memory
We suggest that you follow these steps, making use of the visibility tools listed below, as well as the per-language suggestions. This will get you to a point where you can easily optimize for a single dyno type, or for moving between dyno types.
- Use a concurrent web server.
- Set up instrumentation to measure the impact of load on the app.
- Observe the app’s performance, and adjust the concurrency as necessary.
Optimizing is an iterative process - there is no golden path. Different languages, web frameworks and applications behave quite differently.
For example, a standard Ruby application may need to use a web server that forks multiple copies of an application to make use of all the RAM that is available. A standard Java application, on the other hand, may simply need a parameter to the JVM in order to allocate a larger heap.
Threads
A single dyno can serve thousands of requests per second, but performance depends greatly on the language and framework you use.
A single-threaded, non-concurrent web framework can process one request at a time. For an app that takes 100ms on average to process each request, this translates to about 10 requests per second per dyno, which is not optimal.
Single-threaded backends are not recommended for production applications because they handle concurrent requests inefficiently. Choose a concurrent backend when developing and running a production service.
Multi-process, multi-threaded, or event-driven environments like Java, Unicorn, EventMachine, and Node.js can handle many concurrent requests. Load testing your app is the only realistic way to determine its request throughput.
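To make the arithmetic above concrete, here is a small back-of-the-envelope sketch. It simply restates the 100ms-per-request example and assumes requests are not CPU-bound and are spread evenly across workers, so treat the output as an upper bound rather than a prediction.

```python
# Back-of-the-envelope throughput estimate. Assumes the app is not CPU-bound
# and that requests are spread evenly across workers, so real numbers will
# usually be lower; load testing is the only reliable measure.

def requests_per_second(avg_request_ms, concurrency=1):
    return (1000.0 / avg_request_ms) * concurrency

print(requests_per_second(100))                 # single-threaded: ~10 req/s
print(requests_per_second(100, concurrency=4))  # four workers: ~40 req/s
```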
Concurrent web servers
Different languages and platforms have different approaches to concurrency. Here’s a brief look at how to establish concurrency in apps running on Ruby, Java, Python and Node.js.
Ruby
To see how you can optimize your application please refer to the comprehensive R14 - Memory Quota Exceeded in Ruby (MRI) article. It covers common problems for memory bloat in a Ruby application as well as several diagnostic tools and techniques for finding and correcting increased memory use in a Ruby application. Concurrency and Database Connections in Ruby with ActiveRecord is a great resource for evaluating how to factor in best practices for database connections to maximize concurrency, too.
JRuby
JRuby servers like Puma make good use of concurrency without the need for multiple processes. However, you will need to tune the amount of memory allocated to the JVM, depending on the dyno type. The Ruby buildpack defines sensible defaults, which can be overridden by setting either JAVA_OPTS or JRUBY_OPTS.
Java, Scala, Clojure
Java web servers like Jetty, Tomcat and Netty make good use of concurrency out of the box. However, you will need to tune the amount of memory allocated to the JVM, depending on the dyno type.
Read Adjusting Environment for a Dyno Size for the appropriate JAVA_OPTS flags to accomplish this.
Python
For Python apps, we recommend that you use either Gunicorn or Uvicorn, which are performant Python HTTP servers that will automatically use concurrency on Heroku.
Read Optimizing Python Application Concurrency for more information on tuning Python applications for maximum throughput.
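As one possible starting point, here is a minimal gunicorn.conf.py sketch. The fallback formula and the reliance on the WEB_CONCURRENCY and PORT environment variables are assumptions to adapt rather than prescribed values; recent Gunicorn versions pick this file up automatically from the project root, while older ones need it passed with -c.

```python
# gunicorn.conf.py -- a minimal sketch, not a recommended configuration.
import multiprocessing
import os

# Prefer an explicit WEB_CONCURRENCY value when one is set, and fall back to
# a simple CPU-based guess otherwise. Tune this against the memory
# measurements described later in this article.
workers = int(os.environ.get("WEB_CONCURRENCY",
                             multiprocessing.cpu_count() * 2 + 1))

# Bind to the port Heroku assigns to the dyno.
bind = "0.0.0.0:{}".format(os.environ.get("PORT", "8000"))
```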
Node.js
Node offers a single-threaded, non-blocking process model. To take advantage of multiple cores, Node must use the Cluster API to fork multiple concurrent processes. Even if you don’t plan on using concurrency today, we recommend enabling Cluster in your app so that it can scale to a variety of containers.
Read Optimizing Node.js Concurrency to learn how to configure concurrency through Node’s Cluster API on Heroku.
PHP
Applications using the PHP or HHVM runtimes automatically adjust their number of worker processes or threads depending on the type of dyno they run on. The main factor in determining the number of processes or threads is the PHP memory limit configured for the application.
Please refer to Optimizing PHP Application Concurrency for more information on tuning PHP applications for maximum throughput.
Measuring
After setting up a concurrent web server, you’ll want to tune it for a particular dyno type. Measuring memory and throughput should provide enough guidance for you to make a judgement as to the impact of a change.
Measuring memory with log-runtime-metrics
The Heroku Labs log-runtime-metrics feature enables visibility into load and memory usage for running dynos.
Per-dyno stats on memory use, swap use, and load average are inserted into the app’s log stream.
Here is some example output with this feature enabled:
source=web.1 dyno=heroku.2808254.d97d0ea7-cf3d-411b-b453-d2943a50b456 sample#load_avg_1m=2.46 sample#load_avg_5m=1.06 sample#load_avg_15m=0.99
source=web.1 dyno=heroku.2808254.d97d0ea7-cf3d-411b-b453-d2943a50b456 sample#memory_total=21.00MB sample#memory_rss=21.22MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=348836pages sample#memory_pgpgout=343403pages
The memory_rss value is the most significant number here, providing an indication of total resident memory. Ensure that you don’t exceed the memory of your dyno type, and leave some headroom too. Likewise, make sure you keep swap usage to a minimum and that swapping activity (memory_pgpgin/memory_pgpgout) is minimal. Ideally memory_pgpgin/memory_pgpgout shouldn’t change much over time (the rate of change is zero).
See log-runtime-metrics to understand how to interpret these figures.
The output of log-runtime-metrics is particularly useful as it lets you look at per-dyno memory usage. If you’re over-provisioned, you may see a single dyno peaking before any other.
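If you want a quick look at per-dyno memory outside of an add-on, a small script like the hypothetical one below can pull memory_rss out of the log stream. The regular expression is based only on the sample lines shown above, so adjust it for your own log drain or format.

```python
# track_rss.py -- a hypothetical helper that extracts per-dyno memory_rss
# values from log-runtime-metrics output piped in on stdin, for example:
#   heroku logs --tail --app <your-app> | python track_rss.py
import re
import sys

METRIC_LINE = re.compile(r"source=(?P<source>\S+).*sample#memory_rss=(?P<rss>[\d.]+)MB")

for raw_line in sys.stdin:
    match = METRIC_LINE.search(raw_line)
    if match:
        print("{source}: {rss}MB resident".format(**match.groupdict()))
```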
There are other ways of visualizing this memory data:
The Librato add-on, with the Nickel plan and above, provides a way to graph the various output from log-runtime-metrics, averaging the values across all the dynos.
Here is sample output for a Rails application on standard-1x dynos using 4 Unicorn workers. The memory, about 359MB at peak, fits comfortably into the standard-1x dyno’s 512MB of RAM.
Measuring throughput and response time
Throughput (the number of requests handled per minute) and response time are particularly useful indicators of how an optimization has affected the performance of a dyno.
In particular, the 95th and 99th percentile response time values provided by add-ons like Librato or New Relic should be monitored closely.
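If you want to sanity-check those percentiles against raw measurements (for example, from a load test), a nearest-rank calculation like the illustrative one below is enough; the sample values are made up.

```python
# Nearest-rank percentile over a set of response-time samples (in ms).
# The sample data is made up purely for illustration.
import math

def percentile(samples, pct):
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

response_times_ms = [42, 47, 48, 49, 51, 58, 60, 61, 95, 530]
print("p50:", percentile(response_times_ms, 50))  # -> 51
print("p95:", percentile(response_times_ms, 95))  # -> 530
print("p99:", percentile(response_times_ms, 99))  # -> 530
```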