Scaling Your Dyno Formation
Last updated 30 December 2019
Heroku apps can be scaled to run on multiple dynos simultaneously (except on Free or Hobby dynos). You can scale your app’s dyno formation up and down manually from the Heroku Dashboard or CLI.
You can also configure Heroku Autoscaling for Performance-tier dynos, and for dynos running in Private Spaces. Threshold autoscaling adds or removes web dynos from your app automatically based on current request latency.
Dynos are prorated to the second, so if you want to experiment with different scale configurations, you can do so and only be billed for actual seconds used. Remember, it’s your responsibility to set the correct number of dynos and workers for your app.
Before you can scale your app’s dyno formation, make sure your app is running on Professional-tier dynos:
- From the Heroku Dashboard, select the app you want to scale from your apps list.
- Navigate to the Resources tab.
- Above your list of dynos, click Change Dyno Type.
- Select the Professional (Standard/Performance) dyno type.
Scaling from the Dashboard
To scale the number of dynos for a particular process type:
- Select the app you want to scale from your apps list.
- Navigate to the Resources tab.
- In the app’s list of dynos, click the Edit button (it looks like a pencil) next to the process type you want to scale.
- Drag the slider to the number of dynos you want to scale to.
To change the dyno type used for a particular process type:
- Click the hexagon icon next to the process type you want to modify.
- Choose a new dyno type from the drop-down menu (Standard-1X, Standard-2X, Performance-M, or Performance-L).
Scaling from the CLI
Scaling the number of dynos
You scale your dyno formation from the Heroku CLI with the heroku ps:scale command:

```
$ heroku ps:scale web=2
Scaling dynos... done, now running web at 2:Standard-1X
```
The command above scales an app’s web process type to 2 dynos.
You can scale multiple process types with a single command, like so:
```
$ heroku ps:scale web=2 worker=1
Scaling dynos... done, now running web at 2:Standard-1X, worker at 1:Standard-1X
```
You can specify a dyno quantity as an absolute number (like the examples above), or as an amount to add or subtract from the current number of dynos, like so:
```
$ heroku ps:scale web+2
Scaling dynos... done, now running web at 4:Standard-1X
```
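Subtraction works the same way. The transcript below is illustrative; it assumes the app is currently running four web dynos:

```
$ heroku ps:scale web-1
Scaling dynos... done, now running web at 3:Standard-1X
```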
If you want to stop running a particular process type entirely, simply scale it to 0:

```
$ heroku ps:scale worker=0
Scaling dynos... done, now running worker at 0:Standard-1X
```
Changing dyno types
To move a process type from standard-1x dynos up to standard-2x dynos for increased memory and CPU share:
```
$ heroku ps:scale web=2:standard-2x
Scaling dynos... done, now running web at 2:Standard-2X
```
Note that when changing dyno types, you must still specify a dyno quantity (such as 2 in the example above).
Moving back down to standard-1x dynos works the same way:
```
$ heroku ps:scale web=2:standard-1x
Scaling dynos... done, now running web at 2:Standard-1X
```
See the documentation on dyno types for more information on dyno types and their characteristics.
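Quantity and dyno type changes can also be combined across several process types in a single command. The transcript below is illustrative; the exact output depends on your current formation:

```
$ heroku ps:scale web=2:performance-m worker=1:standard-2x
Scaling dynos... done, now running web at 2:Performance-M, worker at 1:Standard-2X
```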
Autoscaling is currently available only for Performance-tier dynos and dynos running in Private Spaces. Heroku’s autoscaling is driven by response time, which assumes your application has low variance in response time. If it doesn’t, consider a third-party add-on such as Rails Auto Scale, which scales based on request queue time instead of overall response time.
Autoscaling lets you scale your web dyno quantity up and down automatically based on one or more application performance characteristics.
A small number of customers have observed a race condition when using Heroku web autoscaling in conjunction with a third-party worker autoscaling utility like HireFire. This race condition results in the unexpected disabling of Heroku’s autoscaling. To prevent this scenario, do not use these two autoscaling tools in conjunction.
Autoscaling is configured from your app’s Resources tab on the Heroku Dashboard. Click the Enable Autoscaling button next to your web dyno details. The autoscaling configuration options appear.
Use the slider or text boxes to specify your app’s allowed autoscaling range. The cost range associated with the specified range is shown directly below the slider. Your dyno count is never scaled to a quantity outside the range you specify. Note that the minimum dyno limit cannot be less than 1.
Next, set your app’s Desired p95 Response Time. The autoscaling engine uses this value to determine how to scale your dyno count (see below). A recommended p95 response time is provided.
Check Email Notifications if you’d like all app collaborators (or team members if you’re using Heroku Teams) to be notified when your web dyno count reaches the range’s upper limit. At most one notification email is sent per day.
When you’ve configured autoscaling to your liking, click Confirm. Note that if your new autoscaling dyno range does not include the process type’s current dyno count, the current dyno count is immediately adjusted to conform to the range.
The dyno manager uses your app’s Desired p95 Response Time to determine when to scale your app. The autoscaling algorithm uses data from the past hour to calculate the minimum number of web dynos required to achieve the desired response time for 95% of incoming requests at your current request throughput.
The autoscaling algorithm does not include WebSocket traffic in its calculations.
Every time an autoscaling event occurs, a single web dyno is either added or removed from your app. Autoscaling events always occur at least 1 minute apart.
The autoscaling algorithm scales down less aggressively than it scales up. This protects against a situation where substantial downscaling from a temporary lull in requests results in high latency if demand subsequently spikes upward.
If your app experiences no request throughput for 3 minutes, its web dynos scale down at 1-minute intervals until throughput resumes.
Sometimes, slow requests are caused by downstream bottlenecks, not web resources. In these cases, scaling up the number of web dynos can have a minimal (or even negative) impact on latency. To address these scenarios, autoscaling will cease if the percentage of failed requests is 20% or more. You can monitor the failed request metric using Threshold Alerting.
Monitoring autoscaling events
From the Heroku Dashboard
Autoscaling events appear alongside manual scale events in the Events chart. In event details, they are currently identified as having been initiated by “Dyno Autoscaling”. Enabling, disabling, and changes to autoscaling are also shown. If a series of autoscaling events occurs within a time-interval rollup, only the steps where the scaling changed direction are shown. For example, if “Scaled up to 2 of Performance-M” is an intermediate step on the way to a peak of 3 Performance-M dynos, it is not shown.
You can subscribe to webhook notifications that are sent whenever your app’s dyno formation changes. Webhook notifications related to dyno formation have the api:formation event type.
Read App Webhooks for more information on subscribing to webhook notifications.
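As a sketch, a subscription can be created from the CLI with heroku webhooks:add; the endpoint URL and app name below are placeholders:

```
$ heroku webhooks:add -i api:formation -l notify -u https://example.com/hooks -a your-app
```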
Disable autoscaling by clicking the Disable Autoscaling button on your app’s Resources tab. Then, specify a fixed web dyno count and click Confirm.
Manually scaling through the CLI, or otherwise calling ps:scale via the API (for example, from a third-party autoscaling tool), also disables autoscaling.
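For example, issuing any manual scale command while autoscaling is enabled turns it off:

```
$ heroku ps:scale web=3
```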
Known issues & limitations
As with any autoscaling utility, there are certain application health scenarios for which autoscaling might not help. You might also need to tune your Postgres connection pool, worker count, or add-on plan(s) to accommodate changes in web dyno formation. The mechanism to throttle autoscaling based on a request throughput error rate of 20% or more was designed for the scenario where the bottleneck occurs in downstream components. Please see Understanding concurrency for additional details.
We strongly recommend that you simulate the production experience with load testing, and use Threshold Alerting in conjunction with autoscaling to monitor your app’s end-user experience. Please refer to our Load Testing Guidelines for Heroku Support notification requirements.
Different dyno types have different limits to which they can be scaled. See Dyno Types to learn about the scaling limits.
The term dyno formation refers to the layout of your app’s dynos at a given time. The default formation for simple apps is a single web dyno, whereas more demanding applications might consist of web, worker, clock, and other process types. In the examples above, the formation was first changed to two web dynos, then to two web dynos and a worker.
The scale command affects only the process types named in the command. For example, if the app already has a dyno formation of two web dynos and you run heroku ps:scale worker=2, you will now have a total of four dynos (two web, two worker).
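For instance, starting from two web dynos and no workers, scaling the worker process type leaves the web count untouched (the output shown is illustrative):

```
$ heroku ps:scale worker=2
Scaling dynos... done, now running worker at 2:Standard-1X
```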
The current dyno formation can be seen by using the heroku ps command:

```
$ heroku ps
=== web (Free): `bundle exec unicorn -p $PORT -c ./config/unicorn.rb`
web.1: up for 8h
web.2: up for 3m

=== worker (Free): `bundle exec stalk worker.rb`
worker.1: up for 1m
```
The Unix watch utility can be very handy in combination with the ps command. Run watch heroku ps in one terminal while you add or remove dynos, deploy, or restart your app.
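For example:

```
# terminal 1: re-run `heroku ps` every 2 seconds (watch's default interval)
$ watch heroku ps

# terminal 2: change the formation and watch the first terminal update
$ heroku ps:scale web+1
```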
Any changes to the dyno formation are logged:
```
$ heroku logs | grep Scale
2011-05-30T22:19:43+00:00 heroku[api]: Scale to web=2, worker=1 by email@example.com
```
Note that the logged message includes the full dyno formation, not just dynos mentioned in the scale command.
Singleton process types, such as a clock/scheduler process type or a process type to consume the Twitter streaming API, should never be scaled beyond a single dyno. These process types don’t benefit from additional concurrency and in fact they will create duplicate records or events in your system as each tries to do the same work at the same time.
Scaling up a process type provides additional concurrency for performing the work handled by that process type. For example, adding more web dynos allows you to handle more concurrent HTTP requests, and therefore higher volumes of traffic. Adding more worker dynos lets you process more jobs in parallel, and therefore a higher total volume of jobs.
Importantly, however, there are scenarios where adding dynos to a process type won’t immediately improve app performance:
Backing service bottlenecks
Sometimes, your app’s performance is limited by a bottleneck created by a backing service, most commonly the database. If your database is currently a performance bottleneck, adding more dynos might only make the problem worse. Instead, try some or all of the following:
- Optimize your database queries
- Upgrade to a larger database
- Implement caching to reduce database load
- Switch to a sharded configuration, or scale reads using followers
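As a sketch of the last option, Heroku Postgres supports read replicas (“followers”) created with the --follow flag; the plan name and app name below are assumptions, so substitute your own:

```
$ heroku addons:create heroku-postgresql:standard-0 --follow DATABASE_URL -a your-app
```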
Dyno concurrency doesn’t help with large, monolithic HTTP requests, such as a report with a database query that takes 30 seconds, or a job to email out your newsletter to 20,000 subscribers. Concurrency gives you horizontal scale, which means it applies best to work that is easily subdivided.
The solution to the slow report might be to move the report’s calculation into the background and cache the results in memcache for later display.
For the long job, the answer is to subdivide the work: create a single job that in turn puts 20,000 jobs (one for each newsletter to send) onto the queue. A single worker can consume all these jobs in sequence, or you can scale up to multiple workers to consume these jobs more quickly. The more workers you add, the more quickly the entire batch will finish.
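Once the fan-out job has enqueued the individual send jobs, draining the queue faster is simply a matter of scaling the worker process type, then scaling back down when the batch completes (remember, dynos are prorated to the second):

```
$ heroku ps:scale worker=10
# ...batch completes...
$ heroku ps:scale worker=1
```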