Scaling Your Dyno Formation
Last updated 21 June 2017
Heroku apps running on Professional-tier dynos (any dyno type except Free or Hobby) can be scaled to run on multiple dynos simultaneously. You can scale your app’s dyno formation up and down manually from the Heroku Dashboard or CLI.
You can also configure Heroku Autoscaling for Performance-tier dynos, and for dynos running in Private Spaces. Threshold autoscaling adds or removes web dynos from your app automatically based on current request latency.
Dynos are prorated to the second, so if you want to experiment with different scale configurations, you can do so and only be billed for actual seconds used. Remember, it’s your responsibility to set the correct number of dynos and workers for your app.
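Per-second proration is simple arithmetic. The sketch below is illustrative only: the $25/month price and the 720-hour billing month are assumptions, not quoted Heroku rates.

```python
# Hypothetical example: estimate the prorated cost of a short scaling experiment.
# The $25/month price and 720-hour month below are assumptions, not quoted rates.
MONTHLY_PRICE_USD = 25.0          # hypothetical per-dyno monthly price
SECONDS_PER_MONTH = 720 * 3600    # assumed 720-hour billing month

def prorated_cost(dyno_count: int, seconds: int) -> float:
    """Cost of running `dyno_count` dynos for `seconds`, prorated to the second."""
    per_second = MONTHLY_PRICE_USD / SECONDS_PER_MONTH
    return dyno_count * seconds * per_second

# Running 3 extra dynos for a 15-minute load test costs only a few cents:
print(round(prorated_cost(3, 15 * 60), 4))
```

So a brief experiment with a larger formation costs fractions of a dollar, not a full month's price.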
Scaling from the Dashboard
To scale your dyno formation from the Dashboard:
- Select the app you want to scale from your apps list.
- Navigate to the Resources tab.
- In your list of dynos, click the Edit button (looks like a pencil) next to the process type you want to scale.
- Drag the slider to the number of dynos you want to scale to.
- Click Confirm.
Scaling from the CLI
You scale your dyno formation from the Heroku CLI with the ps:scale command:
$ heroku ps:scale web=2
Scaling dynos... done, now running web at 2:Standard-1X
The command above scales an app’s web process type to 2 dynos.
You can scale multiple process types with a single command, like so:
$ heroku ps:scale web=2 worker=1
Scaling dynos... done, now running web at 2:Standard-1X, worker at 1:Standard-1X
You can specify a dyno quantity as an absolute number (like the examples above), or as an amount to add or subtract from the current number of dynos, like so:
$ heroku ps:scale web+2
Scaling dynos... done, now running web at 4:Standard-1X.
If you want to stop running a particular process type entirely, simply scale it to 0:
$ heroku ps:scale worker=0
Scaling dynos... done, now running worker at 0:Standard-1X.
Scaling the dyno size
In addition to scaling the number of dynos assigned to a process, you can scale the type of dyno. For example, you can scale a process type from standard-1x dynos up to standard-2x dynos for increased memory and CPU share:
$ heroku ps:scale web=2:standard-2x
Scaling dynos... done, now running web at 2:Standard-2X.
Note that when scaling dyno types, you must still specify a dyno quantity (such as the 2 in the example above).
Scaling back down to standard-1x dynos works the exact same way:
$ heroku ps:scale web=2:standard-1x
Scaling dynos... done, now running web at 2:Standard-1X
See the documentation on dyno types for more information on dynos and their characteristics.
Autoscaling
Autoscaling is currently available only for Performance-tier dynos and dynos running in Private Spaces.
Autoscaling lets you scale your web dyno quantity up and down automatically based on your app’s current request latency.
Autoscaling is configured from your app’s Resources tab on the Heroku Dashboard:
Click the Enable Autoscaling button next to your web dyno details. The Autoscaling configuration options appear.
Use the slider or text boxes to specify your app’s allowed autoscaling range. The minimum cannot be less than 1, and the maximum cannot be less than your current dyno count. The cost range associated with the specified autoscaling range is shown directly below the slider. Your dyno count is never scaled to a quantity outside the autoscaling range you specify.
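The range rules above (minimum at least 1, maximum no less than the current dyno count) can be sketched as a simple validation. This is an illustrative model, not Heroku's code:

```python
def validate_autoscale_range(minimum: int, maximum: int, current_count: int) -> None:
    """Illustrative check of the autoscaling range rules; not Heroku's implementation."""
    if minimum < 1:
        raise ValueError("minimum cannot be less than 1")
    if maximum < current_count:
        raise ValueError("maximum cannot be less than the current dyno count")

# A range of 1..5 is valid for an app currently running 3 web dynos:
validate_autoscale_range(1, 5, 3)
```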
Next, set your app’s Desired p95 Response Time. The autoscaling engine uses this value to determine how to scale your dyno count. A recommended p95 response time is provided.
Enable Email Notifications if you’d like all app collaborators (or org members if you’re using Heroku Teams/Organizations) to be notified when your web dyno count reaches the specified upper limit. At most one notification email is sent per day.
The dyno manager uses your app’s Desired p95 Response Time to determine when to scale your app. The autoscaling algorithm uses data from the past hour to calculate the minimum number of web dynos necessary to achieve the desired response time for 95% of incoming requests at your current request throughput. Every time an autoscaling event occurs, a single web dyno is added or removed from your app. Autoscaling events are always at least 1 minute apart. Scaling down is deliberately less responsive, to prevent a situation where aggressive downscaling during a brief lull in requests results in request timeouts if demand subsequently spikes upward. If your app experiences no request throughput for 3 minutes, dynos start scaling down at 1-minute intervals until throughput increases.
There are situations where slow requests are due to downstream bottlenecks, not web resources. In this case, scaling up the number of web dynos will either have no effect on latency or will make it worse. To address these scenarios, autoscaling ceases if the percentage of failed requests reaches 20% or more. You can monitor the failed request metric using Threshold Alerting.
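The behavior described above can be condensed into a small decision function. The sketch below is a simplified model of threshold autoscaling under the stated rules (single-dyno steps, events at least a minute apart, a hard stop at a 20% failure rate); all names are illustrative and this is not Heroku's implementation:

```python
class AutoscalerSketch:
    """Simplified model of the autoscaling rules described above; not Heroku's code."""

    def __init__(self, min_dynos: int, max_dynos: int, desired_p95_ms: float):
        self.min_dynos = min_dynos
        self.max_dynos = max_dynos
        self.desired_p95_ms = desired_p95_ms
        self.last_event = float("-inf")   # timestamp of the last scaling event

    def decide(self, current: int, p95_ms: float, failure_rate: float, now: float) -> int:
        """Return the new dyno count given current metrics (time in seconds)."""
        # Autoscaling ceases while 20% or more of requests are failing.
        if failure_rate >= 0.20:
            return current
        # Scaling events are always at least 1 minute apart.
        if now - self.last_event < 60:
            return current
        # A single dyno is added or removed per event, within the allowed range.
        # (The real engine also makes scale-down less responsive; omitted here.)
        if p95_ms > self.desired_p95_ms and current < self.max_dynos:
            self.last_event = now
            return current + 1
        if p95_ms < self.desired_p95_ms and current > self.min_dynos:
            self.last_event = now
            return current - 1
        return current

scaler = AutoscalerSketch(min_dynos=1, max_dynos=5, desired_p95_ms=500)
print(scaler.decide(current=2, p95_ms=800, failure_rate=0.0, now=0))   # scales up to 3
```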
Monitoring autoscaling events
Autoscaling events appear alongside manual scale events in the Events chart. In event details they are currently identified as having been initiated by “Dyno Autoscaling”. In addition, enabling, disabling and changes to autoscaling are shown. If a series of autoscaling events occur in a time interval rollup, only the step where the scaling changed direction is shown. For example, in the Events chart below “Scaled up to 2 of Performance-M” is an intermediate step to the peak of 3 Performance-M dynos, and is not shown.
Disable autoscaling by clicking the Disable Autoscaling button on your app’s Resources tab. Then, specify a fixed web dyno count and click Confirm. Note that manually scaling through the CLI, or otherwise calling ps:scale via the API, also disables autoscaling.
Known issues & limitations
As with any autoscaling utility, there are certain application health scenarios for which autoscaling might not help. You might also need to tune your Postgres connection pool, worker count, or add-on plan(s) to accommodate changes in web dyno formation. The mechanism to throttle autoscaling based on a request throughput error rate of 20% or more was designed for the scenario where the bottleneck occurs in downstream components. Please see Understanding concurrency for additional details.
We strongly recommend that you simulate the production experience with load testing, and use Threshold Alerting in conjunction with autoscaling to monitor your app’s end-user experience. Please refer to our Load Testing Guidelines for Heroku Support notification requirements.
Different dyno types have different limits to which they can be scaled. See Dyno Types to learn about the scaling limits.
Dyno formation
The term dyno formation refers to the layout of your app’s dynos at a given time. The default formation for a simple app is a single web dyno, whereas more demanding applications may consist of web, worker, clock, and other process types. In the examples above, the formation was first changed to two web dynos, then to two web dynos and a worker.
The scale command affects only the process types named in the command. For example, if the app already has a dyno formation of two web dynos, and you run heroku ps:scale worker=2, you will now have a total of four dynos (two web, two worker).
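These update semantics can be modeled as a dictionary merge: process types named in the command are set to the new quantity, and everything else is left alone. A minimal sketch (illustrative only, not Heroku's code):

```python
def apply_scale(formation: dict, **changes: int) -> dict:
    """Illustrative model of ps:scale semantics: only named process types change."""
    updated = dict(formation)
    updated.update(changes)
    return updated

# Starting from two web dynos, scaling worker=2 leaves web untouched:
print(apply_scale({"web": 2}, worker=2))   # {'web': 2, 'worker': 2}
```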
The current dyno formation can be seen by using the heroku ps command:
$ heroku ps
=== web (Free): `bundle exec unicorn -p $PORT -c ./config/unicorn.rb`
web.1: up for 8h
web.2: up for 3m
=== worker (Free): `bundle exec stalk worker.rb`
worker.1: up for 1m
The Unix watch utility can be very handy in combination with the ps command. Run watch heroku ps in one terminal while you add or remove dynos, deploy, or restart your app.
Any changes to the dyno formation are logged:
$ heroku logs | grep Scale
2011-05-30T22:19:43+00:00 heroku[api]: Scale to web=2, worker=1 by firstname.lastname@example.org
Note that the logged message includes the full dyno formation, not just dynos mentioned in the scale command.
Understanding concurrency
Singleton process types, such as a clock/scheduler process type or a process type that consumes the Twitter streaming API, should never be scaled beyond a single dyno. They can’t benefit from additional concurrency; in fact, they will create duplicate records or events in your system as each dyno tries to do the same work at the same time.
Scaling up a given process type gives you more concurrency for the type of work handled by that process type. For example, adding more web dynos allows you to handle more concurrent HTTP requests, and therefore higher volumes of traffic. Adding more worker dynos will let you process more jobs in parallel, and therefore higher volumes of jobs.
There are circumstances where creating more dynos to run your web, worker, or other process types won’t help. One of these is bottlenecks on backing services, most commonly the database. If your database is a bottleneck, adding more dynos may actually make the problem worse. Instead, optimize your database queries, upgrade to a larger database, use caching to reduce load on the database, or switch to a sharded configuration or scale reads using followers.
Another circumstance where increased concurrency won’t help is long requests or jobs: for example, a slow HTTP request such as a report with a database query that takes 30 seconds, or a job to email your newsletter to 20,000 subscribers. Concurrency gives you horizontal scale, which means it applies to work that can be subdivided, not to large, monolithic work blocks.
The solution to the slow report might be to move the report calculation into the background and cache the results in memcache for later display. For the long job, the answer is to subdivide the work - create a single job which fans out by putting 20,000 jobs (one for each newsletter to be sent) onto the queue. A single worker can consume all these jobs in sequence, or you can scale up to multiple workers to consume these jobs more quickly. The more workers you add, the more quickly the entire batch will finish.
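The fan-out pattern described above can be sketched with an in-memory queue. This is a toy illustration (names and the in-memory queue are stand-ins for a real job queue and worker processes):

```python
from queue import Queue

def fan_out_newsletter(subscribers: list) -> Queue:
    """One fan-out job enqueues a separate send job per subscriber."""
    jobs: Queue = Queue()
    for address in subscribers:
        jobs.put(("send_newsletter", address))
    return jobs

def run_worker(jobs: Queue, sent: list) -> None:
    """A worker consumes jobs in sequence; adding workers drains the queue faster."""
    while not jobs.empty():
        _job_name, address = jobs.get()
        sent.append(address)   # stand-in for actually sending the email

# Fan out 20 subscribers (20,000 in the article's example) and consume the jobs:
subscribers = [f"user{i}@example.org" for i in range(20)]
jobs = fan_out_newsletter(subscribers)
sent: list = []
run_worker(jobs, sent)
print(len(sent))
```

Because each job is independent, running several workers concurrently against the same queue divides the wall-clock time roughly by the worker count, which is exactly the horizontal scaling that `heroku ps:scale worker=N` provides.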