This add-on is operated by Adept Mobile LLC.
Automated Scaling of Heroku dynos
Last updated 10 May 2018
Adding AdeptScale to an application not only prevents performance degradation from sharp increases in traffic but can also save you money by reducing the number of running web dynos during low-traffic periods.
AdeptScale works by ingesting and processing an application’s syslog drain. With a few parameters found in the settings section, the AdeptScale calculation algorithm will recommend dyno usage and will scale your dynos if necessary.
After initial setup and a period of data collection, the default scale settings can be modified to best suit the needs of an application.
To install the add-on run:
$ heroku addons:create adept-scale
Once AdeptScale has been provisioned, a syslog drain will be added to your application. You can verify this with the “heroku drains” command.
$ heroku drains
-----> [ Adept-scale your-app-name ]
No code setup is needed. Just provision the add-on and go.
Before we can scale on your behalf you must navigate to the Account tab and perform the following actions:
- Authorize the addon to scale your app by following the directions in the App Setup panel.
- Read the Terms of Service Agreement carefully and check the “I Agree” box.
- Press Submit to confirm your agreement.
The authorization method available will depend on your account type. If you have a standard user account, you may authorize using the OAuth system by simply following the link. However, if your app is on an “organization account” through Heroku Enterprise you will be required to provide the account’s API key.
If your account requires the use of an API key, it can be found on Heroku’s Manage Account page.
Once these actions have been completed and the Dyno Scaling setting is turned on under the Settings tab, AdeptScale will start auto scaling your application. You may of course enable/disable Dyno Scaling at any time once your app has been authorized.
All AdeptScale support, runtime issues, or product feedback should be submitted via the Contact link in the lower right corner of the Online Interface. This will allow us to match your ticket to your app.
For general questions please use the Contact form on the AdeptScale home page.
The online interface can be accessed by logging into the Heroku dashboard, selecting the AdeptScale enabled application, and then the addon itself from the Resources menu.
It can also be accessed via the CLI:
$ heroku addons:open adept-scale
Opening adept-scale for sharp-mountain-4005…
After logging into the online interface you will have access to the four main components of AdeptScale.
The AdeptScale Dashboard is where you can browse trends in your traffic and its scaling. It consists of two sections, current and history. The current section displays 2 self-updating graphs that show a recent history of your application’s average response time, total request count, dyno usage and dyno recommendations. The history section shows up to 72 hours of the same data and can be viewed all at once or in 5 minute, 1 hour, or 1 day windows.
Dyno recommendations are generated from your settings and application metrics. With the Dyno Scaling setting on, AdeptScale will scale your app to the recommended dyno number.
Do you want to be informed when your application has spun up 50 dynos for more than an hour? Or if your application responds in more than 12000 milliseconds 5 times in an hour? You can set up alerts to send emails when certain events occur over a period of time.
The current metrics you can alert on are:
- a number of dynos running over a time period.
- response times occurring a number of times over a time period.
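The second kind of rule can be pictured as a count of slow responses inside a sliding time window. The following is a minimal sketch of that idea only; the class name, its parameters, and the sliding-window approach are assumptions for illustration, not AdeptScale’s actual implementation.

```python
from collections import deque


class ThresholdAlert:
    """Hypothetical rule: fire when `min_hits` samples exceed `threshold`
    within the last `window_s` seconds."""

    def __init__(self, threshold, min_hits, window_s):
        self.threshold = threshold
        self.min_hits = min_hits
        self.window_s = window_s
        self._hits = deque()  # timestamps of samples that exceeded the threshold

    def observe(self, timestamp, value):
        """Record one sample; return True if the rule is currently triggered."""
        if value > self.threshold:
            self._hits.append(timestamp)
        # Drop hits that have fallen out of the time window.
        while self._hits and timestamp - self._hits[0] > self.window_s:
            self._hits.popleft()
        return len(self._hits) >= self.min_hits


# Example from the text: slower than 12000 ms, 5 times in an hour.
rule = ThresholdAlert(threshold=12000, min_hits=5, window_s=3600)
fired = [rule.observe(minute * 60, 15000) for minute in range(5)]
print(fired)  # [False, False, False, False, True]
```

The rule only reports a trigger on the fifth slow response, and old hits expire once they are more than an hour behind the newest sample.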
Upon triggering a rule, AdeptScale can send an email or ping a URL. There is also an Atom feed that you can subscribe to. The Atom feed link can be found at the bottom of the alerts page.
AdeptScale will only send a blank GET request to the ping URL. No data is sent.
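Because the ping carries no data, the receiving endpoint can only tell alerts apart by the URL that was requested. Here is a minimal sketch of such a receiver using Python’s standard library; the path `/alerts/too-many-dynos`, the 204 response, and the in-memory `pings` list are all assumptions chosen for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

pings = []  # paths of the pings received so far


class PingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The request has no body, so the path is the only signal available.
        pings.append(self.path)
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


# Port 0 asks the OS for any free port; a real endpoint would use a fixed one.
server = HTTPServer(("127.0.0.1", 0), PingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Simulate an alert ping with a blank GET request.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/alerts/too-many-dynos")
status = conn.getresponse().status
conn.close()

server.shutdown()
server.server_close()
print(status, pings)
```

Using one distinct ping URL per alert rule is one way to identify which rule fired.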
The settings area provides you with a way to dictate how AdeptScale will react to load on your application. A graph at the top shows a recent history of metrics from your application, where the recommended dynos line is calculated from the settings you provide in the fields below it. With Dyno Scaling on, AdeptScale auto scales your application to the recommended value.
The settings graph will update the scale recommendation line as you change the sliders so you can get an idea of what to expect before submitting changes.
The settings won’t be applied until you submit the settings form.
Depending on the traffic and response times for your application in recent history, the graph may not appear to react to your settings updates. For example, if your application isn’t receiving any traffic or there is not enough valid history to parse, then the recommendation will not change.
What do these settings mean?
When Dyno Scaling is on your app will be scaled to match the current Recommendation. When Dyno Scaling is off AdeptScale will continue to recommend dynos but will not scale to those values.
The low end is the fewest number of dynos you think your app can safely run on. The high end is the largest number of dynos you are willing to pay for. We won’t scale beyond this, we promise.
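The interaction between the Dyno Scaling switch and the dyno range can be summarized in a few lines. This is a sketch of the behavior described above, not AdeptScale’s code; the function and parameter names are hypothetical.

```python
def target_dynos(recommendation, low, high, current, scaling_on=True):
    """Return the dyno count the app should end up with.

    With scaling off, the recommendation is still computed elsewhere but
    the current dyno count is left untouched. With scaling on, the
    recommendation is clamped to the configured [low, high] range.
    """
    if not scaling_on:
        return current
    return max(low, min(high, recommendation))


print(target_dynos(recommendation=14, low=2, high=10, current=4))  # 10
print(target_dynos(recommendation=1, low=2, high=10, current=4))   # 2
print(target_dynos(recommendation=14, low=2, high=10, current=4, scaling_on=False))  # 4
```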
Expected response time
This is about how long (in milliseconds) your app usually takes to respond, on average. We will use this to determine if more dynos might help so if you are overly optimistic you may end up with more dynos than you need.
A current average response time is provided for you. Use this number when deciding an expected response time.
The sample window is the amount of local data included in a regression as the scale calculator moves across your recent history while trying to determine the ‘inertia’ of your traffic. This means that the larger the window, the more data outliers, such as spikes in response time, will be muted. For most web applications a larger value like 7-8 is best, as there will always be network glitches or the occasional DB or API dropout. Small values can be useful for custom applications or APIs where the features of the traffic are well known or very regular.
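To see why a larger window mutes outliers, consider the sketch below. It deliberately swaps the regression mentioned above for a plain moving average, which is simpler but shows the same smoothing effect; the sample data and the function name are made up for illustration.

```python
def rolling_mean(samples, window):
    """Average each sample with up to `window - 1` samples before it."""
    smoothed = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Steady 200 ms responses with a single 2000 ms spike in the middle.
response_ms = [200] * 10 + [2000] + [200] * 10

peak_small_window = max(rolling_mean(response_ms, 2))
peak_large_window = max(rolling_mean(response_ms, 8))
print(peak_small_window, peak_large_window)  # 1100.0 425.0
```

With a window of 2 the spike still looks like a fivefold slowdown, while a window of 8 reduces it to roughly twice the normal response time.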
Dyno increase and decrease rate
The increase and decrease rate are a pair which work in opposition. They apply a sort of lift or gravity to the recommendation as your app’s traffic, response, and queue time indicate an increase or decrease in need.
If increase-rate is larger, the recommendation will tend to sit near max dynos and only drop when there is an obvious lull in the traffic trend. This is for people who want to keep a set number of dynos running but potentially save money during weekends or an occasional downtime in traffic.
If decrease-rate is larger, the recommendation will tend to gravitate towards min dynos, and increase based on traffic inertia, response time and, if enabled, CPU load (Linux uptime) and memory. A larger decrease-rate is the most popular setting.
If the two settings are equal, the recommendation will move freely between min and max settings. If there is a pull up, it will increase and then sit there until there is decisive pull back down.
The size/value of these two sliders represents how much the recommendation will pull in that direction.
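The push and pull of the two rates can be illustrated with a toy model. This is not the AdeptScale algorithm: the `step` function, the simplified `demand` signal of +1 or -1, and all the numbers are assumptions chosen only to show how unequal rates bias the recommendation towards one end of the range.

```python
def step(recommendation, demand, low, high, increase_rate, decrease_rate):
    """Move the recommendation once; demand is +1 (pull up) or -1 (pull down)."""
    if demand > 0:
        recommendation += increase_rate
    elif demand < 0:
        recommendation -= decrease_rate
    return max(low, min(high, recommendation))


def simulate(increase_rate, decrease_rate, low=1, high=10):
    """Alternate pull-up and pull-down signals and return the final value."""
    recommendation = low
    for demand in [+1, -1] * 10:
        recommendation = step(recommendation, demand, low, high,
                              increase_rate, decrease_rate)
    return recommendation


print(simulate(increase_rate=3, decrease_rate=1))  # 9
print(simulate(increase_rate=1, decrease_rate=3))  # 1
```

With identical, evenly mixed signals, the larger increase rate ends near the top of the range and the larger decrease rate ends at the bottom.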
The Account section identifies your application and lists the current configuration and recommended price tier based on your history. This section also contains a form with the display timezone and the two setup fields as described in the setup section.
This section also hosts Tutorials, which can be viewed at any time.
Worker / Non-Web Scaling Beta
AdeptScale now has a WebWorker Standard plan available in Beta.
How does worker scaling work?
Because uses for worker dynos are so varied, the mechanism we use is reactive rather than predictive like our web scaling. It also relies on the app to provide some general information about the activity and/or state of its workers. This is done by adding specific tags to your logs which we will use to maintain a record of how many dynos are queued/running and if we need to scale up or down.
The format of the log line your app should add is:
* ADEPT_SCALE JOB_<tag> [option=value, ...]
The available tags are:
- JOB_QUEUED: Notifies AdeptScale that a job has been added to a queue
- JOB_STARTING: Notifies AdeptScale that a job is starting
- JOB_STATUS: Used to confirm / update the current count of running jobs
- JOB_COMPLETE: Notifies AdeptScale that a job has finished
- JOB_FAILED: Notifies AdeptScale that a job failed
Available options are:
- dyno_type: Dyno type where the job runs. (defaults to the dyno creating the log)
- job_id: This is just to improve accuracy. (optional)
- scheduled_at: This is just to improve accuracy. (optional)
- total_queue: Used to update Adept’s count if it becomes inaccurate. (optional)
- running_jobs: Used to update Adept’s count if it becomes inaccurate. (optional)
Some example log entries:
- ADEPT_SCALE JOB_QUEUED dyno_type=worker
- ADEPT_SCALE JOB_STARTING job_id=12345
- ADEPT_SCALE JOB_STATUS dyno_type=thumbnail_worker total_queue=34 running_jobs=10
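One way to picture how these tags could maintain a record of queued and running jobs is the sketch below. It is an assumption about the bookkeeping, inferred from the tag descriptions above, and not AdeptScale’s actual implementation; the function name and the `source_dyno_type` parameter are hypothetical.

```python
from collections import defaultdict

# Per dyno type: how many jobs are waiting and how many are running.
counts = defaultdict(lambda: {"queued": 0, "running": 0})


def apply_log_line(line, source_dyno_type="worker"):
    """Update `counts` from one log line; ignore lines without the marker."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "ADEPT_SCALE":
        return
    tag = parts[1]
    options = dict(p.split("=", 1) for p in parts[2:] if "=" in p)
    # dyno_type defaults to the dyno that wrote the log line.
    state = counts[options.get("dyno_type", source_dyno_type)]
    if tag == "JOB_QUEUED":
        state["queued"] += 1
    elif tag == "JOB_STARTING":
        state["queued"] = max(0, state["queued"] - 1)
        state["running"] += 1
    elif tag in ("JOB_COMPLETE", "JOB_FAILED"):
        state["running"] = max(0, state["running"] - 1)
    elif tag == "JOB_STATUS":
        # A status line overrides the tracked counts when they have drifted.
        if "total_queue" in options:
            state["queued"] = int(options["total_queue"])
        if "running_jobs" in options:
            state["running"] = int(options["running_jobs"])


for line in [
    "ADEPT_SCALE JOB_QUEUED dyno_type=worker",
    "ADEPT_SCALE JOB_QUEUED dyno_type=worker",
    "ADEPT_SCALE JOB_STARTING job_id=12345",
    "ADEPT_SCALE JOB_STATUS dyno_type=thumbnail_worker total_queue=34 running_jobs=10",
]:
    apply_log_line(line)

print(dict(counts))
```

After these four lines the sketch tracks one queued and one running job for `worker`, and 34 queued and 10 running jobs for `thumbnail_worker`.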
It is often the case that the request to queue, which logs the JOB_QUEUED tag, is made on a different dyno than the one that will run the job. In this case, the app should include the dyno_type option so AdeptScale knows which dyno to scale if necessary.
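If no plugin exists for your stack, the log lines can be written by hand. The helper below is a hypothetical sketch, not part of any AdeptScale library; it assumes options are separated by spaces as in the example entries above, and it relies on Heroku collecting whatever the process writes to stdout.

```python
TAGS = {"JOB_QUEUED", "JOB_STARTING", "JOB_STATUS", "JOB_COMPLETE", "JOB_FAILED"}
OPTIONS = {"dyno_type", "job_id", "scheduled_at", "total_queue", "running_jobs"}


def adept_scale_line(tag, **options):
    """Build one log line in the form: ADEPT_SCALE <tag> key=value ..."""
    if tag not in TAGS:
        raise ValueError(f"unknown tag: {tag}")
    unknown = set(options) - OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    fields = [f"{key}={value}" for key, value in options.items()]
    return " ".join(["ADEPT_SCALE", tag, *fields])


# A web dyno queues a job that a worker dyno will run, so it names the type.
print(adept_scale_line("JOB_QUEUED", dyno_type="worker"), flush=True)
# The worker itself can omit dyno_type when it starts the job.
print(adept_scale_line("JOB_STARTING", job_id=12345), flush=True)
```

Flushing after each line keeps the entries from sitting in an output buffer while the job runs.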
For ease of development we recommend using a plugin available for your language and queue system. Currently, the only plugin readily available for use during the Beta is a Ruby Gem for the ActiveJob queue system. It can be found at https://github.com/AdeptMobile/adept_scale_active_job. The next plugin available will be for Node. If you are interested in contributing to the development of a plugin for a specific language or queue system, please let us know.
Where do I start?
For the Beta period, worker scaling is only available by request. The best way to request it is to submit a support ticket through our contact-us form. If you have an app already on AdeptScale, please submit from your app’s dashboard. Please include your app’s language, how many of each type of dyno you think you will run as a baseline and at peak usage, and a general use case. We will reply as quickly as possible and help you through the setup process as necessary. Some examples of good use cases:
- “I only run one job for 2 hours at midnight”
- “I run 2 jobs constantly on one 2x dyno but that could scale up to 100 jobs on 20 2x dynos at some point today”
Some examples of bad use cases:
- “I have three jobs that run for 60 seconds every 2 minutes. I want to scale up to 3 dynos and back to 0 every other minute so I only pay half as much.” This activity will cause dyno churn and is a great way to accidentally terminate a dyno just as the next cycle is starting.
Gotchas and Known Issues
- Scale up time: expect 1-3 minutes before a scale up. Capturing logs to provision dynos is by no means an instant process. If you want a job to instantly start processing the minute it hits the queue, you should already have a dyno running and ready for it.
- More dynos than jobs: Scaling dynos works as a “LIFO” stack so it is possible to scale up to a large number of dynos then start a long job on the newest (highest numbered) dyno. Even if the older ones become idle, we cannot scale down because it will kill the newest dyno first, breaking the running job. Try to group jobs of similar durations on their own queues/dynos to minimize this effect.
- Dyno-type agnostic jobs: Currently you must choose which dyno type a new job is queued for, even if your job could run on any of your dyno types. At some point this may change but for our current design the type of dyno to scale must be chosen explicitly, and we rely on the message that a job is started or finished to let us know to remove it from the queue of that same type. If you queue a job on one dyno type but run it on another, we will have an incorrect count for both.
- Really fast jobs don’t show up: If you have a job that is queued, picked up, and completed in a couple of seconds or less, it will not be reflected on the graphs as it is irrelevant for scaling. We only display processes waiting for a dyno to run on, or slower processes occupying significant dyno time. In other words, this is NOT a tool for precisely monitoring your jobs.
- Idempotence in jobs: It is entirely possible that a job is started on a previously idle dyno, but before AdeptScale receives the JOB_STARTING message for that dyno it deprovisions the dyno. If this happens your job will be terminated half complete, and if it is not idempotent your data will be left in a bad state. In the future we plan to add a scale-down-delay option, but for now we ask Beta users to “use at your own risk” and work with our support team if you encounter this issue.
The scale down delay, that is, the time between when a dyno has no more jobs and when it is deprovisioned, is currently about 3 minutes for all Beta users.
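The “more dynos than jobs” issue above follows directly from removing the newest dyno first. The sketch below illustrates that constraint; the function name and the list-of-booleans representation are assumptions for the example.

```python
def removable_dynos(busy):
    """Count how many dynos can be removed without killing a running job.

    busy[i] is True when dyno number i + 1 is running a job. Dynos are
    removed newest (highest numbered) first, so only the idle dynos at the
    end of the list can go.
    """
    count = 0
    for is_busy in reversed(busy):
        if is_busy:
            break
        count += 1
    return count


# Three idle older dynos, but the newest one is busy: nothing can be removed.
print(removable_dynos([False, False, False, True]))  # 0
# The oldest dyno is busy and the two newest are idle: both can be removed.
print(removable_dynos([True, False, False]))  # 2
```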
API Access through the Heroku CLI
Settings can be adjusted programmatically by use of our CLI plugin. You can download it at https://www.npmjs.com/package/heroku-adept-scale.
Long Running Web Requests / Sockets
Long-running web requests or socket connections on web dynos that carry continuous traffic are not handled by the AdeptScale calculator, which is designed for traditional HTTP traffic. If a portion of your app uses this type of communication, please open a support ticket and our devs can exclude or blacklist a specific URL path from the calculator. Keep in mind that this procedure is currently manual and can take up to a day to implement.
There are safety procedures built into AdeptScale for conditions such as when an app stops reporting traffic.
If, for example, an app has a baseline of 200 web req/min, gradually climbs to 1000 req/min and then suddenly drops to 0, the addon will err on the side of caution and act as though there has been a break in traffic reporting. It will lock the dynos at the last known “good” setting and not take any further action until it hears from the app again. After a critical event like this, it usually takes about 30 min of ‘normal’ traffic to rebuild its history heuristic. You can imagine what would happen if the addon scaled a healthy and busy app down to min dynos just because traffic reporting suddenly broke.
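The failsafe can be thought of as a check that runs before any scale-down. The following is a loose sketch of that idea; the function name, the use of a simple average as the baseline, and the `min_baseline` threshold are all assumptions, since the actual detection logic is not documented here.

```python
def should_lock(recent_req_per_min, current_req_per_min, min_baseline=1):
    """Return True when traffic vanishes while the recent baseline was healthy.

    A sudden drop to zero after sustained traffic looks more like broken
    reporting than a real lull, so dynos stay at the last known good setting.
    """
    if not recent_req_per_min:
        return False  # no history, nothing to compare against
    baseline = sum(recent_req_per_min) / len(recent_req_per_min)
    return current_req_per_min == 0 and baseline >= min_baseline


# Traffic climbed from 200 to 1000 req/min, then reporting went silent.
print(should_lock([200, 400, 700, 1000], 0))    # True
# Traffic is merely lower than before, so normal scaling continues.
print(should_lock([200, 400, 700, 1000], 150))  # False
```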
The scale addon is designed to work with production level apps with a consistent usage baseline and organic, realistically changing traffic. The settings are designed so that scaling can be approached in a few different ways, but fundamentally the algorithm looks at historic traffic and provides a recommendation based on change in traffic trends. For this reason the results will sometimes surprise people who run it on toy programs or benchmarks with no history, and simulate sudden massive bursts of traffic followed by total inactivity.
Also, it’s important to understand that starting a dyno can take up to a minute, and a started dyno cannot be removed for a minimum of 3 minutes. There are several reasons for this, but the most important are that Heroku enforces API limits to prevent dyno churn, and that production apps don’t have traffic which necessitates drastic swings within a minute.
So, for people who want to test, we recommend starting with at least 30 minutes of simulated traffic of at least 100 req/min. Once this is established, scale your traffic in a believable manner. Even the biggest news websites don’t jump from 0 to 50,000 req/min in under a minute. They climb from 5,000 req/min to 50,000 req/min over the course of 10-15 minutes or so.
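A load test that follows this advice needs a per-minute traffic plan: a steady warm-up followed by a gradual climb. The sketch below generates such a plan; the function name, the linear ramp shape, and the default numbers (taken from the news-site example above) are assumptions, and the output would still need to be fed to a load-testing tool of your choice.

```python
def traffic_plan(baseline=5000, peak=50000, warmup_min=30, ramp_min=15):
    """Return target req/min for each minute of a believable load test."""
    # Hold the baseline long enough for a traffic history to build up.
    plan = [baseline] * warmup_min
    # Then climb linearly so the peak is reached on the last ramp minute.
    for minute in range(1, ramp_min + 1):
        plan.append(round(baseline + (peak - baseline) * minute / ramp_min))
    return plan


plan = traffic_plan()
print(len(plan), plan[29], plan[30], plan[-1])  # 45 5000 8000 50000
```

Each ramp minute adds 3,000 req/min in this configuration, which avoids the instant jumps the calculator is not designed for.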
If you have a use case for a scheduled event, however, in which you plan to have a massive and instant usage spike, the best practice is to use our API to schedule a change in min/max dynos ahead of time to ensure the required dynos have a chance to fully load before the event.
Note: The above applies to web dynos only. We will have more on benchmarking worker dynos after the beta closes. Also, if you are seeing unexpected behaviour when benchmarking / load-testing, please keep in mind these considerations as well as the failsafe listed in the unusual conditions section above.
Migrating between plans
Application owners should carefully manage the migration timing to ensure proper application function during the migration process. A list of all plans available can be found here.
Use the heroku addons:upgrade command to migrate to a new plan.
$ heroku addons:upgrade adept-scale:newplan
-----> Upgrading adept-scale:newplan to sharp-mountain-4005... done, v18 ($49/mo)
       Your plan has been updated to: adept-scale:newplan
At times your app may need to be re-authorized to scale. This can happen for a variety of reasons:
- The owner of the app changes
- The owner of an Organization App changes their password
- The OAuth token for the account is revoked by the app owner
- The Heroku OAuth API becomes unavailable due to an outage and the token cannot be maintained
It is not uncommon for an API error or outage to require renewal of an OAuth token. If this happens an email notification will be sent to the app owner.
To renew scaling authorization:
- Log into Heroku using the app owner’s account/email and navigate to the AdeptScale add-on as usual
- Navigate to the Account tab
- Reconnect to the Heroku OAuth system using the link in the App Setup panel just as you did during initial setup
Removing the add-on
AdeptScale can be removed via the CLI.
This will destroy all associated data and cannot be undone!
$ heroku addons:remove adept-scale
-----> Removing adept-scale from sharp-mountain-4005... done, v20 (free)
After removing the add-on be sure to verify the drain has also been removed with the “heroku drains” command. If the drain is still present, it can be removed manually with:
$ heroku drains:remove drain_name
The current version is 1.15.2