Scaling Your Process Formation
Apps using the process model (via Procfile) can scale up or down instantly from the command line. Each app has a set of running processes which are known as its process formation.
Scaling
A web app typically has at least web and worker process types. You can set the concurrency level for either one with the scale command:
$ heroku scale web=2
Scaling web processes... done, now running 2
Or both at once:
$ heroku scale web=2 worker=1
Scaling web processes... done, now running 2
Scaling worker processes... done, now running 1
Process Formation
The term process formation refers to the layout of your app’s processes at a given time. The default formation for most apps will be a single web process. In the examples above, the formation was first changed to two web processes, then two web processes and a worker.
The scale command affects only process types named in the command. For example, if the app already has a process formation of two web processes, and you run heroku scale worker=2, you will now have a total of four processes (two web, two worker).
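The merge behaviour described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not Heroku's implementation; the function name `apply_scale` and the dictionary representation of a formation are assumptions made for the example.

```python
def apply_scale(formation, changes):
    """Return a new formation in which only the named process types change.

    Both arguments are hypothetical dict representations mapping a
    process type name to its process count.
    """
    updated = dict(formation)  # copy, so the caller's formation is untouched
    updated.update(changes)    # process types not named keep their counts
    return updated


formation = {"web": 2}                             # two web processes running
formation = apply_scale(formation, {"worker": 2})  # like: heroku scale worker=2
print(formation, "total:", sum(formation.values()))
# prints: {'web': 2, 'worker': 2} total: 4
```

The web count is untouched because it was not named in the change, which mirrors the example of two web and two worker processes above.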
Introspection
Any changes to the process formation are logged:
$ heroku logs | grep Scale
2011-05-30T22:19:43+00:00 heroku[api]: Scale to web=2, worker=1 by adam@example.com
Note that the logged message includes the full process formation, not just processes mentioned in the scale command.
The current process formation can always be seen via the ps command:
$ heroku ps
Process       State               Command
------------  ------------------  ------------------------------
web.1         up for 8h           bundle exec thin start -p $PORT
web.2         up for 3m           bundle exec thin start -p $PORT
worker.1      up for 1m           bundle exec stalk worker.rb
Understanding Concurrency
Singleton process types, such as a clock/scheduler process type or a process type that consumes the Twitter streaming API, should never be scaled beyond a single process. They can’t benefit from additional concurrency, and in fact running more than one will create duplicate records or events in your system, because each process tries to do the same work.
Scaling up a given process type gives you more concurrency for the type of work handled by that process type. For example, adding more web dynos allows you to handle more concurrent HTTP requests, and therefore higher volumes of traffic. Adding more workers will let you process more jobs in parallel, and therefore higher volumes of jobs.
There are circumstances where adding more dynos to your web, worker, or other process types won’t help. One of these is bottlenecks on backing services, most commonly the database. If your database is a bottleneck, adding more dynos may actually make the problem worse. Instead, optimize your database queries, upgrade to a larger database, use caching to reduce load on the database, or switch to a sharded or read-slave database configuration.
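As a rough illustration of how caching reduces database load, here is a minimal read-through cache sketch in Python. The classes `CountingDatabase` and `ReadThroughCache` are hypothetical stand-ins invented for this example. In a real deployment the cache would be a shared service rather than an in-process dictionary, and entries would need expiry or invalidation.

```python
class CountingDatabase:
    """Hypothetical stand-in for a database; counts the queries it serves."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def fetch(self, key):
        self.queries += 1
        return self.rows[key]


class ReadThroughCache:
    """Serve repeated reads from memory; go to the database only on a miss."""

    def __init__(self, database):
        self.database = database
        self.store = {}

    def get(self, key):
        if key not in self.store:          # cache miss: one database query
            self.store[key] = self.database.fetch(key)
        return self.store[key]             # cache hit: no database query


db = CountingDatabase({"plan": "basic"})
cache = ReadThroughCache(db)
for _ in range(1000):                      # 1000 requests for the same record
    cache.get("plan")
print(db.queries)
# prints: 1
```

With the cache in front, the number of database queries depends on the number of distinct keys rather than on the number of requests, so adding more dynos no longer multiplies the load on the database for cached reads.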
Another circumstance where increased concurrency won’t help is long requests or jobs. Examples include a slow HTTP request, such as a report with a database query that takes 30 seconds, or a job that emails your newsletter to 20k subscribers. Concurrency gives you horizontal scale, which means it applies to work that can be subdivided, not to large, monolithic blocks of work.
The solution to the slow report might be to move the report calculation into the background and cache the results in memcache for later display. For the long job, the answer is to subdivide the work: create a single job which fans out by putting 20k jobs (one for each newsletter to be sent) onto the queue. A single worker can consume all these jobs in sequence, or you can scale up to multiple workers to consume these jobs more quickly. The more workers you add, the more quickly the entire batch will finish.
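The fan-out pattern for the newsletter job can be sketched as follows. This is a minimal, in-process illustration using Python's standard `queue` and `threading` modules; threads stand in for worker processes, and the names `fan_out`, `worker`, `run_batch` and `send_newsletter` are hypothetical. A real app would use a durable job queue shared between separate worker processes.

```python
import queue
import threading


def send_newsletter(subscriber, sent, lock):
    """Hypothetical small job: 'send' one newsletter (here, just record it)."""
    with lock:
        sent.append(subscriber)


def fan_out(subscribers, jobs):
    """The single parent job: enqueue one small job per subscriber."""
    for subscriber in subscribers:
        jobs.put(subscriber)


def worker(jobs, sent, lock):
    """Consume jobs one at a time until the queue is empty."""
    while True:
        try:
            subscriber = jobs.get_nowait()
        except queue.Empty:
            return
        send_newsletter(subscriber, sent, lock)
        jobs.task_done()


def run_batch(subscribers, worker_count):
    """Fan out the batch, then drain it with worker_count workers."""
    jobs = queue.Queue()
    sent, lock = [], threading.Lock()
    fan_out(subscribers, jobs)  # all jobs are queued before workers start
    threads = [
        threading.Thread(target=worker, args=(jobs, sent, lock))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sent


subscribers = [f"user{i}@example.com" for i in range(20_000)]
sent = run_batch(subscribers, worker_count=4)
print(len(sent))
# prints: 20000
```

Because each queued job is small and independent, the same batch completes correctly with `worker_count=1` or with many workers; only the elapsed time changes.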