Deploying Python Applications with Gunicorn
Last updated 12 January 2016
Web applications that process incoming HTTP requests concurrently make much more efficient use of dyno resources than web applications that only process one request at a time. Because of this, we recommend using web servers that support concurrent request processing whenever developing and running production services.
The Django and Flask web frameworks feature convenient built-in web servers, but these blocking servers only process a single request at a time. If you deploy with one of these servers on Heroku, your dyno resources will be underutilized and your application will feel unresponsive.
Gunicorn is a pure-Python HTTP server for WSGI applications. It allows you to run any Python application concurrently by running multiple Python processes within a single dyno. It offers a good balance of performance, flexibility, and configuration simplicity.
This guide will walk you through deploying a new Python application to Heroku using the Gunicorn web server. For basic setup and knowledge about Heroku, see Getting Started with Python.
As always, test configuration changes in a staging environment before you deploy to your production application.
Adding Gunicorn to your application
First, install Gunicorn with
$ pip install gunicorn
Then, update your requirements.txt file with
$ pip freeze > requirements.txt
Or, you can manually add it to your requirements.txt file.
Next, revise your application’s Procfile to use Gunicorn. Here’s an example Procfile for the Django application we created in Getting Started with Python on Heroku.
web: gunicorn gettingstarted.wsgi --log-file -
Gunicorn forks multiple system processes within each dyno so that a Python app can handle multiple concurrent requests without the app having to be thread-safe. In Gunicorn terminology, these are referred to as worker processes (not to be confused with Heroku worker processes, which run in their own dynos).
Each forked system process consumes additional memory. This limits how many processes you can run in a single dyno. With a typical Django application memory footprint, you can expect to run 2–4 Gunicorn worker processes on a standard-1x dyno. The right number for your application may differ, depending on its specific memory requirements.
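As a rough illustration of the memory arithmetic above, the following sketch estimates how many workers fit in a dyno. The function name estimate_workers and the per-worker figure are hypothetical, and the 512 MB dyno size is an assumption you should check against your own dyno type. Measure your application's real footprint before relying on any number.

```python
def estimate_workers(dyno_memory_mb, per_worker_mb, reserve_mb=0):
    """Return how many worker processes fit in the dyno's memory (at least 1).

    All three arguments are assumptions supplied by the caller:
    dyno_memory_mb is the dyno's total memory, per_worker_mb is the measured
    footprint of one worker, and reserve_mb is headroom kept free.
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = dyno_memory_mb - reserve_mb
    return max(1, int(usable_mb // per_worker_mb))


# Hypothetical figures: a 512 MB dyno and roughly 150 MB per worker.
print(estimate_workers(512, 150))
```

With these assumed figures the estimate is 3 workers, which falls inside the 2–4 range mentioned above.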
We recommend controlling the number of workers with a configuration variable. Gunicorn automatically honors the WEB_CONCURRENCY environment variable, if set.
$ heroku config:set WEB_CONCURRENCY=3
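To make "honors the environment variable" concrete: Gunicorn uses WEB_CONCURRENCY as the default for its workers setting and falls back to a single worker when the variable is unset. A Gunicorn configuration file is plain Python, so the same behavior could be written out explicitly, as in this minimal sketch. The file name gunicorn.conf.py and the helper worker_count are illustrative choices, not something this guide requires.

```python
# gunicorn.conf.py (illustrative file name)
import os


def worker_count(environ=None, default=1):
    """Read the worker count from WEB_CONCURRENCY, or use the default."""
    environ = os.environ if environ is None else environ
    return int(environ.get("WEB_CONCURRENCY", default))


# "workers" is the Gunicorn setting that controls the number of worker processes.
workers = worker_count()
```

Because Gunicorn already does this on its own, a file like this is only useful if you want a different fallback value.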
The WEB_CONCURRENCY environment variable is automatically set by Heroku, based on the dyno size. This default is intended to be a sane starting point for your application. We recommend measuring the memory requirements of your processes and setting this configuration variable accordingly.
web: gunicorn hello:app
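In the Procfile line above, hello:app tells Gunicorn to import the module hello and serve the WSGI callable named app. The guide does not show that module, so the following is only a minimal sketch of what a hypothetical hello.py could contain, using nothing beyond the standard WSGI calling convention.

```python
# hello.py (hypothetical module matching the "hello:app" argument)
def app(environ, start_response):
    """Minimal WSGI callable: answer every request with a plain-text body."""
    body = b"Hello, world!\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # A WSGI application returns an iterable of bytes.
    return [body]
```

A Flask or Django project exposes an equivalent callable for you, so you would point Gunicorn at that object instead of writing one by hand.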
The Heroku Labs log-runtime-metrics feature gives you visibility into load and memory usage for running dynos. Once enabled, you can monitor application memory usage with the heroku logs command.
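If you want to track memory over time rather than read the logs by eye, a small script can pull the figure out of each log line. This sketch assumes the metrics lines contain a field of the form sample#memory_total=21.00MB. Verify that against your own log output before using it. The helper name memory_total_mb and the sample line are illustrative.

```python
import re

# Assumed field format: sample#memory_total=<number>MB
MEMORY_TOTAL = re.compile(r"sample#memory_total=(\d+(?:\.\d+)?)MB")


def memory_total_mb(line):
    """Return the memory_total value in MB, or None if the line has no such field."""
    match = MEMORY_TOTAL.search(line)
    return float(match.group(1)) if match else None


# Illustrative line, shortened; real log lines carry more fields.
sample = "source=web.1 sample#memory_total=21.00MB sample#memory_rss=21.22MB"
print(memory_total_mb(sample))
```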
If memory is constrained or your app is slow to boot, consider enabling the preload option. This loads the application code before the worker processes are forked.
web: gunicorn hello:app --preload
See the Gunicorn Docs on Preloading for more information.
By default, Gunicorn kills and restarts a worker if it hasn’t completed any work within the last 30 seconds. If you expect your application to respond quickly to a constant incoming flow of requests, try experimenting with a lower timeout configuration.
$ gunicorn hello:app --timeout 10
See the Gunicorn Docs on Worker Timeouts for more information.
Max request recycling
If your application suffers from memory leaks, you can configure Gunicorn to gracefully restart a worker after it has processed a given number of requests. This can be a convenient way to help limit the effects of the memory leak.
$ gunicorn hello:app --max-requests 1200
See the Gunicorn Docs on Max Requests for more information.
If your application is mostly I/O bound, you may want to try experimenting with asynchronous worker types, like Gevent. These allow Gunicorn to create tens of thousands of “greenlets” to handle incoming HTTP traffic instead of heavy system threads. This can tremendously increase the concurrency and throughput of your application.
However, many application dependencies aren’t compatible with Gevent, so you will have to experiment on your own to see if this configuration will suit your needs. Luckily, with Gunicorn, this is only a single-line change:
web: gunicorn hello:app --worker-class gevent
See the Gunicorn Docs on Worker Classes for more information.
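The command-line options used throughout this guide can also be collected in a Gunicorn configuration file, which is plain Python. The sketch below mirrors the example values from the sections above. The file name gunicorn.conf.py is an illustrative choice, and the values are the guide's examples rather than recommendations for your application.

```python
# gunicorn.conf.py (illustrative file name)

# Load application code before forking workers (same as --preload).
preload_app = True

# Restart workers that are silent for more than 10 seconds (same as --timeout 10).
timeout = 10

# Recycle each worker after it has handled 1200 requests (same as --max-requests 1200).
max_requests = 1200

# Use the gevent async worker (same as --worker-class gevent).
# Assumes the gevent package is installed and your dependencies are compatible.
worker_class = "gevent"
```

You would then point Gunicorn at the file with its -c option, for example: web: gunicorn -c gunicorn.conf.py hello:app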