Deploying Python Applications with Gunicorn

Last updated December 03, 2024

Adding Gunicorn to your application
Basic configuration
Advanced configuration
Further reading

Web applications that process incoming HTTP requests concurrently make much more efficient use of dyno resources than web applications that only process one request at a time. Because of this, we recommend using web servers that support concurrent request processing whenever developing and running production services.

The Django and Flask web frameworks feature convenient built-in web servers, but these blocking servers only process a single request at a time. If you deploy with one of these servers on Heroku, your dyno resources will be underutilized and your application will feel unresponsive.

Gunicorn is a pure-Python HTTP server for WSGI applications. It allows you to run any WSGI-compatible Python application concurrently by running multiple Python processes within a single dyno. It offers a good balance of performance, flexibility, and configuration simplicity.
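To make "WSGI application" concrete, here is a minimal sketch of the kind of callable Gunicorn serves. The module name hello.py and the greeting text are assumptions chosen for illustration; a Django or Flask project already provides an equivalent callable for you.

```python
# hello.py (hypothetical module name, used only for this illustration)

def app(environ, start_response):
    """Minimal WSGI callable: answers every request with a plain-text body."""
    body = b"Hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # Send the status line and headers, then return the body as an iterable of bytes.
    start_response("200 OK", headers)
    return [body]
```

Saved as hello.py, a callable like this would be addressed as hello:app on the Gunicorn command line, in the module:callable form.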

This guide will walk you through deploying a new Python application to Heroku using the Gunicorn web server. For basic setup and knowledge about Heroku, see Getting Started with Python.

As always, test configuration changes in a staging environment before you deploy to your production application.

Adding Gunicorn to your application

First, install Gunicorn with pip:

$ pip install gunicorn

Be sure to add gunicorn to your requirements.txt file as well.

Next, revise your application’s Procfile to use Gunicorn. Here’s an example Procfile for the Django application we created in Getting Started with Python on Heroku.

Procfile

web: gunicorn gettingstarted.wsgi

Basic configuration

Gunicorn forks multiple system processes within each dyno, which lets a Python app handle multiple concurrent requests without requiring the app to be thread-safe. In Gunicorn terminology, these are referred to as worker processes (not to be confused with Heroku worker processes, which run in their own dynos).

Each forked system process consumes additional memory. This limits how many processes you can run in a single dyno. With a typical Django application memory footprint, you can expect to run 2 to 3 Gunicorn worker processes on an eco, basic, or standard-1x dyno. The right number for your application may differ, depending on its specific memory requirements.
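The arithmetic behind that estimate can be sketched as follows. The figures are assumptions for illustration only (a 512 MB dyno, roughly 150 MB per worker, and 50 MB held back for the Gunicorn master process and the OS); measure your own application before relying on them. The helper name max_workers is hypothetical.

```python
def max_workers(dyno_memory_mb, worker_memory_mb, reserved_mb=0):
    """Return how many workers fit in the dyno's memory, never fewer than one."""
    if worker_memory_mb <= 0:
        raise ValueError("worker_memory_mb must be positive")
    usable_mb = dyno_memory_mb - reserved_mb
    # Integer division: a worker that only partly fits does not count.
    return max(1, usable_mb // worker_memory_mb)


# Assumed figures: 512 MB dyno, 150 MB per worker, 50 MB reserved.
print(max_workers(512, 150, reserved_mb=50))  # prints 3
```

A heavier application, at around 250 MB per worker under the same assumptions, would fit only one worker, which is why the count should follow measured memory usage rather than a fixed rule.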

We recommend controlling the number of worker processes with a config var. Gunicorn automatically honors the WEB_CONCURRENCY environment variable, if set.

$ heroku config:set WEB_CONCURRENCY=3

Heroku also sets the WEB_CONCURRENCY environment variable automatically, based on the process's dyno size. This default is intended to be a sensible starting point for your application. We recommend measuring the memory requirements of your processes and adjusting this config var accordingly.
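Gunicorn reads WEB_CONCURRENCY on its own, so no extra code is required. If you would rather validate the value or choose your own fallback, one option is a Gunicorn configuration file. The sketch below is an assumption-laden example, not a required setup: the file name gunicorn.conf.py, the helper workers_from_env, and the fallback of 2 are all illustrative choices.

```python
# gunicorn.conf.py (illustrative; pass it to Gunicorn with the -c option)
import os


def workers_from_env(environ=os.environ, default=2):
    """Read WEB_CONCURRENCY, falling back to `default` if it is unset or invalid."""
    raw = environ.get("WEB_CONCURRENCY", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    # Zero or negative worker counts make no sense, so use the fallback instead.
    return value if value > 0 else default


# Gunicorn picks up the module-level name `workers` from its config file.
workers = workers_from_env()
```

With this file in place, the Procfile entry would become something like web: gunicorn -c gunicorn.conf.py hello:app.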

Read Optimizing Python Application Concurrency for more information on tuning Python applications for maximum throughput.

Because Gunicorn reads WEB_CONCURRENCY itself, the Procfile does not need a separate flag for the number of workers:

Procfile

web: gunicorn hello:app

The Heroku Labs log-runtime-metrics feature gives you visibility into load and memory usage for running dynos. Once it is enabled, you can monitor application memory usage with the heroku logs command.
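If you want to pull the memory figures out of those log lines programmatically, a small parser is enough. The sketch below assumes that metrics appear as sample#name=valueMB fields; the sample line is made up for illustration and the exact field names in your logs may differ. The function name memory_samples is hypothetical.

```python
import re

# Matches fields such as "sample#memory_total=21.00MB" and captures name and value.
SAMPLE_RE = re.compile(r"sample#(\w+)=([\d.]+)MB")


def memory_samples(line):
    """Return {metric_name: megabytes} for each sample#...=<n>MB field in a log line."""
    return {name: float(value) for name, value in SAMPLE_RE.findall(line)}


# Illustrative log line, not captured from a real dyno.
line = "source=web.1 sample#memory_total=21.00MB sample#memory_rss=21.22MB"
print(memory_samples(line))
```

Comparing the reported total against the dyno's memory limit over time is a simple way to check whether the current worker count leaves enough headroom.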

Advanced configuration

App preloading

If you are constrained for memory or experiencing slow app boot time, you might want to consider enabling the preload option. This loads the application code before the worker processes are forked.

web: gunicorn hello:app --preload

See the Gunicorn Docs on Preloading for more information.

Worker timeouts

By default, Gunicorn kills and restarts a worker if it hasn’t completed any work within the last 30 seconds. If you expect your application to respond quickly to a constant incoming flow of requests, try experimenting with a lower timeout configuration.

$ gunicorn hello:app --timeout 10

See the Gunicorn Docs on Worker Timeouts for more information.

Max request recycling

If your application suffers from memory leaks, you can configure Gunicorn to gracefully restart a worker after it has processed a given number of requests. This can be a convenient way to help limit the effects of the memory leak.

$ gunicorn hello:app --max-requests 1200

See the Gunicorn Docs on Max Requests for more information.
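
The three options above can also live in a Gunicorn configuration file instead of on the command line. This is a sketch under assumptions: the file name gunicorn.conf.py is illustrative, the values simply mirror the examples above, and max_requests_jitter is a Gunicorn setting that this article does not otherwise cover, included with an arbitrary value.

```python
# gunicorn.conf.py (illustrative; pass it to Gunicorn with the -c option)

# Load the application before forking workers (equivalent to --preload).
preload_app = True

# Restart workers that stay silent for more than 10 seconds (equivalent to --timeout 10).
timeout = 10

# Recycle each worker after it has handled 1200 requests (equivalent to --max-requests 1200).
max_requests = 1200

# Assumption, not from the text above: add a random offset of up to 100 requests
# per worker so that all workers do not restart at the same moment.
max_requests_jitter = 100
```

Keeping these settings in a file keeps the Procfile short and lets you review configuration changes alongside the rest of your code.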
