Using Celery on Heroku
Last updated October 11, 2024
Celery is a framework for performing asynchronous tasks in your application. It's written in Python and makes it easy to offload work out of the synchronous request lifecycle of a web app and onto a pool of task workers that perform jobs asynchronously.
Celery is fully supported on Heroku and just requires using one of our add-on providers to implement the message broker and result store.
Architecture
You deploy Celery by running one or more worker processes. These processes connect to the message broker and listen for job requests. The message broker distributes job requests among the listening workers. To originate a job request, you invoke code in the Celery library, which takes care of marshaling the job's arguments and publishing the job request to the broker.
You can flexibly scale your worker pool by simply running more worker processes which connect to the broker. Each worker can execute jobs in parallel with every other worker.
Choosing a broker
Heroku supports many great choices for your Celery broker via add-ons provided by our partner companies. Generally speaking, the broker engines with the best support within Celery are Redis and RabbitMQ. Others, including Amazon SQS, IronMQ, MongoDB, and CouchDB, are also supported, though some features may be missing when using them. The way Celery abstracts away the specifics of broker implementations makes changing brokers relatively easy, so if at some later point you decide a different broker better suits your needs, feel free to switch.
Once you’ve chosen a broker, create your Heroku app and attach the add-on to it. In the examples we’ll use Heroku Key-Value Store as the Redis provider, but there are plenty of other Redis providers in the Heroku Elements Marketplace.
$ heroku apps:create
$ heroku addons:create heroku-redis -a example-app
Celery ships with a library to talk to RabbitMQ, but for any other broker, you’ll need to install its library. For example, when using Redis:
$ pip install redis
Choosing a result store
If you need Celery to be able to store the results of tasks, you’ll need to choose a result store. If not, skip to the next section. Characteristics that make a good message broker do not necessarily make a good result store! For instance, while RabbitMQ is the best supported message broker, it should never be used as a result store since it will drop results after being asked for them once. Both Redis and Memcache are good candidates for result stores.
If you choose the same add-on for both the message broker and the result store, you don't need to attach two add-ons. If not, make sure the result store add-on is also attached.
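For example, if you wanted Memcache as your result store alongside a Redis broker, you could attach one of the Memcache add-ons from the Elements Marketplace; MemCachier is used here purely as an illustration:
$ heroku addons:create memcachier -a example-app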
Creating a Celery app
First, make sure Celery itself is installed:
$ pip install celery
Then create a Python module for your Celery app. For small Celery apps, it's common to put everything together in a module named tasks.py:
import celery
app = celery.Celery('example')
Defining tasks
Now that you have a Celery app, you need to tell the app what it can do. The basic unit of code in Celery is the task. This is just a Python function that you register with Celery so that it can be invoked asynchronously. Celery has various decorators which define tasks, and a couple of method calls for invoking those tasks. Add a simple task to your tasks.py module:
@app.task
def add(x, y):
    return x + y
You’ve now created your first Celery task! But before it can be run by a worker, a bit of configuration must be done.
Configuring a Celery app
Celery has many configuration options, but to get up and running you only need to worry about a couple:
BROKER_URL: The URL that tells Celery how to connect to the message broker. (This will commonly be supplied by the add-on chosen to be the broker.)
CELERY_RESULT_BACKEND: A URL in the same format as BROKER_URL that tells Celery how to connect to the result store. (Ignore this setting if you choose not to store results.)
Heroku add-ons provide your application with environment variables which can be passed to your Celery app. For example:
import os
app.conf.update(BROKER_URL=os.environ['REDIS_URL'],
                CELERY_RESULT_BACKEND=os.environ['REDIS_URL'])
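If you're on Celery 4 or newer, the same configuration can also be written with the lowercase setting names introduced in that version. A minimal sketch, still reading the same REDIS_URL variable provided by the add-on:
import os
app.conf.update(broker_url=os.environ['REDIS_URL'],
                result_backend=os.environ['REDIS_URL'])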
Your Celery app now knows to use your chosen broker and result store for all of the tasks you define in it.
Running locally
Before running Celery workers locally, you'll need to install the applications you've chosen for your message broker and result store. Once installed, ensure both are up and running. Then create a Procfile which Heroku Local can use to launch a worker process. Your Procfile should look something like this:
worker: celery --app=tasks.app worker
Now add a file named .env to tell Heroku Local which environment variables to set. For example, if using Redis with its default configuration for both the message broker and result store, simply add an environment variable with the same name as the one provided by the add-on:
REDIS_URL=redis://
To start your worker process, run Heroku Local:
$ heroku local
Then, in a Python shell, run:
import tasks
tasks.add.delay(1, 2)
delay tells Celery that you want to run the task asynchronously rather than in the current process.
You should see the worker process log that it has received and executed the task.
Deploying on Heroku
If you already created a Procfile above and attached the appropriate add-ons for the message broker and result store, all that's left to do is push and scale your app:
$ git push heroku master
$ heroku ps:scale worker=1
Of course, at any time you can scale to any number of worker dynos. Now run a task just like you did locally:
$ heroku run python
>>> import tasks
>>> tasks.add.delay(1, 2)
You should see the task running in the application logs:
$ heroku logs -t -d worker
Celery and Django
The Celery documentation provides a good overview of how to integrate Celery with Django. Though the concepts are the same, Django's reusable app architecture lends itself well to a module of tasks per Django app. You may want to leverage the INSTALLED_APPS setting in concert with Celery's ability to autodiscover tasks:
from django.conf import settings
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
Celery will then look for a tasks.py file in each of your Django apps and add all tasks found to its registry.
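For example, the tasks.py inside an individual Django app might look something like the sketch below. The app name and task body are hypothetical, but shared_task is the decorator Celery provides for reusable Django apps that shouldn't be tied to a specific Celery app instance:
# blog/tasks.py -- "blog" is a hypothetical Django app
from celery import shared_task

@shared_task
def publish_post(post_id):
    # Fetch the post and mark it published here; the body is omitted for brevity.
    return post_id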
Celery best practices
Managing asynchronous work
Celery tasks run asynchronously, which means that the Celery function call in the calling process returns immediately after the message request to perform the task is sent to the broker. There are two ways to get results back from your tasks. One way is just to write the results of the task into some persistent storage like your database.
The other way to get results back from a Celery task is to use the result store. The Celery result store is optional. Oftentimes it is sufficient to issue tasks and let them run asynchronously and store their results to the database. In that case you can safely ignore the result as saved by Celery.
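When you do want the result back through the result store, the value returned by delay is an AsyncResult, which you can poll or wait on. A minimal sketch, reusing the add task from earlier:
import tasks

result = tasks.add.delay(1, 2)
print(result.ready())          # False until a worker finishes the task
print(result.get(timeout=10))  # blocks until the result arrives, then prints 3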
Choosing a serializer
Everything that gets passed into a Celery task and everything that comes out as a result must be serialized and deserialized. Serialization brings with it a set of problems that are good to keep in mind as you design your Celery tasks.
Older versions of Celery (before 4.0) used pickle as the default message serializer; newer versions default to JSON. Pickle has the benefit of “just working” out of the box, but it can cause many problems down the road. When code changes, in-flight serialized objects can cause problems when deserialized, still in their old form.
Changing Celery to use JSON not only forces developers into good practices for task arguments; it also reduces the security risk that pickle’s arbitrary code execution can pose. To use JSON as the default task serializer, set an environment variable:
CELERY_TASK_SERIALIZER=json
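Depending on how your Celery app is configured, you may also need to pass that value into Celery's configuration yourself. A minimal sketch that reads the config var and falls back to JSON, using the same uppercase setting names as above:
import os

app.conf.update(
    CELERY_TASK_SERIALIZER=os.environ.get('CELERY_TASK_SERIALIZER', 'json'),
    CELERY_ACCEPT_CONTENT=['json'],   # reject messages serialized in other formats
    CELERY_RESULT_SERIALIZER='json',
)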
Small, short-lived tasks
In a celery worker pool, multiple workers will be working on any number of tasks concurrently. Because of this, it makes sense to think about task design much like that of multithreaded applications. Each task should do the smallest useful amount of work possible so that the work can be distributed as efficiently as possible.
The overhead of serializing a task, sending it over the network to the message broker, delivering it to a worker, and deserializing it there is much higher than passing a message on an in-process queue between threads, so the bar for “useful” is correspondingly higher as well. Issuing a task that does nothing but write a single row to a database is probably not the best use of resources. Making one API call rather than several in the same task, though, can make a big difference.
Short-lived tasks make deploys and restarts easier. When stopping a dyno, Heroku will kill processes a short amount of time after asking them to stop. This could happen as a result of a restart, scale down, or deploy. As a result, tasks which finish quickly will fail less often than long running ones.
Set task timeouts
When running a task hundreds or thousands of times, occasionally a task can get stuck (on network activity, for example) and block your queue from handling more tasks. This can easily be avoided by setting hard and soft timeouts for all of your tasks.
You can read more about timeouts in the Celery documentation.
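As a rough sketch, both kinds of timeout can be set per task via the decorator; the limits chosen here are arbitrary. soft_time_limit raises SoftTimeLimitExceeded inside the task so it can clean up, while time_limit forcibly terminates the worker process running the task:
from celery.exceptions import SoftTimeLimitExceeded

@app.task(soft_time_limit=30, time_limit=60)
def fetch_remote_data(url):
    try:
        pass  # do the slow network work here
    except SoftTimeLimitExceeded:
        pass  # clean up before the hard limit kills the task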
Idempotent tasks
Celery tasks can fail or be interrupted for a variety of reasons. Rather than trying to anticipate everything that could possibly go wrong in a distributed system, you should strive for idempotence when designing tasks. Try never to assume the current state of the system when a task begins. Change as little external state as possible.
This is more of an art than a science, but the more tasks can be re-run without negative side effects, the easier it is for a distributed system to self-heal. For example, when an unexpected error occurs, an idempotent task can simply tell Celery to retry it. If the error was transient, idempotence allows the system to self-heal without human intervention.
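A minimal sketch of that retry pattern, using Celery's bind and retry APIs; the task name, body, and exception type are illustrative:
@app.task(bind=True, max_retries=3, default_retry_delay=5)
def sync_account(self, account_id):
    try:
        pass  # call the external service here
    except ConnectionError as exc:
        # Safe to re-queue because the task is idempotent.
        raise self.retry(exc=exc)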
Using REMAP_SIGTERM
When a running process on Heroku is terminated, it is sent a SIGTERM signal and given a 30-second grace period to shut down before it is killed. Starting with version 4, Celery comes with a special feature, just for Heroku, that supports this functionality out of the box:
$ REMAP_SIGTERM=SIGQUIT celery -A proj worker -l info
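If you prefer to keep the Procfile command unchanged, one way to apply this on Heroku is to set the variable as a config var instead:
$ heroku config:set REMAP_SIGTERM=SIGQUIT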
Using acks_late
When Celery workers receive a task from the message broker, they send an acknowledgement back. Brokers usually react to an ack by removing the task from their queue. However, if a worker dies in the middle of a task it has already acknowledged, the task may never run again. Celery tries to mitigate this risk by sending unfinished tasks back to the broker on a soft shutdown, but sometimes it's unable to, due to a network failure, total hardware failure, or any number of other scenarios.
Celery can be configured to only ack tasks after they have completed (succeeded or failed). This feature is extremely useful when losing the occasional task is not tolerable. However, it requires the task to be idempotent (the previous attempt may have progressed part of the way through) and short-lived (brokers will generally “reserve” a task for a fixed period of time before reintroducing it to the queue). acks_late can be enabled per task or globally for the Celery app, as sketched below.
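A sketch of both approaches, using the acks_late task option and the corresponding app-level setting (the older uppercase name is CELERY_ACKS_LATE); the task itself is illustrative:
# Per task:
@app.task(acks_late=True)
def process_payment(payment_id):
    pass  # task body omitted

# Globally, for every task in the app:
app.conf.task_acks_late = True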
Custom task classes
Even though a task may look like a function, Celery's task decorator actually returns an instance of a class which implements __call__. (This is why tasks can be bound to self and have other methods available to them like delay and apply_async.) The decorator is a great shortcut, but sometimes a group of tasks share common concerns which can't be easily expressed purely functionally.
By creating an abstract subclass of celery.Task, you can build a suite of tools and behaviors needed by other tasks using inheritance. Common uses for task subclasses are rate limiting and retry behavior, setup or teardown work, or even a common group of configuration settings shared by only a subset of an app's tasks.
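A minimal sketch of the pattern: the class and task names here are hypothetical, but on_success and on_failure are Task hooks Celery calls for you, and base= is how a task opts into the custom base class:
import celery

class AuditedTask(celery.Task):
    # Shared behavior for a group of tasks: log the outcome of every run.
    def on_success(self, retval, task_id, args, kwargs):
        print('task %s succeeded with %r' % (task_id, retval))

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        print('task %s failed: %r' % (task_id, exc))

@app.task(base=AuditedTask)
def send_welcome_email(user_id):
    # ... send the email here ...
    return user_id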
Canvas primitives
Celery's documentation outlines a set of primitives which can be used to combine tasks into larger workflows. Familiarize yourself with these primitives. They provide ways to accomplish significant and complex work without sacrificing the design principles outlined above. Particularly useful is the chord, in which a group of tasks are executed in parallel and their results passed to another task upon completion. These, of course, require a result store.
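A small sketch of a chord, reusing the add task from earlier; summarize is a hypothetical task that receives the list of results once all of the parallel add calls have finished (a result store must be configured):
from celery import chord

@app.task
def summarize(results):
    # Receives the list of results from the parallel add() calls.
    return sum(results)

# Run ten add() tasks in parallel, then pass all of their results to summarize.
workflow = chord(add.s(i, i) for i in range(10))(summarize.s())
print(workflow.get())  # prints 90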