Scheduled Jobs with Custom Clock Processes in Python with APScheduler
Last updated April 29, 2024
The ability to schedule background jobs is a requirement for most modern web apps. These jobs might be user-oriented, like sending emails; administrative, like taking backups or synchronizing data; or even a more integral part of the app itself.
On a single-server deployment, a system-level tool like cron is the obvious choice for this kind of scheduling. However, when deploying to a cloud platform like Heroku, something higher level is required, since instances of the application run in a distributed environment where machine-local tools are not useful.
The Heroku Scheduler add-on is a fantastic solution for simple tasks that need to run at 10-minute, hourly, or daily intervals (or multiples of those intervals). But what about tasks that need to run every 5 minutes or every 37 minutes, or those that need to run at a very specific time? For these less common and more complicated use cases, running your own scheduling process can be very useful.
APScheduler
There are a few Python scheduling libraries to choose from. Celery is an extremely robust asynchronous task queue and message system that supports scheduled tasks.
For this example, we’re going to use APScheduler, a lightweight, in-process task scheduler. It provides a clean, easy-to-use scheduling API, has few dependencies, and is not tied to any specific job queuing system.
Install APScheduler easily with pip:
$ pip install apscheduler
And make sure to add it to your requirements.txt:
APScheduler>=3.10,<4.0
Execution schedule
Next you’ll need to write the file that defines your schedule. The APScheduler Documentation has a lot of great examples that show the flexibility of the library.
Here’s a simple clock.py example file:
from apscheduler.schedulers.blocking import BlockingScheduler
sched = BlockingScheduler()
@sched.scheduled_job('interval', minutes=3)
def timed_job():
    print('This job is run every three minutes.')
@sched.scheduled_job('cron', day_of_week='mon-fri', hour=17)
def scheduled_job():
    print('This job is run every weekday at 5pm.')
sched.start()
Here we’ve configured APScheduler to queue background jobs in two different ways. The first directive schedules an interval job that runs every 3 minutes, counted from the time the clock process is launched. The second queues a job once per weekday, at 5pm.
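To make the two schedules concrete, here is a minimal standard-library sketch that works out when each job would first fire for a given launch time. It is an illustration of the schedule semantics only, not how APScheduler computes run times internally. The function names are hypothetical, and time zones are ignored for simplicity.

```python
from datetime import datetime, timedelta


def next_interval_run(launched_at, minutes=3):
    """First run of an interval job: one full interval after launch."""
    return launched_at + timedelta(minutes=minutes)


def next_weekday_run(now, hour=17):
    """Next Monday-to-Friday occurrence of hour:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so start from tomorrow.
        candidate += timedelta(days=1)
    # weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    while candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return candidate


# An arbitrary Tuesday evening used as the launch time.
launched = datetime(2023, 5, 30, 20, 59, 48)
print(next_interval_run(launched))  # 2023-05-30 21:02:48
print(next_weekday_run(launched))   # 2023-05-31 17:00:00
```

Launching on a Friday evening would push the weekday job to the following Monday, since Saturday and Sunday are skipped.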
While this is a trivial example, it’s important to note that no work should be done in the clock process itself, for reasons already covered in the clock processes article. Instead, have the clock process enqueue a background job that performs the actual work.
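One way to follow this advice is sketched below, assuming RQ (Redis Queue) as the job queue; the article does not prescribe a queue, so treat RQ, the REDIS_URL config var, and the dotted path tasks.build_report as assumptions made for illustration. The scheduled function does nothing except enqueue, and a separate worker process would pick up and run the job.

```python
import os


def enqueue_report(queue):
    """Hand the real work to a worker; the clock process only enqueues.

    "tasks.build_report" is a hypothetical dotted path to a function
    defined elsewhere in your app.
    """
    return queue.enqueue("tasks.build_report")


def main():
    # Assumption: apscheduler, redis, and rq are listed in requirements.txt.
    # They are imported here so enqueue_report can be imported and tested
    # without a Redis connection.
    from apscheduler.schedulers.blocking import BlockingScheduler
    from redis import Redis
    from rq import Queue

    # Assumption: the Redis URL is provided in the REDIS_URL config var.
    redis_url = os.environ.get("REDIS_URL", "redis://localhost:6379")
    queue = Queue(connection=Redis.from_url(redis_url))

    sched = BlockingScheduler()
    # Every 3 minutes, enqueue a job instead of doing the work here.
    sched.add_job(enqueue_report, "interval", minutes=3, args=[queue])
    sched.start()  # Blocks until the process is stopped.


# In clock.py, call main() as the last line to start the scheduler.
```

Keeping the enqueue step in its own small function means it can be exercised with a stand-in queue object, while the clock process stays responsive because each scheduled call returns as soon as the job is queued.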
Clock process type
Finally, you’ll need to define a process type in the Procfile. In this example we’ll call the process clock, so the Procfile should look something like this:
clock: python clock.py
Deployment
Commit the requirements.txt, Procfile, and clock.py changes and redeploy your application with a git push heroku master.
The final step is to scale up the clock process. This is a singleton process, meaning you should never run more than one of these processes. If you run two, the work will be duplicated.
$ heroku ps:scale clock=1
You should see output similar to the following in your Heroku logs.
2023-05-30T20:59:38+00:00 heroku[clock.1]: State changed from created to starting
2023-05-30T20:59:38+00:00 heroku[api]: Scale to clock=1, web=3 by user@heroku.com
2023-05-30T20:59:40+00:00 heroku[clock.1]: Starting process with command `python clock.py`
2023-05-30T20:59:41+00:00 heroku[clock.1]: State changed from starting to up
2023-05-30T20:59:48+00:00 app[clock.1]: Starting clock for 1 events: [ Queueing interval job ]
2023-05-30T20:59:48+00:00 app[clock.1]: Queuing scheduled jobs
Now you have a custom clock process up and running. Check out the APScheduler Documentation for more info.