Running Kafka Connectors on Heroku
Last updated May 19, 2021
Apache Kafka Connectors are packaged applications for moving and/or modifying data between Apache Kafka and other systems or data stores. They are built on the Apache Kafka Connect framework, which simplifies building and bundling common data transport tasks, such as syncing data to a database, by handling the functions that are not unique to the task so that the developer can focus on what is unique to their use case. This article focuses on considerations for running pre-built Connectors on Heroku. If you plan to build a custom Connector, follow the Connector Developer Guide while keeping in mind the considerations in this article to ensure that your Connector runs smoothly on Heroku.
Connectors come in two varieties:
Source Connectors - these are used to import data from other systems into Apache Kafka
Sink Connectors - these are used to export data from Apache Kafka to other systems
Many Connectors can act as either a Source or Sink depending on the configuration. Below is an example of a database Connector that watches for changes in Postgres and then adds them to a corresponding topic in Apache Kafka. This Connector could also work in the other direction and add changes from Apache Kafka to a table in Postgres.
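A Connector like the one described above is typically registered by POSTing its configuration to a Kafka Connect worker's REST API. The sketch below is illustrative only: it assumes a Connect worker listening on localhost:8083 and uses the Confluent JDBC source connector class with placeholder connection details. Adjust the class and settings for the Connector you actually run.

```shell
# Hypothetical example: register a Postgres source connector with a
# Kafka Connect worker (assumed to be listening on localhost:8083).
# The connector name, database URL, and column name are placeholders.
curl -s -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "postgres-source",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:postgresql://example-host:5432/mydb",
      "mode": "timestamp",
      "timestamp.column.name": "updated_at",
      "topic.prefix": "postgres-"
    }
  }'
```

The same REST API is later used to inspect and restart the Connector's tasks.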
Pre-built Connectors for Apache Kafka
Ensure that both the license and the technical requirements for existing Connectors are suitable for your application. Some Connectors have licensing restrictions or are intended for an environment that you manage directly, rather than a cloud environment where Kafka is managed for you. In particular, let’s look at some considerations for running a Connector on Heroku.
Running Connectors on Heroku
In Heroku, Connectors are applications that talk to your Apache Kafka cluster, using the Kafka Connect API to produce or consume data.
Before running a Connector on Heroku, evaluate it against these criteria:
Long-term local storage
Heroku dynos have an ephemeral filesystem: any data written locally is discarded when the dyno restarts, which happens at least once every 24 hours. Connectors that require long-term local storage are incompatible, though some can be configured to remove this requirement.
Automatic topic creation
Some Connectors automatically create topics to manage state, but Apache Kafka on Heroku does not currently support automatic topic creation. To use a Connector that requires certain topics, pre-create them and disable topic creation on first write in the Connector's configuration. Connectors with a hard requirement on automatic topic creation are not compatible with Heroku.
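For example, topics can be pre-created with the Heroku CLI before starting the Connector. The topic names below are the internal topics a Connect worker running in distributed mode conventionally uses; your Connector may need different ones, and the app name is a placeholder.

```shell
# Pre-create the internal topics a distributed Connect worker expects
# (the app name and topic names here are placeholders; adjust for your setup).
heroku kafka:topics:create connect-offsets --app your-app
heroku kafka:topics:create connect-configs --app your-app
heroku kafka:topics:create connect-status --app your-app
```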
Operator-level cluster access
Connectors that require operator-level access to an Apache Kafka cluster, such as changing cluster configuration, are not compatible with Apache Kafka on Heroku. Apache Kafka on Heroku is a managed service where Heroku is responsible for updates, uptime, and maintenance.
Operator-level access to other systems
Connectors that require operator-level access to connected systems, such as PostgreSQL, are not compatible with Apache Kafka on Heroku. For example, the Debezium PostgreSQL Connector requires installing an output plugin on the PostgreSQL server. This would require access beyond what is allowed on the managed Heroku PostgreSQL service.
When operating a Connector on Heroku, follow these guidelines:
Also consider how resilient your tasks need to be. As the Kafka Connect documentation notes, "When a task fails, no rebalance is triggered as a task failure is considered an exceptional case. As such, failed tasks are not automatically restarted by the framework and should be restarted via the REST API." If you are not monitoring tasks, failures can go unnoticed. You can monitor the Connector application as you would any other application on Heroku, for example with New Relic, Librato, or a similar service.
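The check-and-restart flow above can be sketched against the Connect REST API. The worker address and connector name are assumptions, and the status payload below is a hardcoded sample of what `GET /connectors/<name>/status` returns; in practice you would fetch it with curl as shown in the comments.

```shell
#!/bin/sh
# Sketch: detect failed tasks and restart them via the Connect REST API.
# In a real deployment, fetch the status from your Connect worker, e.g.:
#   status=$(curl -s http://localhost:8083/connectors/my-connector/status)
# Here we use a hardcoded sample payload so the logic is visible.
status='{"name":"my-connector","connector":{"state":"RUNNING"},"tasks":[{"id":0,"state":"RUNNING"},{"id":1,"state":"FAILED"}]}'

# Extract the ids of FAILED tasks (jq would be cleaner if available).
failed_ids=$(echo "$status" | grep -o '"id":[0-9]*,"state":"FAILED"' | grep -o '[0-9][0-9]*')

for id in $failed_ids; do
  echo "task $id failed; restart it with:"
  echo "  curl -X POST http://localhost:8083/connectors/my-connector/tasks/$id/restart"
done
```

Running this kind of check on a schedule (for example, from Heroku Scheduler) is one way to keep tasks from silently staying down.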
To use a Connector that requires property files, you can create an initialization script that writes files based on config vars before launching the Connector processes. This example repository writes configuration files with a start-distributed script, which is launched by its Procfile at dyno boot.
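A minimal sketch of such an initialization step is below. The variable and topic names are assumptions; `KAFKA_URL` is set by the Apache Kafka on Heroku add-on, though a real value may list several `kafka+ssl://` URLs and require SSL settings not shown here. The default broker address is only so the script can run outside Heroku.

```shell
#!/bin/sh
# Sketch: write a Connect worker properties file from config vars at dyno
# boot, before launching the worker. KAFKA_URL comes from the Apache Kafka
# on Heroku add-on; the fallback below is only for local runs. A real
# KAFKA_URL may contain multiple kafka+ssl:// URLs plus SSL config not
# handled in this simplified sketch.
set -e

BROKERS="${KAFKA_URL:-kafka://localhost:9092}"

cat > worker.properties <<EOF
bootstrap.servers=${BROKERS#kafka://}
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
EOF

# A Procfile entry would then point at a script that ends with, e.g.:
#   exec connect-distributed.sh worker.properties
echo "wrote worker.properties"
```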
Installing system packages
If you cannot install the packages you need via the Apt Buildpack, you may instead be able to use a Docker deployment.
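As a sketch of the two options (the app name is a placeholder): the Apt Buildpack installs packages listed in an `Aptfile` at build time, while the container stack lets you deploy a Docker image instead.

```shell
# Option 1: add the Apt buildpack, which installs packages listed in an
# Aptfile in the repository root at build time.
heroku buildpacks:add --index 1 heroku-community/apt --app your-app

# Option 2: switch the app to Docker-based deployment on the container stack.
heroku stack:set container --app your-app
```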