Heroku Streaming Data Connectors
Last updated December 08, 2022
This article describes how to configure Change Data Capture (CDC) for Heroku Postgres events and stream them to your Apache Kafka on Heroku add-on provisioned in a Private Space or a Shield Private Space. This process involves three high-level steps:
- Creating an app in a Private Space or a Shield Private Space.
- Provisioning a Private or Shield Heroku Postgres add-on and a Private or Shield Apache Kafka on Heroku add-on on your new app.
- Creating a streaming data connector to enable CDC events from your Postgres to your Kafka.
For more information about how to best configure a streaming data connector, see Best Practices for Heroku’s Streaming Data Connectors.
Heroku App Setup
To begin, create a Private or Shield Private Space. When your Space is available, you can create an app in your Space.
```
$ heroku spaces:create --region virginia --team my-team-name --space myspace
$ heroku spaces:wait --space myspace
$ heroku apps:create --space myspace my-cdc-app
```
Heroku Add-ons Setup
Next, you need two Private or Shield data add-ons attached to your app.
Your Postgres add-on must be version 10 or higher. Your Kafka add-on must be version 2.3 or higher.
```
$ heroku addons:create heroku-postgresql:private-7 --as DATABASE --app my-cdc-app
$ heroku addons:create heroku-kafka:private-extended-2 --as KAFKA --app my-cdc-app
```
You can monitor the add-on provisioning progress:
$ heroku addons:wait --app my-cdc-app
When your add-ons are available, import your schema and/or data into your Postgres database.
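For example, you could load a local SQL dump with `pg:psql`, or restore a logical backup from a URL with `pg:backups:restore` (the file name and URL below are illustrative):

```shell
# Load a local SQL dump into the add-on database (schema.sql is a hypothetical file)
$ heroku pg:psql --app my-cdc-app < schema.sql

# Or restore a logical backup from a publicly accessible URL (hypothetical URL)
$ heroku pg:backups:restore 'https://example.com/db.dump' DATABASE_URL --app my-cdc-app
```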
Heroku’s Streaming Data Connector Setup
When you have a Private or Shield Private Space App with Heroku Postgres and Apache Kafka on Heroku add-ons configured, you can provision a connector.
First, install the CLI plugin:
$ heroku plugins:install @heroku-cli/plugin-data-connectors
To create a connector, you must gather several pieces of information.
- The name of the Kafka add-on
- The name of the Postgres add-on
- The name(s) of the Postgres tables from which you want to capture events
- (optionally) The name(s) of the columns you wish to exclude from capture events
In order to capture events in your Postgres database, a few requirements must be met:
- The database encoding must be UTF-8
- The table(s) must currently exist
- The table(s) must have a primary key
- The table(s) must not be partitioned
- The table name(s) must only contain the characters `[a-z,A-Z,0-9,_]`
- The Kafka Formation must have direct Zookeeper access disabled
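The table-name requirement can be checked programmatically before you create a connector. A minimal sketch, assuming the allowed character set is ASCII letters, digits, and underscores (the function name is illustrative, not part of the Heroku CLI):

```python
import re

# Allowed character set for captured table names:
# ASCII letters, digits, and underscores only (assumed rule).
VALID_TABLE_NAME = re.compile(r"^[A-Za-z0-9_]+$")

def is_capturable_name(table: str) -> bool:
    """Return True if every part of a schema-qualified name is valid."""
    return all(VALID_TABLE_NAME.match(part) for part in table.split("."))

print(is_capturable_name("public.users"))     # True
print(is_capturable_name("public.user-log"))  # False: hyphen not allowed
```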
Take care in choosing which tables to capture: a single connector can't keep up with a high volume of events from many tables.
Next, you can create the connector. You need the names of your Postgres and Kafka add-ons, as well as a list of fully qualified tables you want to include in your database capture events:
```
$ heroku data:connectors:create \
  --source postgresql-neato-98765 \
  --store kafka-lovely-12345 \
  --table public.posts \
  --table public.users
```
Provisioning can take approximately 15–20 minutes to complete. You can monitor the connector provisioning progress:
$ heroku data:connectors:wait gentle-connector-1234
When your connector is available, you can view the details including newly created Kafka topics:
```
$ heroku data:connectors:info gentle-connector-1234
=== Data Connector status for gentle_connector_1234
Name:   gentle_connector_1234
Status: available
=== Configuration
Table Name    Topic Name
public.posts  gentle_connector_1234.public.posts
public.users  gentle_connector_1234.public.users
```
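Each topic carries change events in a Debezium-style envelope: a `payload` object with `before`, `after`, and `op` fields. A minimal sketch of decoding one such event, assuming that structure (real events carry additional schema and source metadata):

```python
import json

# A simplified Debezium-style change event for an UPDATE on public.users
# (field layout assumed; real events include schema and source blocks).
raw = """
{
  "payload": {
    "op": "u",
    "before": {"id": 1, "email": "old@example.com"},
    "after":  {"id": 1, "email": "new@example.com"}
  }
}
"""

OPS = {"c": "create", "u": "update", "d": "delete", "r": "snapshot read"}

payload = json.loads(raw)["payload"]
print(OPS[payload["op"]], payload["after"])
```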
Managing a Connector
After you’ve created your connector, there are a few options available for managing it.
Pause or Resume
```
# to pause
$ heroku data:connectors:pause gentle-connector-1234

# to resume
$ heroku data:connectors:resume gentle-connector-1234
```
Under normal operation, the connector doesn’t lose change events that occur while it’s paused. The connector uses a replication slot on the Postgres database to track progress, and picks up where it left off without losing data when resumed.
Don’t leave connectors in a “paused” state for more than a few hours. Paused connectors prevent WAL from being deleted, which can put the primary database at risk. It’s better to destroy the connector than to leave it paused for a long period.
Change events that occur while a connector is paused are not guaranteed to make it into Kafka. In the event of a failover (due to a system failure or a scheduled maintenance), change events after the connector was paused will be lost.
If the connector is paused for a very long time on a busy database, the replication slot prevents Postgres from deleting unread write-ahead logs (WAL). As a result, the WAL drive can fill up, which causes the database to shut down. Our automation generally detects these situations ahead of time, but in a worst-case scenario, we must drop the replication slot to protect the database. In that rare case, change events wouldn’t make it to Kafka.
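You can watch how much WAL a replication slot is retaining by querying the database directly. A sketch using `pg:psql` (the query uses standard Postgres 10+ WAL functions; slot names on your database will vary):

```shell
# Inspect replication slots and how much WAL each one is holding back
$ heroku pg:psql --app my-cdc-app -c "
  SELECT slot_name,
         active,
         pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
  FROM pg_replication_slots;"
```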
You can modify certain properties associated with your connector via the CLI. These properties include:
| property | possible values | default value | details |
| --- | --- | --- | --- |
| `tombstones.on.delete` | `true`, `false` | `true` | Whether a tombstone record is emitted after each delete event |
For example, you can update the `tombstones.on.delete` property:
```
$ heroku data:connectors:update gentle-connector-1234 \
  --setting tombstones.on.delete=false
```
Familiarize yourself with our Best Practices for Heroku’s Streaming Data Connectors when working with connectors.
Configuration managed by Heroku
Most configuration properties are entirely managed by Heroku and are modified as needed.
Update Tables and Excluded Columns
You can also modify the connector’s Postgres tables, as well as excluded columns.
For example, you can add the table `public.parcels` and remove the table `public.posts`:
```
$ heroku data:connectors:update gentle-connector-1234 \
  --add-table public.parcels \
  --remove-table public.posts
```
New tables must adhere to the same requirements as outlined in the Setup.
Likewise, you can add and remove excluded columns:
```
$ heroku data:connectors:update gentle-connector-1234 \
  --exclude-column public.parcels.address \
  --remove-excluded-column public.posts.keys
```
Destroying a Connector
You can destroy a connector via the CLI.
This command does not destroy the Kafka topics used to produce events. You must manage their lifecycle independently.
$ heroku data:connectors:destroy gentle-connector-1234
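Because destroying the connector leaves its topics in place, you can list and remove them yourself. A sketch using the Apache Kafka on Heroku CLI (the topic name follows the connector naming shown earlier):

```shell
# List topics remaining on the cluster
$ heroku kafka:topics --app my-cdc-app

# Remove a leftover connector topic
$ heroku kafka:topics:destroy gentle_connector_1234.public.posts --app my-cdc-app
```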