Best Practices for Heroku's Streaming Data Connectors

Last updated March 11, 2021

Table of Contents

  • Useful Posts
  • Points to Note

This article collects best-practice notes for operating your streaming data connector. The integration is built on Apache Kafka Connect and Debezium, so you will see references to those components when we discuss configuration options.

Useful Posts

When Things Go Wrong (Debezium)

Points to Note

Create

  • Debezium only works with UTF-8 databases. If your database uses a different encoding, you must convert it to UTF-8 before enabling the connector; a sketch for checking the encoding follows this list. If you change to another encoding after the connector is created, the connector will stop processing events.
  • When a new connector is created, it connects to the Postgres database and reads a consistent snapshot of all of the configured schemas. This can be an expensive operation that puts load on both the source Postgres database and the target Kafka cluster, so choose your tables and columns accordingly. More details are available in the documentation.
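
As a quick preflight check before creating a connector, you can confirm the encoding from a script. The following is a minimal sketch using psycopg2 (an assumption, not part of the connector tooling), with DATABASE_URL assumed to point at the source Heroku Postgres database:

```python
import os

import psycopg2

# Connect to the source database. DATABASE_URL is assumed to be set,
# e.g. copied from `heroku config:get DATABASE_URL --app <app_name>`.
conn = psycopg2.connect(os.environ["DATABASE_URL"])
with conn, conn.cursor() as cur:
    # Debezium requires the source database to use UTF-8.
    cur.execute("SHOW server_encoding;")
    encoding = cur.fetchone()[0]
    if encoding.upper() != "UTF8":
        raise SystemExit(
            f"Encoding is {encoding}; convert to UTF8 before creating the connector."
        )
    print("Encoding OK:", encoding)
```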

Usage

  • The “before” data in CDC events is only populated for the columns covered by the table’s REPLICA IDENTITY. By default, in most situations, this means that only the primary key is reflected in the “before” data; a sketch for widening it follows this list. More details are available in the documentation.
  • Changes to primary keys must be coordinated carefully to avoid issues with schema information. More details are available in the documentation.
  • CDC events are delivered at least once. Build and configure your Kafka consumers to handle redundant CDC events gracefully; see the consumer sketch after this list.
  • On UPDATE events, unchanged TOASTed values contain the placeholder __debezium_unavailable_value. If you do not account for this, you can end up treating the placeholder as if it were the real value. Find more details in the documentation.
  • Database rows with very large amounts of data cannot be produced into Kafka messages if they exceed the maximum message size for the topic (default: 1 MB). Such changes are logged internally, but the event is silently dropped rather than produced.
  • Kafka topics are created with more than one partition, so change events have no global total ordering when consumed. Change events for a specific row, however, are totally ordered.
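
If your consumers need complete “before” images, you can widen a table’s REPLICA IDENTITY. The sketch below uses psycopg2 against a hypothetical public.users table; note that REPLICA IDENTITY FULL makes Postgres write the entire old row to the WAL, which increases WAL volume, so apply it selectively:

```python
import os

import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])
with conn, conn.cursor() as cur:
    # Inspect the current setting: 'd' = default (primary key only),
    # 'f' = full row, 'n' = nothing, 'i' = a specific index.
    cur.execute("SELECT relreplident FROM pg_class WHERE relname = %s;", ("users",))
    print("current replica identity:", cur.fetchone()[0])

    # Record the whole old row in the WAL so the "before" field of
    # UPDATE and DELETE events is fully populated.
    cur.execute("ALTER TABLE public.users REPLICA IDENTITY FULL;")
```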
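
Because delivery is at least once and unchanged TOASTed columns arrive as placeholders, consumers should defend against both. Here is a hedged sketch using the kafka-python package; the topic name, broker address, and column handling are hypothetical, and a production consumer would track processed events in a durable store rather than an in-memory set:

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

TOAST_PLACEHOLDER = "__debezium_unavailable_value"
seen = set()  # replace with a durable store in production

consumer = KafkaConsumer(
    "connector-xxxxx.public.users",        # hypothetical CDC topic name
    bootstrap_servers=["localhost:9092"],  # substitute your Kafka brokers and SSL config
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    # At-least-once delivery: the same offset can be re-read after a
    # consumer restart, so skip anything already processed.
    event_id = (message.topic, message.partition, message.offset)
    if event_id in seen:
        continue
    seen.add(event_id)

    event = message.value or {}  # tombstone messages have a null value
    after = (event.get("payload") or {}).get("after") or {}
    for column, value in after.items():
        # Unchanged TOASTed values are placeholders, not real data:
        # fall back to the previously stored value instead of using them.
        if value == TOAST_PLACEHOLDER:
            print(f"{column} unchanged (TOASTed); keep the existing value")
```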

Failure

  • If a connector stops replicating, the replication slot that tracks the connector’s progress will prevent WAL from being deleted. If the Postgres database runs out of disk for WAL, the database will stop entirely. Find more details in the documentation.
  • Failures can occur for many reasons, leading the connector to stop: a network partition, an AWS issue, a bug in Kafka Connect or Debezium, or an issue with our control plane, among others. It is important to monitor your database’s replication slots for these conditions; see the monitoring sketch after this list.
  • In certain failure modes, we may need to remove the replication slot to preserve the operational stability of the database. Once conditions have cleared, the connector will create a new replication slot and begin publishing events from that point of recovery. Change events that were not streamed to Kafka before the database went down will be lost.
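
One way to watch for these conditions is to check how much WAL each replication slot is retaining. Here is a sketch using psycopg2; the 1 GB threshold is an arbitrary example, not a platform limit:

```python
import os

import psycopg2

WARN_BYTES = 1024**3  # example threshold: warn when a slot retains > 1 GB of WAL

conn = psycopg2.connect(os.environ["DATABASE_URL"])
with conn, conn.cursor() as cur:
    # restart_lsn is the oldest WAL position the slot still needs, so the
    # difference from the current WAL position is the WAL being retained.
    cur.execute("""
        SELECT slot_name,
               active,
               pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
        FROM pg_replication_slots;
    """)
    for slot_name, active, retained in cur.fetchall():
        status = "active" if active else "INACTIVE"
        print(f"{slot_name}: {status}, retaining {retained} bytes of WAL")
        if retained is not None and retained > WARN_BYTES:
            print(f"  warning: {slot_name} may be blocking WAL removal")
```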

Paused Connectors

  • Connectors should not be left in a “paused” state for more than a few hours. Paused connectors will prevent WAL from being deleted, which can put the primary database at risk. It is better to destroy the connector than to leave it paused for a long period.
  • Change events that occur while a connector is paused are not guaranteed to make it into Kafka. In the event of a failover (due to a system failure or scheduled maintenance), change events that occurred after the connector was paused will be lost.

Destroy

  • Deleting a table will tombstone the corresponding Kafka topic, but not destroy it.
  • If you deprovision a connector, we will not deprovision the associated Kafka topics. If you want to remove this data, then you must delete these topics yourself using the CLI: heroku kafka:topics:destroy <topic_name> --app <app_name>. Find more details in the documentation.
