Writing Data to Salesforce with Heroku Connect
Last updated 10 May 2018
Connect will capture changes made in Heroku Postgres and send them to Salesforce using either the SOAP or Bulk API. For every table that is marked read-write, Connect adds a trigger which captures the update into an ordered table, the trigger log. This table is shared between all read-write data tables for a connection and Connect will process all the captured changes in a single process per connection. This process runs independently and concurrently with reads from Salesforce for mapped objects.
Every time a writer other than Connect performs an INSERT, all the columns are captured. After a successful INSERT into Salesforce, Connect writes the new Salesforce ID (sfid) back into the record in the database. Any time a writer other than Connect performs an UPDATE, the trigger compares the values in the row before and after the change and records only the changed values. A DELETE operation records just the delete. Connect is not able to observe changes within transaction boundaries and thus cannot enforce a strict ordering of writes as they occurred in the database.
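This write-back behavior can be observed directly from the database. The sketch below assumes a read-write contact mapping in the default salesforce schema:

```sql
-- Insert a row locally; the Connect trigger captures all columns
INSERT INTO salesforce.contact (firstname, lastname) VALUES ('Ada', 'Lovelace');

-- Immediately after the INSERT, sfid is NULL; once Connect has pushed the
-- record to Salesforce, it writes the assigned ID back into the same row
SELECT id, sfid FROM salesforce.contact WHERE lastname = 'Lovelace';
```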
How Connect Writes to Salesforce
Connect currently supports two algorithms for writing to Salesforce: Merged Writes and Ordered Writes (the default). The Merged Writes algorithm reorders changes from the trigger log in an attempt to maximize the number of records in each SOAP message. The Ordered Writes algorithm, by contrast, always preserves the order of the changes from the trigger log, even if this results in less efficient packing of records in a given message. The Ordered Writes algorithm has the advantage of treating each captured operation as a direct instruction of a change to send. Additionally, the Ordered Writes algorithm is a prerequisite for bulk writes to Salesforce.
See Potential Pitfalls for the Merged Writes algorithm and Potential Pitfalls for the Ordered Writes algorithm for more information on the differences between these algorithms.
Polling for Changes
Heroku Connect will poll your database for updates every two minutes, unless it detects an update from a pg_notify trigger. These accelerated polls are limited to at most one every 10 seconds. When Connect polls the database and finds changes, it begins to write those changes back to Salesforce. These intervals are not configurable. Once a set of writes completes, a new poll cycle will start and wait either two minutes or for a new change event from pg_notify.
Heroku Connect will not poll for additional database updates while a write operation to Salesforce is underway. Additionally, write operations are executed serially across all mappings, in order. There is no mechanism to prioritize one set of writes over another.
SOAP vs Bulk API
When bi-directional sync is configured, Connect will write changes back to Salesforce using either the SOAP or Bulk API. The Bulk API is optimized for operations that apply to larger datasets and is therefore significantly faster than using SOAP in such cases. Connect will automatically attempt to use the Bulk API when all of the following conditions are met:
- The connection is configured to use the Ordered Writes algorithm
- The API version is set to v39 or higher
- Two thousand or more contiguous changes of the same type are made to a single object (e.g., 5,000 INSERTs into the Lead object)
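As an illustration of the third condition, a single statement that inserts a large contiguous batch into one mapped table can qualify for the Bulk API. This is a sketch; the lead mapping and its external_id__c column are assumptions, and generate_uuid_v4() is the helper used in the examples in this article:

```sql
-- 5,000 contiguous INSERTs of the same type into one object:
-- eligible for the Bulk API when the other two conditions are met
INSERT INTO salesforce.lead (lastname, company, external_id__c)
SELECT 'Lead ' || n, 'Acme, Inc.', generate_uuid_v4()
FROM generate_series(1, 5000) AS n;
```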
Ensuring Uniqueness
If you plan to write objects to your database, we strongly recommend adding a unique identifier field to your object and configuring it as the unique identifier in the mapping editor. Because separate threads are used for reading and writing, it is possible for a reader to process a write operation from Salesforce to Postgres before the write process from Postgres to Salesforce has finished writing the Salesforce ID for those same records. When this setting is configured, Heroku Connect matches records coming from Salesforce first by Salesforce ID (if available), and then by the specified unique External ID. This ensures that no matter which process writes the record first, there are no integrity errors and the record is always matched. Additionally, you can reference the newly inserted row from a foreign table by this external ID.
You should populate this field with a unique value, such as a GUID, when you first INSERT the record. This can be used to establish relationships between objects, and also helps guard against creating duplicate records. It's important not to change the value of that ID after writing it to a record.
The unique identifier should be treated as an alias for a Salesforce ID. Therefore, it should not change over time, should never be reused, and you should choose a similarly unique mechanism for generating it.
We recommend using random UUIDs, which can be generated in Postgres (the examples in this article use a generate_uuid_v4() helper function for this purpose). Postgres integer sequences are not recommended, as they are not guaranteed to be unique. Further, the id field in the Heroku Connect database should not be used, since these numbers can reset when importing your configuration into a new database or when reloading a table.
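One way to guarantee the field is always populated is a column default. This sketch uses the pgcrypto extension's gen_random_uuid(); the contact table and its external_id__c column are assumptions taken from the examples in this article:

```sql
-- pgcrypto ships with Postgres and provides gen_random_uuid()
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Populate the External ID automatically on INSERT when none is supplied
ALTER TABLE salesforce.contact
  ALTER COLUMN external_id__c SET DEFAULT gen_random_uuid()::text;
```

A BEFORE trigger, as shown in Using Custom Triggers with Heroku Connect below, is another way to achieve the same goal.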
Potential Pitfalls for the Merged Writes Algorithm
The Merged Writes algorithm attempts to reorder updates so that they resolve dependencies for relationships, but it may not be able to pick a correct order in all cases. In particular, circular dependencies are known to cause issues. Additionally, merged changes can make it impossible to reliably establish some relationship types. Apex triggers and process rules in Salesforce are not guaranteed to see every change made in the database, as some sets of changes will be merged by this algorithm.
The Merged Writes algorithm will attempt to re-send failed inserts on a subsequent update, simplifying some error recovery cases.
Potential Pitfalls for the Ordered Writes Algorithm
Rapid updates to the same record may result in slower sync speeds using the Ordered Writes algorithm as each individual change captured will be sent to Salesforce independently. In many cases this is balanced by lessened per-change processing overhead.
While Heroku Connect attempts to pack each message sent to Salesforce as densely as possible, it must follow the writing rules of the Salesforce SOAP API, which limit the number of chunks of changes that can be packed into one message. Each chunk is a contiguous set of changes for a single Salesforce object; a transition to a different object starts a new chunk. Writing to many objects in quick succession, or alternating back and forth between objects, can significantly reduce the number of changes that fit in a single message. Many simultaneous database writers can make this worse.
A failed INSERT will cause all subsequent updates to fail until corrective action is taken to resubmit that record. See Resolving Write Errors.
Simple Relationships using Salesforce IDs
Generally speaking, a simple relationship can be created in two steps: first insert the parent record, then insert the child record with a reference to the parent. Salesforce native relationship fields use the 18 character unique record ID of the foreign record. A relationship can always be created from your Postgres database when you have the IDs available. The 18 character ID is available in the sfid field after a record has been inserted. You can establish a relationship between existing records using an INSERT or UPDATE statement whenever the sfid for the parent record is not null.
As an example, imagine you're working with the standard Salesforce objects Account and Contact. Contact has a predefined reference to Account through the AccountId field. You insert a record into Account as follows:
INSERT INTO salesforce.account (name, description) VALUES ('My new account', 'This is a very important account.');
Assume that the Account record (with id 1234) has propagated to Salesforce and that Connect has written the record's assigned Salesforce ID back into the sfid field. The relationship can then be established directly using the 18 character sfid value:
INSERT INTO salesforce.contact (firstname, lastname, accountid) SELECT 'John', 'Smith', sfid FROM salesforce.account WHERE id = 1234;
While this method works, it's inconvenient in the many cases where you would like to insert both the Account and the Contact at the same time. The next section discusses how to do this.
Simple Relationships Between Two Objects and Relationship External IDs
While using Salesforce IDs works in practice, it makes things overly complicated when using Heroku Connect or the Salesforce API. The ideal approach is to set up both insertions without having to wait for the Salesforce ID from the first insertion before making the second. Salesforce solves this problem by allowing the child record insertion to refer to the parent record via an External ID that is inserted with the parent record. Heroku Connect allows you to map an extra relationship field on the child record, named as a combination of the Salesforce ID reference/lookup field and the foreign External ID you will be using. If you have multiple External IDs on the parent record, you will want to use the one that you have configured as the unique identifier (see Ensuring Uniqueness above).
Building from the example in the last section, let's now assume that you've configured an External ID field called External_ID__c on the Account object. As before, Contact has a standard reference to Account through the AccountId field. On the Contact mapping create/edit page you will be offered the option to add the field Account__External_ID__c, which is the relationship field you can use to populate this reference in two steps:
INSERT INTO salesforce.account (name, description, external_id__c) VALUES ('My new account', 'This is a very important account.', generate_uuid_v4());
Assuming the primary key id was captured from the previous insert, the Contact record can be inserted as below:
INSERT INTO salesforce.contact (firstname, lastname, account__external_id__c) SELECT 'John', 'Smith', external_id__c FROM salesforce.account where id=1234;
This relationship will now be accurately populated in Salesforce once Heroku Connect processes the generated trigger log entries.
Circular References Between Two Objects
This is not possible to accomplish reliably using the Merged Writes algorithm and relationship external ID fields. It is possible to use 18 character Salesforce IDs directly as described in Simple Relationships using Salesforce IDs above.
Let's take our example another step further and define a reference from Account to the "Primary Contact" for the account. You would add a field in Salesforce that points to that particular Contact; let's call this field Primary_Contact__c. You would also add another External ID field to Contact, called Ext_ID__c. The Account mapping edit page would then allow you to map the relationship field Primary_Contact__r__Ext_ID__c, which can be used to set up the cyclical relationship in an additional step. Assume that in the previous section a value from generate_uuid_v4() was also set into the ext_id__c field in Contact during that INSERT statement, and that a value of 4321 was captured for id. The following SQL snippet would set up the cyclical relationship:
UPDATE salesforce.account set primary_contact__r__ext_id__c = c.ext_id__c FROM salesforce.contact AS c where c.id = 4321 and salesforce.account.id = 1234;
Circular References Within a Single Object
There is one more type of reference that can be established: a special case of circular reference where the reference points to another record in the same object. In this case there is an extra limitation: the self-reference must be made in a subsequent update rather than in the original insert, because Salesforce cannot insert a child record referencing a parent from the same object in one SOAP CREATE message. Account can be used for this example again, because Account has a field called ParentId which refers back to Account. Editing the Account mapping again shows that Heroku Connect offers to let you map the Parent__External_ID__c field, which can then be used to establish the relationship between a child Account record and its parent. Two Account records can be created using inserts as follows:
INSERT INTO salesforce.account (name, description, external_id__c) VALUES ('My new account', 'A very important account', generate_uuid_v4());
INSERT INTO salesforce.account (name, description, external_id__c) VALUES ('Child of My new account', 'Sub account of 1234', generate_uuid_v4());
Since the id values from the first and second INSERT statements were captured, the relationship can now be established as shown below:
UPDATE salesforce.account AS c set parent__external_id__c = p.external_id__c FROM salesforce.account AS p WHERE p.id = 1234 AND c.id = 1235;
Note that when working with a self-referential relationship, the relationship must be established using a subsequent update. Doing this will ensure the child and parent cannot be in the same insert batch, thereby satisfying the requirements of the Salesforce API.
Polymorphic Relationships
A polymorphic relationship is a relationship where the referenced object can be one of several different object types. For example, an Event object can be related to an Account, a Campaign, or an Opportunity through the What polymorphic relationship. Two pieces of data are required for Salesforce to identify the related object in a polymorphic relationship: the object ID and the object type. To create a polymorphic relationship with Heroku Connect, you need to use the 18 character Salesforce IDs as described in Simple Relationships using Salesforce IDs. Heroku Connect does not support mapping external IDs for polymorphic relationships, as the functionality to specify both the object ID and the object type for these relationships is not currently available.
External ID relationships for What and Who polymorphic relationship fields will not be available when creating and editing your mappings.
Understanding the Trigger Log
When changes are made to records, these changes are captured in the Trigger Log. Changes captured while a mapping is in read-write mode will be scheduled to be sent to Salesforce. Changes captured while a mapping is in read-only mode will be recorded but not acted upon. Note that changing a mapping from read-write to read-only does not prevent already captured changes from being sent to Salesforce.
The Trigger Log consists of two tables in the schema configured for your connection. New changes go directly into the _trigger_log table, and records that have been processed are moved to the _trigger_log_archive table. The length of time that changes stay in the _trigger_log_archive table depends on the plan type of the connection: Demo plan connections keep changes in the archive for 7 days, while paid plan connections keep changes in the archive for 30 days. Changes older than 30 days are automatically purged by Heroku Connect. The _trigger_log and _trigger_log_archive tables have the same structure.
| Column | Type | Description |
| --- | --- | --- |
| id | integer | Table primary key. Also used to determine the order of changes. |
| created_at | timestamp | Time the change was captured. |
| updated_at | timestamp | Changed when the entry is modified by Heroku Connect. |
| processed_at | timestamp | Time when Heroku Connect finished working on the entry. |
| table_name | varchar(128) | Table the change was captured from. |
| sfid | varchar(18) | Salesforce ID of the captured change (if applicable). |
| action | varchar(7) | The operation captured: INSERT, UPDATE or DELETE. |
| state | varchar(8) | The current state of the record (see States below). |
| values | text | hstore-encoded column values captured for the change. |
| sf_message | text | Any error message returned from Salesforce when an error occurs during writing (see Write Errors below). |
| old | text | The old values from before the change, encoded in hstore format. |
The trigger log tracks the state of each entry in the state field. Each entry may end in one of these states: SUCCESS, MERGED, IGNORED, FAILED or READONLY. Additionally, an entry may pass through the states NEW, PENDING and IGNORE.
- SUCCESS means the row was successfully written to Salesforce.
- MERGED means that several changes were made to a single row within a single polling interval. When this happens, all of the changes will be merged into a single API request to Salesforce. This only applies to the Merged Writes algorithm.
- IGNORED means that Connect did not attempt to write the record to Salesforce. This can happen if you update a record but don’t actually change any data that needs to get synced back to Salesforce.
- FAILED means that Connect tried to write the record, but failed. The sf_message column should contain an explanation of why it failed.
- READONLY indicates a captured change from a read-only table.
- NEW indicates a newly captured change that is ready for processing.
- IGNORE indicates a newly captured change that contains no data that needs to be synced back to Salesforce.
- PENDING indicates a change that Connect is currently processing.
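To get a quick overview of how changes are moving through these states, you can aggregate the trigger log directly (a sketch assuming the default salesforce schema):

```sql
-- Count trigger log entries by state to spot failures or backlogs
SELECT state, count(*) AS entries
FROM salesforce._trigger_log
GROUP BY state
ORDER BY entries DESC;
```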
Write Errors
Writes to Salesforce can fail for any number of reasons, including planned maintenance windows. In cases where Salesforce is unavailable, Connect will queue writes until Salesforce is available again. In all other cases, user intervention is required to resolve errors. You can find details about write errors by looking at the record in the Explorer, or by running the following query against the _trigger_log table in your database:
SELECT table_name, record_id, action, sf_message FROM salesforce._trigger_log WHERE state = 'FAILED';
One common error indicates that the integration user doesn't have write permission to the object or field in Salesforce. Another common error looks like this:
duplicate value found: Contact_External_ID__c duplicates value on record with id: a006…
This error indicates that Connect tried to push a record to Salesforce, but Salesforce wasn't able to save it. In this case, Connect was trying to insert an object with a Contact_External_ID__c value that already exists and is supposed to be unique.
In some cases, this can be caused by a race condition in Heroku Connect itself. If you are inserting records into the database, it’s a good idea to add and populate an External ID field for each record.
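If you suspect this race has produced colliding values, a query like the following can surface them. This is a sketch assuming a contact mapping with an external_id__c column:

```sql
-- Find External ID values that appear on more than one local row
SELECT external_id__c, count(*) AS copies
FROM salesforce.contact
GROUP BY external_id__c
HAVING count(*) > 1;
```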
entity is deleted
This means that Connect attempted to sync a record to Salesforce only to be told by Salesforce that the record has been deleted on the Salesforce side.
owner is inactive, cannot reparent record
This happens when Connect attempts to sync a record whose relationship field points to a parent object that is no longer active.
The Salesforce API documentation includes a full list of error codes and their explanations.
Resolving Write Errors
Once you have investigated the cause of the error and are ready to have Connect reprocess the update, you will need to inform Connect. How you do so depends on which write algorithm you are using.
Merged Writes Algorithm
You can signal Connect to reprocess updates by resetting the state field to NEW. Alternatively, if an INSERT failed, you can simply UPDATE a column in the record that failed to write; a new trigger log entry will be created and sent to Salesforce as an INSERT.
For example, to retry all failed entries for a mapping named Contact, use the following query:
UPDATE salesforce._trigger_log SET state = 'NEW' WHERE state = 'FAILED' AND table_name = 'contact';
The above example uses the default Postgres schema name of salesforce; you will need to change this if you are using a different schema. The table_name field in the _trigger_log table does not include the schema name.
Trigger log entries older than 24 hours are moved to the _trigger_log_archive table. If you need to resubmit entries that have been moved to the archive, you will need to copy them back to the _trigger_log table:
-- Set the records you want to retry to a state of 'REQUEUE'
UPDATE salesforce._trigger_log_archive SET state = 'REQUEUE' WHERE state = 'NEW';
-- or use some other criteria, like:
--   state = 'FAILED' AND table_name = 'account' AND sf_message LIKE '%my error message%'

-- Copy them to the trigger log table
INSERT INTO salesforce._trigger_log
  (created_at, updated_at, state, txid, table_name, action, record_id, "values", old, sfid)
SELECT now(), now(), 'NEW', txid, table_name, action, record_id, "values", old, sfid
FROM salesforce._trigger_log_archive
WHERE state = 'REQUEUE';

-- Update the records still in the archive
UPDATE salesforce._trigger_log_archive SET state = 'REQUEUED' WHERE state = 'REQUEUE';
It may take up to 20 minutes before Connect finds changes that have been reset to the NEW state.
Ordered Writes Algorithm
Sometimes INSERTs or UPDATEs that are valid in PostgreSQL are not valid when sent to Salesforce: Salesforce can enforce additional rules around referential integrity or data format, and some changes might fail because of trigger-based validations or locking issues. Whatever the reason, you will want to review the status of the changes being sent to Salesforce regularly. You can do this by inspecting the trigger log table:
SELECT created_at, table_name, db_record_id, sfid, sf_message FROM salesforce._trigger_log WHERE state='FAILED';
If you find errors, you have several options for correcting the issue. Simple situations can be handled with an UPDATE that corrects the offending value. More complicated cases may require the use of one of the repair functions available in your database.
If the failure was for an update, you may just need to correct a value:
UPDATE salesforce.account set name='Smaller Name' where id = 1234;
If the failure was for an insert, you’ll need to use the insert repair function:
hc_capture_insert_from_row( source_row hstore, table_name varchar, excluded_cols text[] = ARRAY[]::text[] )
The excluded_cols parameter is optional. If it is not set, the function will not exclude any fields and will therefore capture all fields.
If the failure was for an update, you’ll need to use the update repair function:
hc_capture_update_from_row( source_row hstore, table_name varchar, columns_to_include text[] = ARRAY[]::text[] )
The columns_to_include parameter is optional. If it is not set, the function will include all fields.
In the simple case you can just resubmit the record:
SELECT salesforce.hc_capture_insert_from_row(hstore(salesforce.account.*), 'account') FROM salesforce.account WHERE id = 1234;
In a case where some part of the record cannot be included when the record is created (for instance, in one of the relationship cases above), you can omit part of the record using the excluded_cols option:
SELECT salesforce.hc_capture_insert_from_row( hstore(salesforce.account.*), 'account', ARRAY['parent__external_id__c']) FROM salesforce.account where id = 1234;
You may then need to re-queue an update change:
SELECT salesforce.hc_capture_update_from_row( hstore(salesforce.account.*), 'account', ARRAY['parent__external_id__c']) FROM salesforce.account where id = 1234;
Using Custom Triggers with Heroku Connect
Heroku Connect uses Postgres triggers to observe changes in your Connect tables and write your changes to Salesforce. If you implement your own custom triggers on Connect tables, Connect may miss your updates, or you may create additional load.
In its own triggers, Connect makes use of a Postgres variable called xmlbinary to determine whether or not to send updates to Salesforce. This is how Connect prevents its own updates from being sent back to Salesforce and creating an infinite loop. It is possible to set this value in your custom trigger to temporarily enable or disable sending updates to Salesforce. Please proceed at your own risk, and know that Support may not be able to provide help with custom triggers.
A value of base64 means that row updates will be detected by the Connect triggers and sent to Salesforce. A value of hex means that updates to database rows will be ignored by the Connect triggers.
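You can inspect the current value from a psql session before experimenting with custom triggers. get_xmlbinary() is the helper used by the trigger examples below:

```sql
-- The session default for xmlbinary in Postgres is 'base64',
-- so ordinary application writes are captured by Connect's triggers
SHOW xmlbinary;

-- Connect's trigger examples read the value via this helper
SELECT get_xmlbinary();
```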
Making Changes in Response to Data Coming from Salesforce or to Foreign Tables
When reacting to changes coming from Salesforce as written by Heroku Connect, you will want to use an AFTER trigger and temporarily set xmlbinary to base64, restoring the previous value after you have made your changes. This also works well if you are altering data in another table that requires capturing.
Here's an example that adds a value to an externalid__c field in a Contact mapping whenever a new record is inserted into the contact table from Salesforce. This trigger ensures that when a record is created in Salesforce and written to your database by Connect, a unique value is written to the row as part of the sync operation; the externalid__c update will be detected, and the value will sync back to Salesforce:
CREATE OR REPLACE FUNCTION salesforce.contact_external_id_proc() RETURNS TRIGGER AS $$
DECLARE
  oldxmlbinary varchar;
BEGIN
  -- Save old value
  oldxmlbinary := get_xmlbinary();
  -- Change value to base64 to ensure writing to _trigger_log is enabled
  SET LOCAL xmlbinary TO 'base64';

  -- This is where you start your custom trigger code.
  -- Update the external ID
  UPDATE salesforce.contact SET externalid__c = generate_uuid_v4() WHERE id = NEW.id;
  -- This is where you end your custom trigger code.

  -- Reset the value
  EXECUTE 'SET LOCAL xmlbinary TO ' || oldxmlbinary;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS contact_after_trigger ON salesforce.contact;

CREATE TRIGGER contact_after_trigger
AFTER INSERT OR UPDATE ON salesforce.contact
FOR EACH ROW
WHEN (get_xmlbinary()::text = 'hex'::text AND NEW.externalid__c IS NULL)
EXECUTE PROCEDURE salesforce.contact_external_id_proc();
Connect often INSERTs, UPDATEs or DELETEs thousands of records at a time. You should ensure that any changes you make in a trigger responding to changes from Salesforce perform as you expect with this volume of records, and avoid writing many changes in response to one record change. Additionally, you should avoid doing complex calculations in triggers, as these can have a large adverse impact on the speed of writing data from Salesforce.
Altering the Result of Your own Data Change Statements
A special case of making triggers work with Connect is when you would like to alter the data captured by Connect in response to your own SQL statements. For instance, you may want to ensure that a UUID External ID is inserted every time you write a new record to your database. In this case you would want to use a BEFORE trigger, so that the change is picked up by Connect with the original INSERT statement. Continuing the Contact example from the last section, you would have a trigger that looks like:
CREATE OR REPLACE FUNCTION salesforce.contact_external_id_before_proc() RETURNS TRIGGER AS $$
BEGIN
  -- This is where you start your custom trigger code.
  -- Update the external ID
  NEW.externalid__c := generate_uuid_v4();
  -- This is where you end your custom trigger code.
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS contact_before_trigger ON salesforce.contact;

CREATE TRIGGER contact_before_trigger
BEFORE INSERT OR UPDATE ON salesforce.contact
FOR EACH ROW
WHEN (get_xmlbinary()::text = 'base64'::text AND NEW.externalid__c IS NULL)
EXECUTE PROCEDURE salesforce.contact_external_id_before_proc();
Note that in this case the WHEN condition guarantees that xmlbinary is already set to base64, so it is not necessary to set or unset its value to ensure that Connect captures the change with the rest of the record.