This add-on is operated by Xplenty
Data Integration Service. ETL and ELT. Simplified.
Last updated 09 October 2019
Xplenty is a data integration service that harnesses the power of Hadoop to provide an easy-to-use, scalable ETL service that makes it a snap to prepare your structured and semi-structured data for analytics.
With Xplenty, you can process data generated from Heroku apps as well as any other data store. Heroku runtime metrics can also be processed and analysed on Xplenty when it is used in conjunction with log service add-ons. Xplenty can join data from multiple sources, including Heroku Postgres, MySQL and MongoDB, and provides a wide range of out-of-the-box data transformation tasks, from simple sorting and aggregation to sophisticated functions and data manipulations.
When finished, place your transformed data in analytics stores such as AWS Redshift, SAP HANA, or back in your favorite relational database, cloud storage or NoSQL data store.
Provisioning the add-on
Xplenty can be attached to a Heroku application via the CLI:
$ heroku addons:create xplenty
-----> Adding xplenty to sharp-mountain-4005... done, v18 (free)
A list of all plans available can be found here.
Once the Xplenty add-on is added, the XPLENTY_API_KEY and XPLENTY_ACCOUNT_ID config variables become available in the app configuration.
Use the Heroku config command to view your app’s config variables.
$ heroku config | grep XPLENTY
XPLENTY_API_KEY => 5Yyps5gGteftuTRCjF1u
XPLENTY_ACCOUNT_ID => myaccount
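Inside a running app, these config variables are exposed as ordinary environment variables. The following is a minimal sketch of reading them in Python. The helper name `load_xplenty_config` is hypothetical and not part of any Xplenty library; only the two variable names come from the output above.

```python
import os


def load_xplenty_config(env=None):
    """Return (account_id, api_key) read from the environment.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    This helper is illustrative only, not an official Xplenty API.
    """
    env = os.environ if env is None else env
    required = ("XPLENTY_ACCOUNT_ID", "XPLENTY_API_KEY")
    # Collect every missing or empty variable so the error is complete.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing config vars: " + ", ".join(missing))
    return env["XPLENTY_ACCOUNT_ID"], env["XPLENTY_API_KEY"]
```

Failing fast with a clear message when a variable is absent tends to be easier to debug than a later authentication error.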
Xplenty is accessible via a REST API and has easy integration options for many languages and frameworks. See the Xplenty API specification for further documentation.
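As a rough illustration of what a REST call might look like, the sketch below builds (but does not send) an authenticated request using only the Python standard library. The base URL, the `/api/` path layout, the `Accept` media type, and the use of the API key as the HTTP Basic username are all assumptions here and should be checked against the Xplenty API specification. The function name `build_request` and the `clusters` resource are hypothetical.

```python
import base64
import urllib.request

# ASSUMPTIONS: the base URL and versioned media type below are not stated
# in this document; confirm them in the Xplenty API specification.
API_BASE = "https://api.xplenty.com"
ACCEPT = "application/vnd.xplenty+json; version=2"


def build_request(account_id, api_key, resource):
    """Build a GET request for a resource under the given account.

    Assumes HTTP Basic auth with the API key as the username and an
    empty password. The request is returned unsent.
    """
    url = f"{API_BASE}/{account_id}/api/{resource}"
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", ACCEPT)
    return request
```

To actually perform the call, the returned object could be passed to `urllib.request.urlopen`; that step is left out here so the sketch has no network side effects.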
Xplenty doesn’t need to integrate at a code level and thus isn’t tied to a particular language. It can be used with any app hosted on Heroku, be it Ruby, Node.js, Clojure, Java, Python, or Scala, as long as it makes use of a PostgreSQL or MySQL database or a NoSQL data store. To make it a piece of cake to start processing your own data, we provide an easy way to import your Heroku application’s data store connections so you won’t need to define them manually.
You can directly import Heroku connections without having to create and define the connections in Xplenty. Just invoke Xplenty’s Import from Heroku option and select the application connections from a list of Heroku connections.
If you have not previously authorized Xplenty to access your Heroku applications, you will be prompted to authorize the access. (Note that you can revoke that authorization from the Heroku application at any time.)
Once you’ve completed the previous section you’re all set to use the Xplenty interface.
The interface can be accessed via the CLI.
$ heroku addons:open xplenty
or by visiting the Heroku Dashboard and selecting the application in question. Select Xplenty from the Add-ons menu.
Understanding Xplenty terminology
Xplenty is all about making data processing easy and scalable. Before beginning to use Xplenty, here’s a brief overview of the basic terms used on the platform.
An Xplenty cluster is a Hadoop cluster - a group of machines (nodes) that is allocated exclusively to your account’s users. You can create one or more clusters, and you can run one or more jobs on each cluster. A cluster that you’ve created remains allocated to your account until you request to terminate the cluster.
Packages define the data flow using components that specify the data to process, the data manipulation to perform, and the output destinations. Once you define a package, you can verify it, and, as in any development lifecycle, fix any errors and re-verify until the package is ready to run as a job on a cluster.
An Xplenty job is a process that is responsible for running a specific package on a Hadoop cluster. The job is a batch process that processes a finite amount of data and then terminates. Several jobs can run the same package simultaneously. When you run a new job, you select the name of the package whose workflow the job should perform, and the cluster on which to run.
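The relationships between these three terms can be sketched as a small conceptual model. This is purely illustrative: the class and field names are hypothetical and do not reflect Xplenty's actual API or internal implementation. It only captures the rules stated above: a cluster can run several jobs, several jobs can run the same package, and a cluster stays allocated until it is terminated.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """A data flow definition (sources, transformations, destinations)."""
    name: str


@dataclass
class Job:
    """A batch run of one package; it processes finite data, then ends."""
    package: Package
    status: str = "pending"


@dataclass
class Cluster:
    """A group of nodes allocated to one account until terminated."""
    name: str
    nodes: int
    jobs: list = field(default_factory=list)
    terminated: bool = False

    def run(self, package):
        # A terminated cluster can no longer accept jobs.
        if self.terminated:
            raise RuntimeError("cluster has been terminated")
        job = Job(package)
        self.jobs.append(job)
        return job

    def terminate(self):
        self.terminated = True
```

For example, calling `run` twice with the same `Package` on one `Cluster` yields two independent `Job` objects that share the package.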
Creating and executing a data flow with Xplenty
To start processing data with Xplenty, go to the connections section of the management portal, located under your account settings, and define connections that contain the access information required to connect to your various data stores. The access information is stored securely and can only be used by your account’s members.
You can define cloud storage connections to connect to your AWS S3 storage, Rackspace CloudFiles or IBM SoftLayer Object Storage, or you can define connections to your cloud databases: Heroku Postgres, MongoDB, MySQL, AWS Redshift and many more.
Once connections are defined, head to the packages section, and create a package. Here you’ll define your data flow, where you can use a large set of out-of-the-box transformations to apply your business logic to your data.
To execute the package, go to the clusters section, and create a new cluster which will serve as the processing engine for your jobs. Once a cluster is created, you’ll be able to execute a package on it.
Migrating between plans
Use the heroku addons:upgrade command to migrate to a new plan.
$ heroku addons:upgrade xplenty:newplan
-----> Upgrading xplenty:free to sharp-mountain-4005... done
Your plan has been updated to: xplenty:basic
Removing the add-on
Xplenty can be removed via the CLI.
This will destroy all associated data and cannot be undone!
$ heroku addons:destroy xplenty
-----> Removing xplenty from sharp-mountain-4005... done, v20 (free)