Treasure Data Hadoop

This add-on is operated by Treasure Data

Cloud Data Service


Last Updated: 03 December 2013

addons beta


The Treasure Data add-on lets Heroku users collect, store and analyze large amounts of data without sinking a lot of time into learning Hadoop, MPP databases and other powerful but time-consuming technologies. With Treasure Data, you can start analyzing your data today, not weeks or months later.

Treasure Data is proven at scale. We store more than 2 trillion data records for our customers and add hundreds of millions of records every hour. Our customers run thousands of queries against their data on our system every day.

The Treasure Data add-on offers instant setup of ‘Log Everything’ infrastructure for tracking and understanding user activity in your apps.

You can collect user activity from your applications instantly and analyze it with a SQL-like query language (Apache Hive) to better understand your users' behavior. Typical use cases include:

  • Building Reporting Features for Your Customers
  • Daily / Hourly Reports of Your Business Metrics
  • Ranking Calculation
  • Conversion Path Analytics

Provisioning the add-on

A list of all plans available can be found here.

Provisioning the add-on sets the TREASURE_DATA_API_KEY config var in your Heroku app.
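Heroku exposes config vars to your dynos as environment variables, so application code can read the key from ENV. A minimal sketch, assuming your app only needs the raw key string; the helper name treasure_data_api_key is hypothetical:

```ruby
# Heroku exposes config vars to the running dyno as environment variables.
# treasure_data_api_key is a hypothetical helper name used for illustration.
def treasure_data_api_key(env = ENV)
  env.fetch('TREASURE_DATA_API_KEY') do
    # Fail loudly instead of silently continuing with a nil key.
    raise KeyError, 'TREASURE_DATA_API_KEY is not set; is the add-on provisioned?'
  end
end
```

Failing fast with a KeyError makes a missing add-on obvious at boot rather than later, when the first request touches Treasure Data.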

Treasure Data can be attached to a Heroku application via the CLI:

$ heroku addons:add treasure-data
-----> Adding treasure-data to sharp-mountain-4005... done, v18 (free)

Running heroku addons:open treasure-data opens our web console.

Data Import: just write to STDOUT!

You can import data into Treasure Data simply by writing lines in a specific format to STDOUT. The format is:

@[database.table] JSON-In-ONE-LINE
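The main pitfall with this format is keeping the JSON on a single line. A minimal Ruby sketch of a formatting helper, assuming records are plain Hashes; td_line is a hypothetical name, not part of any Treasure Data gem:

```ruby
require 'json'

# Builds one "@[database.table] JSON" line for STDOUT import.
# td_line is a hypothetical helper used for illustration only.
def td_line(database, table, record)
  # JSON.generate produces compact JSON and escapes embedded newlines as \n,
  # so the record always stays on a single line.
  "@[#{database}.#{table}] #{JSON.generate(record)}"
end

puts td_line('production', 'login', 'uid' => 123)
# prints: @[production.login] {"uid":123}
```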

Here’s an example in Ruby. The logs are uploaded every 5 minutes.

require 'json'

puts "@[production.login] #{{'uid'=>123}.to_json}"
puts "@[production.follow] #{{'uid'=>123, 'from'=>'@TreasureData', 'to'=>'@Heroku'}.to_json}"
puts "@[production.pay] #{{'uid'=>123, 'item_name'=>'Stone of Jordan', 'category'=>'ring', 'price'=>100, 'count'=>1}.to_json}"

If your logs don't appear in the output of the heroku logs command, check that your code flushes STDOUT correctly.
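In Ruby, STDOUT can be buffered when it is not attached to a terminal (as on a dyno), so lines may sit in the buffer instead of reaching the log stream. A minimal sketch of the usual fix, assuming a plain Ruby process that writes events with puts:

```ruby
require 'json'

# Make every write to STDOUT flush immediately, so each event line
# reaches the log stream without waiting for the buffer to fill.
$stdout.sync = true

puts "@[production.login] #{JSON.generate('uid' => 123)}"

# Alternatively, keep buffering enabled and flush explicitly after writing:
# $stdout.flush
```

Setting sync once at boot (for example in an initializer) is simpler than remembering to flush after every write.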

CLI Setup

1) TD Toolbelt setup

First, download and install the Treasure Data Toolbelt for your development environment.

2) Heroku CLI setup

The heroku-td CLI plugin is also required; it bridges the heroku CLI and the td CLI. Once the plugin is installed, you can run the heroku td family of commands.

$ heroku plugins:install https://github.com/treasure-data/heroku-td.git
$ heroku td
usage: heroku td [options] COMMAND [args]

Analyze Your Dataset

Generate some activity on your app so that it writes events to STDOUT. The logs are uploaded every 5 minutes, so after several minutes heroku td tables shows your uploaded dataset.

$ heroku td tables
+------------+--------+------+-----------+
| Database   | Table  | Type | Count     |
+------------+--------+------+-----------+
| production | login  | log  | 31232     |
| production | follow | log  | 3132      |
| production | pay    | log  | 132       |
+------------+--------+------+-----------+
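If you want to script against this listing, you can turn it into Ruby hashes. A rough sketch, assuming the exact layout shown above (cells separated by '|', borders made of '+' and '-'), which may change between CLI versions; parse_td_table is a hypothetical helper:

```ruby
# Hypothetical helper: converts the ASCII table printed by `heroku td tables`
# into an array of hashes keyed by the header row. It relies on the layout
# shown above and is not an official interface.
def parse_td_table(text)
  # Keep only data and header rows; border lines start with '+'.
  rows = text.lines.map(&:strip).select { |line| line.start_with?('|') }
  # split('|') yields an empty string before the first '|', so drop it.
  header, *body = rows.map { |line| line.split('|').drop(1).map(&:strip) }
  body.map { |cells| header.zip(cells).to_h }
end
```

All cells come back as strings, so call to_i on the Count value before doing arithmetic on it.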

Now you can run queries in the cloud with the heroku td query command. The example query below counts logins per day.

$ heroku td query -w -d production \
  "SELECT \
     TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day, \
     COUNT(1) AS cnt \
   FROM login \
   GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') \
   ORDER BY cnt DESC"
+------------+------+
| day        | cnt  |
+------------+------+
| 2012-05-26 | 4981 |
| 2012-05-27 | 4481 |
| 2012-05-28 |  481 |
+------------+------+
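To make the query's semantics concrete, here is a rough in-memory Ruby equivalent. TD_TIME_FORMAT converts each row's time into a day string in the given time zone, and the query counts rows per day. This sketch assumes each record is a Hash whose 'time' value is a Unix timestamp in seconds, and it uses a fixed UTC-7 offset for PDT with no daylight-saving handling; logins_per_day and PDT_OFFSET are hypothetical names:

```ruby
# Fixed offset standing in for PDT; an assumption, with no DST transitions.
PDT_OFFSET = '-07:00'

# Groups records by their day in PDT and counts them per day,
# sorted by count, highest first, as in the sample output.
def logins_per_day(records)
  records
    .group_by { |r| Time.at(r['time']).getlocal(PDT_OFFSET).strftime('%Y-%m-%d') }
    .map { |day, rows| { 'day' => day, 'cnt' => rows.size } }
    .sort_by { |row| -row['cnt'] }
end
```

The time zone matters: an event at 03:00 UTC on 2012-05-26 is counted under 2012-05-25, because it falls at 20:00 on the previous day in PDT.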

Migrating between plans

Application owners should carefully manage the migration timing to ensure proper application function during the migration process.

Use the heroku addons:upgrade command to migrate to a new plan.

$ heroku addons:upgrade treasure-data:small
-----> Upgrading treasure-data:small to sharp-mountain-4005... done
       Your plan has been updated to: treasure-data:small

small is one of our plans. You can find out more about our pricing plans here.

Removing the add-on

Treasure Data can be removed via the CLI.

Note that your data stays on Treasure Data even after you remove the add-on. If you want to purge your data, here is how you do it.

$ heroku addons:remove treasure-data
-----> Removing treasure-data from sharp-mountain-4005... done, v20 (free)

Support

All Treasure Data support and runtime issues should be submitted via one of the Heroku Support channels. Non-support issues and product feedback are welcome at support@treasure-data.com.

Additional resources

Customer Success Stories