Autofiles by line^d
Simplify file storage and processing for logs, media, and more.
This add-on is operated by line_D LLC
Last updated September 26, 2024
The Autofiles by line^d add-on is currently in beta.
Table of Contents
- Provisioning the Add-on
- Quickstart with Python
- Set Up the S3 Client Dynamically (Recommended)
- Set Up a Local Environment (Static)
- Defining Workflows in JSON
- Runtime Environment Variables
- Managing Task Output and Status
- Attach Workflows to Your Media and Data Files
- Monitoring Using the Dashboard
- Dyno Startup Latency
- Removing the Add-on
- Support
Autofiles by line^d provides advanced Amazon S3 cloud storage that lets you run dynos anytime you save files. Automate file processing for logs, media, documents, and more, without managing complex background systems.
Define your dyno workflows using JSON instructions within or attached to S3 files - we handle the rest. No other dependencies are required. With many other features like parallelism, retries, and execution guarantees, you can quickly and reliably automate background jobs.
Here are some workflows you can automate:
- Image, video, and document processing: Automate file processing in the background by attaching JSON metadata to your S3 uploads.
- Log management and monitoring: Filter, rotate, and summarize your system’s log files to create easy, low-cost monitoring workflows.
- Data pipelines and ELT: Store data and run pipelines without leaving AWS S3.
Provisioning the Add-on
Reference the line^d Elements Page for a list of available plans and regions. Attach line^d to a Heroku application via the CLI:
$ heroku addons:create line-d
Creating line-d-clear-41822 on example-app... free
line-D is being provisioned and will be available shortly.
line-d-clear-41822 is being created in the background. The app will restart when complete...
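Provisioning adds the add-on's configuration to your app as config vars. The examples in this article use LD_S3_BUCKET, LD_AWS_ACCESS_KEY_ID, LD_AWS_SECRET_ACCESS_KEY, LD_AWS_SESSION_TOKEN, and LD_REGION; one way to confirm they're set (the exact list can vary by plan):
$ heroku config | grep LD_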
Quickstart with Python
This Python application creates an example workflow with two steps.
# starter_workflow.py
import json
import os
import boto3
from utils import setup_s3_client

STARTER_WORKFLOW = {
    'title': 'Example 2-step workflow',
    'run': [
        {'cmd': """bash -c 'echo "Do Step 1"'""", 'machine': 'basic', 'no_output': True},
        {'cmd': """bash -c 'echo "Do Step 2"'""", 'machine': 'basic', 'no_output': True},
    ],
}

if __name__ == '__main__':
    s3 = setup_s3_client()
    # Upload the file to S3 and the workflow will start automatically
    s3.put_object(
        Body=json.dumps(STARTER_WORKFLOW).encode('utf-8'),
        Bucket=os.environ.get('LD_S3_BUCKET'),
        Key='in/example',
        Metadata={'ld_run': 'true'},
    )
You can start the workflow by running the script:
$ python starter_workflow.py
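If you want to confirm that the upload (and therefore the workflow trigger) landed, a minimal check using the same client helper might look like the following sketch; the key and metadata match the quickstart upload, and this step isn't required:
# check_upload.py (optional verification sketch)
import os
from utils import setup_s3_client

s3 = setup_s3_client()
resp = s3.head_object(Bucket=os.environ.get('LD_S3_BUCKET'), Key='in/example')
print(resp['Metadata'])  # expect {'ld_run': 'true'}; S3 returns metadata keys lowercased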
Set Up the S3 Client Dynamically (Recommended)
line^d uses temporary AWS credentials, which we rotate every ~6 hours for enhanced security. We recommend loading your Heroku config dynamically to get the latest credentials on your local machine at run time. Here’s an example using Python and the Heroku CLI:
# utils.py - imported as "from utils import setup_s3_client" in the other examples
import json
import os
import subprocess
import boto3

def setup_local_env():
    """Add Heroku config variables to the local environment."""
    heroku_config_out = subprocess.check_output('heroku config -j', shell=True)
    heroku_env = json.loads(heroku_config_out)
    for k, v in heroku_env.items():
        os.environ[k] = v

def setup_s3_client(local=False):
    """Return an S3 client configured with the line^d temporary credentials."""
    if local:
        setup_local_env()
    aws = boto3.session.Session(
        aws_access_key_id=os.environ.get('LD_AWS_ACCESS_KEY_ID'),
        aws_secret_access_key=os.environ.get('LD_AWS_SECRET_ACCESS_KEY'),
        aws_session_token=os.environ.get('LD_AWS_SESSION_TOKEN'),
        region_name=os.environ.get('LD_REGION', 'us-east-1'),
    )
    return aws.client('s3')
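As a usage sketch, pass local=True during local development so your Heroku config is loaded first; on a dyno you can call setup_s3_client() with no arguments because the LD_* variables are already set. The in/ prefix below simply mirrors the key names used elsewhere in this article:
# local_check.py (usage sketch for local development)
import os
from utils import setup_s3_client

s3 = setup_s3_client(local=True)  # pulls the latest temporary credentials via the Heroku CLI
resp = s3.list_objects_v2(Bucket=os.environ.get('LD_S3_BUCKET'), Prefix='in/')
print(resp.get('KeyCount', 0), 'objects under in/')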
Set Up a Local Environment (Static)
After provisioning, you can replicate your line^d config locally for development environments. Use the heroku local CLI command to configure, run, and manage the process types specified in your app's Procfile. Heroku Local reads configuration variables from a .env file. Use heroku config to view an app's configuration variables in a terminal, and use the following commands to add them to a local .env file:
$ heroku config:get LD_AWS_ACCESS_KEY_ID -s >> .env
$ heroku config:get LD_AWS_SECRET_ACCESS_KEY -s >> .env
$ heroku config:get LD_AWS_SESSION_TOKEN -s >> .env
$ heroku config:get LD_REGION -s >> .env
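The resulting .env file contains one KEY=value line per variable, along these lines (the values shown here are placeholders for the real temporary credentials):
LD_AWS_ACCESS_KEY_ID=...
LD_AWS_SECRET_ACCESS_KEY=...
LD_AWS_SESSION_TOKEN=...
LD_REGION=us-east-1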
Don’t commit credentials and other sensitive configuration variables to source control. In Git, exclude the .env file with: echo .env >> .gitignore
For more information, see the Heroku Local article.
Defining Workflows in JSON
You can define workflows declaratively using JSON to support tasks written in any programming language. Here’s an example of a JSON file for a workflow.
{
  "title": "Workflow title",
  "run": [
    {
      "name": "First task to run",
      "cmd": "Heroku process name",
      "machine": "dyno_size",
      "options": ["dir=out", "model_preset=model-preset"]
    },
    {
      "name": "Run a parallel Task Group",
      "parallel": [
        {"cmd": "get_data", "machine": "basic", "options": ["endpoint=posts"]},
        {"cmd": "get_data", "machine": "basic", "options": ["endpoint=users"]},
        {"cmd": "get_data", "machine": "basic", "options": ["endpoint=comments"]}
      ]
    }
  ]
}
Workflow Format
Field Name | Type | Required | Description |
---|---|---|---|
title | string | Optional | Title of the workflow used for documentation and display |
run | list of Tasks | Required | The sequence of Tasks you want to run |
Task Format
Field Name | Type | Required | Description |
---|---|---|---|
name | string | Optional | The display name of the Task |
cmd | string | Required | Heroku process to run that is defined in the Procfile |
machine | string | Optional | Heroku dyno size (default = 'basic') |
timeout | integer | Optional | Seconds to wait for output before timing out (default = 60) |
options | list of strings | Optional | Option name, value pairs to include in the Task environment vars - example: 'option_name=option_value' |
no_output | boolean | Optional | Skip waiting for an output file. If the Task dyno starts up, it's considered successful. (default = False) |
wait_for | integer | Optional | Seconds to wait after the Task completes (default = 0) |
Task Group Format
Field Name | Type | Required | Description |
---|---|---|---|
name | string | Optional | The display name of the Task Group |
parallel | list of Tasks | Required | Tasks you want to run in parallel (see format above) |
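Each Task's cmd references a process type defined in your app's Procfile. As an illustration, a Procfile supporting the get_data and process_image Tasks used in this article's examples might look like the following; the script names are placeholders:
get_data: python get_data.py
process_image: python process_image.py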
Runtime Environment Variables
line^d provides a set of helpful environment variables for each dyno at run time. Use the variable line_D to access them in JSON format.
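For example, a Task written in Python can parse the variable to read values such as the task_id used in the next section (other fields aren't enumerated here):
# Inside a running Task (sketch)
import json
import os

ld_env = json.loads(os.environ['line_D'])
task_id = ld_env['task_id']  # used below to name the Task's output file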
Managing Task Output and Status
By default, line^d expects each Task to send an output file as proof that it succeeded. You can disable this behavior by specifying the no_output option. Here's an example showing how to include the task_id with your output file:
import json
import os
import boto3
import requests
from utils import setup_s3_client

def get_json():
    ld_env = json.loads(os.environ['line_D'])
    task_id = ld_env['task_id']
    url = 'https://jsonplaceholder.typicode.com/posts'
    resp = requests.get(url)
    records = resp.json() if resp.status_code == 200 else []
    # You have the option to store state to help with monitoring and composing workflows
    state = {
        'resp_status_code': resp.status_code,
        'num_records': len(records),
        'resp_time': str(resp.elapsed),
    }
    s3 = setup_s3_client()
    s3.put_object(
        Body=json.dumps(records).encode('utf-8'),
        Bucket=os.environ.get('LD_S3_BUCKET'),
        Key=f'out/{task_id}',
        Metadata={
            'ld_task_id': task_id,
            # S3 metadata values must be strings, so serialize the state dict
            'ld_state': json.dumps(state),
        },
    )

if __name__ == '__main__':
    get_json()
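Because the state travels with the output object's metadata, a downstream Task or monitoring script can read it back later. A minimal sketch, assuming it knows the producing Task's task_id:
# read_state.py (sketch)
import json
import os
from utils import setup_s3_client

def read_task_state(task_id):
    s3 = setup_s3_client()
    resp = s3.head_object(Bucket=os.environ.get('LD_S3_BUCKET'), Key=f'out/{task_id}')
    return json.loads(resp['Metadata']['ld_state'])  # metadata keys come back lowercased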
Attach Workflows to Your Media and Data Files
It can be helpful to upload files with a post-processing workflow attached in one S3 request. This pattern can often eliminate the need for a separate background job system. Here’s an example of this pattern.
import json
import os
import boto3
from utils import setup_s3_client

PROCESS_MEDIA = {
    'title': 'Process media files',
    'run': [
        {'cmd': 'process_image', 'machine': 'basic', 'timeout': 30},
    ],
}

if __name__ == '__main__':
    s3 = setup_s3_client()
    filename = 'example_img.jpg'
    # Create an empty placeholder file so the example is self-contained
    open(filename, 'wb').close()
    with open(filename, 'rb') as media_f:
        # Use the S3 Metadata key "ld_code" to attach the workflow
        s3.put_object(
            Body=media_f,
            Bucket=os.environ.get('LD_S3_BUCKET'),
            Key=f'in/img/{filename}',
            Metadata={'ld_code': json.dumps(PROCESS_MEDIA)},
        )
Monitoring Using the Dashboard
The line^d dashboard allows you to monitor your system activity. You can access the dashboard via the CLI:
$ heroku addons:open line-d
Opening line-d for sharp-mountain-4005
or by visiting the Heroku Dashboard and selecting the application in question. Select line^d from the Add-ons menu.
Dyno Startup Latency
The average latency between writing a file to S3 and dyno startup is ~7 seconds. The total latency involves two main components:
- S3 event latency (~2 sec): time for S3 to deliver an event to the line^d compute engine
- Dyno startup time (~5 sec): time for Heroku dynos to reach the up state after they're requested
Remember that latency can vary for many reasons, such as network traffic or AWS and Heroku outages. In those rare scenarios, latency at the 99th percentile could be many times greater than 7 seconds.
Removing the Add-on
Before removing line^d, we recommend exporting your data from S3. Here's how to do that using the AWS CLI and the sync command:
$ aws s3 sync s3://$LD_S3_BUCKET ./s3_export
download: s3://ld-private-cac9ba5f-8e42-436a-ac1d-e74708b0fddd/etl/users.json to s3_export/users.json
download: s3://ld-private-cac9ba5f-8e42-436a-ac1d-e74708b0fddd/etl/albums.json to s3_export/albums.json
download: s3://ld-private-cac9ba5f-8e42-436a-ac1d-e74708b0fddd/etl/posts.json to s3_export/posts.json
Then remove the add-on using the Heroku CLI.
This action destroys all associated data and you can’t undo it!
$ heroku addons:destroy line-d
-----> Removing line-d-clear-41822 from example-app... done, v20 (free)
Support
Submit all line^d support and runtime issues via one of the Heroku Support channels. Send any non-support-related issues and product feedback to support@line-d.net.