heroku-inference
Last updated February 28, 2025
The Heroku Managed Inference and Agent add-on is currently in pilot. The products offered as part of the pilot aren’t intended for production use and are considered as a Beta Service and are subject to the Beta Services terms at https://www.salesforce.com/company/legal/agreements.jsp.
The Heroku Managed Inference and Agent add-on offers an easy way to access various large foundational AI models, including supported language (chat), embedding, and diffusion (image) models.
To use these models, attach one or more model resources from the Heroku Managed Inference and Agent add-on to your Heroku app. The add-on adds config variables to your app, allowing you to call the provisioned models. You can call models with the Heroku AI CLI plug-in or via direct curl requests.
All available models are hosted on Amazon Bedrock. Heroku provides an OpenAI-compatible API to access the models.
Check out the Python, Ruby, and JavaScript (Node.js) quick start guides.
Tools
This add-on can also enable your Large Language Models (LLMs) to run tools on Heroku automatically, with built-in support for retries and error correction. It supports both custom tools you create and built-in tools, like code execution, provided by Heroku.
When enabled, your app’s LLM calls a tool that triggers Heroku’s control loop to provision, execute, and deprovision dynos in the background, with action traces included in the model’s output.
See Heroku Tools to learn more.
Benefits
Heroku owns and maintains the add-on. Your data (excluding data sent to certain externally-run Heroku Tools) is never sent outside of secure AWS accounts. Inference prompts and completions are only logged temporarily via Heroku Logplex, which you control.
Available Models
The following models are available.
Region: us
Model Documentation | Type | API Endpoint | Model Source | Description |
---|---|---|---|---|
claude-3-5-sonnet-latest | text → text | v1/chat/completions | Anthropic | A state-of-the-art large language model that supports chat and tool-calling. |
claude-3-5-haiku | text → text | v1/chat/completions | Anthropic | A faster, more affordable large language model that supports chat and tool-calling. |
cohere-embed-multilingual | text → embedding | v1/embeddings | Cohere | A state-of-the-art embedding model that supports multiple languages. This model is helpful for developing Retrieval Augmented Generation (RAG) search. |
stable-image-ultra | text → image | v1/images/generations | Stability AI | A state-of-the-art diffusion (image generation) model. |
Region: eu
Model Documentation | Type | API Endpoint | Model Source | Description |
---|---|---|---|---|
claude-3-5-sonnet | text → text | v1/chat/completions | Anthropic | A state-of-the-art large language model that supports chat and tool-calling. |
claude-3-haiku | text → text | v1/chat/completions | Anthropic | A faster, more affordable large language model that supports chat and tool-calling. |
cohere-embed-multilingual | text → embedding | v1/embeddings | Cohere | A state-of-the-art embedding model that supports multiple languages. This model is helpful for developing Retrieval Augmented Generation (RAG) search. |
Install the CLI Plugin
Heroku provides an AI CLI plugin to interact with your model resources.
Install the Heroku CLI if you haven’t installed it yet. Then, install the Heroku AI plugin:
heroku plugins:install @heroku/plugin-ai
See Heroku AI CLI Plugin Command Reference for details of all plugin commands.
Provision Access to an AI Model Resource
To use a model, you must first create a model resource `$MODEL_ID` and attach it to your app `$APP_NAME`.

If you don't have an app, you can create one with `heroku create <your-new-app-name>`.

To view the available models, run `heroku ai:models:list`. After deciding which model you want to use, run:
heroku ai:models:create -a $APP_NAME $MODEL_ID
You can attach multiple model resources to a single app. By default, the first model resource you attach to an app has an alias of `INFERENCE`. Subsequent attachments have randomized alias names, so we recommend you specify an alias with the `--as` flag. Specifically, we recommend using `--as` values of `EMBEDDING` and `DIFFUSION` for our embedding and diffusion models:
heroku ai:models:create -a $APP_NAME cohere-embed-multilingual --as EMBEDDING
heroku ai:models:create -a $APP_NAME stable-image-ultra --as DIFFUSION
We recommend using an alias of `INFERENCE` for the chat models, `EMBEDDING` for the embedding model (`cohere-embed-multilingual`), and `DIFFUSION` for the image model (`stable-image-ultra`). Our example code follows this pattern, so for easy copying and pasting of commands, we recommend you also use these aliases.

If you attach more than one model resource of the same type to a single app, you must specify your own alias and replace any example code you're using with the resulting config vars.
Model Resource Config Vars
After attaching a model resource to your app, your app has three new config variables. You can view these variables by running `heroku config -a $APP_NAME`. If your app's model resource has an alias of `INFERENCE` (the default), your three new config variables are:

- `INFERENCE_KEY`
- `INFERENCE_MODEL_ID`
- `INFERENCE_URL`
To save these config variables as environment variables in your current environment, you can run:
export INFERENCE_KEY=$(heroku config:get INFERENCE_KEY -a $APP_NAME)
export INFERENCE_MODEL_ID=$(heroku config:get INFERENCE_MODEL_ID -a $APP_NAME)
export INFERENCE_URL=$(heroku config:get INFERENCE_URL -a $APP_NAME)
Or you can view and export your config vars all at once with:
eval $(heroku config -a $APP_NAME --shell | grep '^INFERENCE_' | tee /dev/tty | sed 's/^/export /')
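As a quick sanity check, here's a small bash sketch (assuming the default `INFERENCE` alias; `${!var}` is bash-specific indirect expansion) to confirm all three variables made it into your shell before you start making requests:

```shell
# Warn about any of the three expected config vars that are unset or empty.
for var in INFERENCE_KEY INFERENCE_MODEL_ID INFERENCE_URL; do
  if [ -z "${!var}" ]; then
    echo "Missing $var -- re-run the export commands above" >&2
  fi
done
```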
In subsequent commands, you can specify your app's `<MODEL_RESOURCE>` either by its `--as` alias (`INFERENCE` by default) or by the model resource slug, for example, `inference-production-curved-41276`. Run `heroku ai:models:info -a $APP_NAME` to view the slug and alias of an attached model resource.
Language-Specific Examples
We have language-specific quick start guides in Python, Ruby, and JavaScript for each of our endpoints.
Call an AI Model Resource
Via API / curl Requests
A typical model call looks like this:
curl $INFERENCE_URL/v1/chat/completions \
-H "Authorization: Bearer $INFERENCE_KEY" \
-d '{
"model": '"\"$INFERENCE_MODEL_ID\""',
<other model keyword-arguments, varies model to model>
}'
The full endpoint URL varies depending on the model you're using:

- Chat models (`claude-3-5-sonnet-latest`, `claude-3-5-sonnet`, `claude-3-5-haiku`, and `claude-3-haiku`) use the `/v1/chat/completions` endpoint.
- The embedding model (`cohere-embed-multilingual`) uses the `/v1/embeddings` endpoint.
- The diffusion model (`stable-image-ultra`) uses the `/v1/images/generations` endpoint.
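For instance, a minimal chat completion request might look like the sketch below. The message payload follows the OpenAI-compatible chat schema; the prompt text is just an illustration:

```shell
# Minimal chat request against the provisioned model resource.
# Requires the INFERENCE_* config vars exported into your shell.
curl "$INFERENCE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $INFERENCE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"$INFERENCE_MODEL_ID"'",
    "messages": [{"role": "user", "content": "What is 1+2?"}]
  }'
```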
See our model cards for details on each model.
We recommend using streaming for all inference requests to prevent timeouts when requests exceed 29 seconds. Important caveat: when you use streaming with tool calling, the Managed Inference add-on streams complete responses after each tool call, rather than incremental updates. If any individual tool call takes 55 seconds or longer, a timeout occurs.
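As a sketch, a streaming chat request looks like the example below. `"stream": true` is the standard streaming flag in OpenAI-compatible APIs, and curl's `-N` flag disables output buffering so events print as they arrive:

```shell
# Streaming chat request: the response arrives as server-sent events.
curl -N "$INFERENCE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $INFERENCE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"$INFERENCE_MODEL_ID"'",
    "stream": true,
    "messages": [{"role": "user", "content": "What is 1+2?"}]
  }'
```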
Via Heroku AI Plugin
A typical model call looks like this:
heroku ai:models:call <MODEL_RESOURCE> -a $APP_NAME --prompt 'What is 1+2?'
See our model cards for details on each model.
Monitoring and Logging
Display stats and the current state of your model resources via the `ai` plugin:
heroku ai:models:info <MODEL_RESOURCE> -a $APP_NAME # model resource can be the resource ID or alias
Deprovisioning an AI Model Resource
If you only ever use `heroku ai:models:create` commands to create and attach model resources, you can use `heroku ai:models:destroy` to destroy a resource.

However, in some cases users attach a single model resource to multiple apps via `heroku ai:models:attach`. To destroy a model resource connected to multiple apps, first detach the resource from all but one app with `heroku ai:models:detach` and then run `heroku ai:models:destroy`, or run the destroy command with the `--force` flag.
Destroy an AI Model Resource
This action destroys all associated data and you can’t undo it!
To destroy an AI model resource, run:
heroku ai:models:destroy <MODEL_RESOURCE> --app $APP_NAME
When destroying a model resource, you can specify the model resource’s alias or the resource ID.
Detach an AI Model Resource
If you chose to `create` and then `attach` model resources to certain apps, you can detach an AI model resource from a specific app with:
heroku ai:models:detach <MODEL_RESOURCE> --app $APP_NAME
When detaching a model resource, you can specify the model resource’s alias or the resource ID.
This add-on bills by usage. A detached AI model resource never bills you, and neither does an attached AI model resource that you're not actively using.