heroku-inference
Last updated February 28, 2025
The Heroku Managed Inference and Agent add-on is currently in pilot. The products offered as part of the pilot aren’t intended for production use and are considered as a Beta Service and are subject to the Beta Services terms at https://www.salesforce.com/company/legal/agreements.jsp.
The Heroku Managed Inference and Agent add-on offers an easy way to access various large foundational AI models, including supported language (chat), embedding, and diffusion (image) models.
To use these models, attach one or more model resources from the Heroku Managed Inference and Agent add-on to your Heroku app. The add-on adds config variables to your app, allowing you to call the provisioned models. You can call models with the Heroku AI CLI plug-in or via direct curl requests.
All available models are hosted on Amazon Bedrock. Heroku provides an OpenAI-compatible API to access the models.
Check out the Python, Ruby, and JavaScript (Node.js) quick start guides.
Tools
This add-on can also enable your Large Language Models (LLMs) to run tools on Heroku automatically, with built-in support for retries and error correction. It supports both custom tools you create and built-in tools, like code execution, provided by Heroku.
When enabled, your app’s LLM calls a tool that triggers Heroku’s control loop to provision, execute, and deprovision dynos in the background, with action traces included in the model’s output.
See Heroku Tools to learn more.
Benefits
Heroku owns and maintains the add-on. Your data (excluding data sent to certain externally-run Heroku Tools) is never sent outside of secure AWS accounts. Inference prompts and completions are only logged temporarily via Heroku Logplex, which you control.
Available Models
The following models are available.
Region: us
Model Documentation | Type | API Endpoint | Model Source | Description |
---|---|---|---|---|
claude-3-5-sonnet-latest | text → text | v1/chat/completions | Anthropic | A state-of-the-art large language model that supports chat and tool-calling. |
claude-3-5-haiku | text → text | v1/chat/completions | Anthropic | A faster, more affordable large language model that supports chat and tool-calling. |
cohere-embed-multilingual | text → embedding | v1/embeddings | Cohere | A state-of-the-art embedding model that supports multiple languages. This model is helpful for developing Retrieval Augmented Generation (RAG) search. |
stable-image-ultra | text → image | v1/images/generations | Stability AI | A state-of-the-art diffusion (image generation) model. |
Region: eu
Model Documentation | Type | API Endpoint | Model Source | Description |
---|---|---|---|---|
claude-3-5-sonnet | text → text | v1/chat/completions | Anthropic | A state-of-the-art large language model that supports chat and tool-calling. |
claude-3-haiku | text → text | v1/chat/completions | Anthropic | A faster, more affordable large language model that supports chat and tool-calling. |
cohere-embed-multilingual | text → embedding | v1/embeddings | Cohere | A state-of-the-art embedding model that supports multiple languages. This model is helpful for developing Retrieval Augmented Generation (RAG) search. |
Install the CLI Plugin
Heroku provides an AI CLI plugin to interact with your model resources.
Install the Heroku CLI if you haven’t installed it yet. Then, install the Heroku AI plugin:
heroku plugins:install @heroku/plugin-ai
See Heroku AI CLI Plugin Command Reference for details of all plugin commands.
Provision Access to an AI Model Resource
To use a model, you must first create a model resource `$MODEL_ID` and attach it to your app `$APP_NAME`.

If you don't have an app, you can create one with `heroku create <your-new-app-name>`.

To view the available models, run `heroku ai:models:list`. After deciding which model you want to use, run:
heroku ai:models:create -a $APP_NAME $MODEL_ID
You can attach multiple model resources to a single app. By default, the first model resource you attach to an app has an alias of `INFERENCE`. Subsequent attachments have randomized alias names, so we recommend you specify an alias with the `--as` flag. Specifically, we recommend using `--as` values of `EMBEDDING` and `DIFFUSION` for our embedding and diffusion models:
heroku ai:models:create -a $APP_NAME cohere-embed-multilingual --as EMBEDDING
heroku ai:models:create -a $APP_NAME stable-image-ultra --as DIFFUSION
We recommend using an alias of `INFERENCE` for the chat models, `EMBEDDING` for the embedding model (`cohere-embed-multilingual`), and `DIFFUSION` for the image model (`stable-image-ultra`). Our example code follows this pattern, so for easy copying and pasting of commands, we recommend you also use these aliases.

If you attach more than one model resource of the same type to a single app, you must specify your own alias and replace any example code you're using with the resulting config vars.
Model Resource Config Vars
After attaching a model resource to your app, your app has three new config variables. You can view these variables by running `heroku config -a $APP_NAME`. If your app's model resource has an alias of `INFERENCE` (the default), your three new config variables are:

- `INFERENCE_KEY`
- `INFERENCE_MODEL_ID`
- `INFERENCE_URL`
To save these config variables as environment variables in your current environment, you can run:
export INFERENCE_KEY=$(heroku config:get INFERENCE_KEY -a $APP_NAME)
export INFERENCE_MODEL_ID=$(heroku config:get INFERENCE_MODEL_ID -a $APP_NAME)
export INFERENCE_URL=$(heroku config:get INFERENCE_URL -a $APP_NAME)
Or you can view and export your config vars all at once with:
eval $(heroku config -a $APP_NAME --shell | grep '^INFERENCE_' | tee /dev/tty | sed 's/^/export /')
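As a quick sanity check, here's a small bash sketch (assuming the default `INFERENCE` alias; `${!var}` is bash-specific indirect expansion) to confirm all three variables made it into your shell before you start making requests:

```shell
# Warn about any of the three expected config vars that are unset or empty.
for var in INFERENCE_KEY INFERENCE_MODEL_ID INFERENCE_URL; do
  if [ -z "${!var}" ]; then
    echo "Missing $var -- re-run the export commands above" >&2
  fi
done
```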
In subsequent commands, you can specify your app's `<MODEL_RESOURCE>` either by its `--as` alias (`INFERENCE` by default) or by the model resource slug, for example, `inference-production-curved-41276`. Run `heroku ai:models:info -a $APP_NAME` to view the slug and alias of an attached model resource.
Language-Specific Examples
We have language-specific quick start guides in Python, Ruby, and JavaScript for each of our endpoints.
Call an AI Model Resource
Via API / curl Requests
A typical model call looks like this:
curl $INFERENCE_URL/v1/chat/completions \
-H "Authorization: Bearer $INFERENCE_KEY" \
-d '{
"model": '"\"$INFERENCE_MODEL_ID\""',
<other model keyword-arguments, varies model to model>
}'
The full endpoint URL varies depending on the model you're using:

- Chat models (`claude-3-5-sonnet-latest`, `claude-3-5-sonnet`, `claude-3-5-haiku`, and `claude-3-haiku`) use the `/v1/chat/completions` endpoint.
- The embedding model (`cohere-embed-multilingual`) uses the `/v1/embeddings` endpoint.
- The diffusion model (`stable-image-ultra`) uses the `/v1/images/generations` endpoint.
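For instance, a minimal chat completion request might look like the sketch below. The message payload follows the OpenAI-compatible chat schema; the prompt text is just an illustration:

```shell
# Minimal chat request against the provisioned model resource.
# Requires the INFERENCE_* config vars exported into your shell.
curl "$INFERENCE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $INFERENCE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"$INFERENCE_MODEL_ID"'",
    "messages": [{"role": "user", "content": "What is 1+2?"}]
  }'
```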
See our model cards for details on each model.
We recommend using streaming for all inference requests to prevent timeouts when requests exceed 29 seconds. Important caveat: when you use streaming with tool calling, the Managed Inference add-on streams complete responses after each tool call, rather than incremental updates. If any individual tool call takes 55 seconds or longer, a timeout occurs.
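As a sketch, a streaming chat request looks like the example below. `"stream": true` is the standard streaming flag in OpenAI-compatible APIs, and curl's `-N` flag disables output buffering so events print as they arrive:

```shell
# Streaming chat request: the response arrives as server-sent events.
curl -N "$INFERENCE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $INFERENCE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"$INFERENCE_MODEL_ID"'",
    "stream": true,
    "messages": [{"role": "user", "content": "What is 1+2?"}]
  }'
```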
Via Heroku AI Plugin
A typical model call looks like this:
heroku ai:models:call <MODEL_RESOURCE> -a $APP_NAME --prompt 'What is 1+2?'
See our model cards for details on each model.
Monitoring and Logging
Display stats and the current state of your model resources via the `ai` plugin:
heroku ai:models:info <MODEL_RESOURCE> -a $APP_NAME # model resource can be the resource ID or alias
Deprovisioning an AI Model Resource
If you only ever use `heroku ai:models:create` commands to create and attach model resources, you can use `heroku ai:models:destroy` to destroy a resource.

However, in some cases users attach a single model resource to multiple apps via `heroku ai:models:attach`. To destroy a model resource connected to multiple apps, first detach the resource from all but one app with `heroku ai:models:detach` and then run `heroku ai:models:destroy`, or run the destroy command with the `--force` flag.
Destroy an AI Model Resource
This action destroys all associated data and you can’t undo it!
To destroy an AI model resource, run:
heroku ai:models:destroy <MODEL_RESOURCE> --app $APP_NAME
When destroying a model resource, you can specify the model resource’s alias or the resource ID.
Detach an AI Model Resource
If you chose to `create` and then `attach` model resources to certain apps, you can detach an AI model resource from a specific app with:
heroku ai:models:detach <MODEL_RESOURCE> --app $APP_NAME
When detaching a model resource, you can specify the model resource’s alias or the resource ID.
This add-on bills by usage. A detached AI model resource never bills you, and neither does an attached AI model resource that you're not actively using.