Heroku’s Managed Inference and Agents Add-on API offers broad OpenAI compatibility that works for most use cases. You can use familiar OpenAI SDK patterns while gaining access to Heroku’s infrastructure, security, and specialized tools.
Get Started With the OpenAI SDK
Prerequisites
To use the OpenAI SDK with Heroku, you need a Heroku app with the Managed Inference and Agents add-on attached.
We recommend always using the latest version of the OpenAI SDK.
Set Up a Heroku App to Use the OpenAI SDK
When you attach a model resource to your app using `heroku ai:models:create`, the add-on automatically adds some config vars to your app's environment. Include your add-on's config vars in your code:

- Set the base URL to the `INFERENCE_URL` for your add-on.
- Set the API key to the `INFERENCE_KEY` for your add-on.
- Set the model in the request to the `INFERENCE_MODEL_ID`.
Example Setup for Python OpenAI SDK
```python
from openai import OpenAI
import os

api_key = os.getenv("INFERENCE_KEY")
api_url = os.getenv("INFERENCE_URL")
model = os.getenv("INFERENCE_MODEL_ID")

client = OpenAI(
    api_key=api_key,          # Your Managed Inference API key
    base_url=api_url + "/v1/" # Managed Inference API endpoint, for example, https://us.inference.heroku.com
)

response = client.chat.completions.create(
    model=model,  # Add-on plan name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What should I build today?"}
    ],
)
print(response.choices[0].message.content)
```
Request Parameter Support
Request Fields
Request Field | Supported?
---|---
model | Yes, use the add-on plan name
messages | Yes
modalities | Yes, for type text
logprobs | Yes
max_completion_tokens | Yes
parallel_tool_calls | Yes for Anthropic and OpenAI models, ignored for Nova models
reasoning_effort | Yes
stop | Yes
stream | Yes
stream_options | Yes
temperature | Yes
tool_choice | Yes* (see Tool Choice)
tool_options | Yes
tools | Yes
top_p | Yes
n | Yes, but ignored
frequency_penalty | Yes, but ignored
audio | Yes, but ignored
logit_bias | Yes, but ignored
store | Yes, but ignored
metadata | Yes, but ignored
prediction | Yes, but ignored
presence_penalty | Yes, but ignored
prompt_cache_key | Yes, but ignored
response_format | Yes, but ignored
safety_identifier | Yes, but ignored
service_tier | Yes, but ignored
top_logprobs | Yes, but ignored
verbosity | Yes, but ignored
web_search_options | Yes, but ignored
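Because `stream` and `stream_options` are supported, you can consume responses incrementally as in the standard OpenAI SDK: with `stream=True`, the completion arrives as a series of chunks whose content deltas you concatenate. Below is a minimal sketch of the accumulation loop, run here against stand-in chunk objects rather than a live call; the `Fake*` classes are illustrative only and mirror just the fields the loop reads.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Stand-ins for the SDK's ChatCompletionChunk objects (illustrative only).
@dataclass
class FakeDelta:
    content: Optional[str]

@dataclass
class FakeChoice:
    delta: FakeDelta

@dataclass
class FakeChunk:
    choices: List[FakeChoice]

def accumulate_stream(chunks: Iterable) -> str:
    """Concatenate the content deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)

# With a real client, the chunks would come from:
# stream = client.chat.completions.create(model=model, messages=messages, stream=True)
fake_stream = [
    FakeChunk(choices=[FakeChoice(delta=FakeDelta(content="Hello"))]),
    FakeChunk(choices=[FakeChoice(delta=FakeDelta(content=", world"))]),
    FakeChunk(choices=[FakeChoice(delta=FakeDelta(content=None))]),  # final chunk has no delta content
]
print(accumulate_stream(fake_stream))  # Hello, world
```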
Request Messages
Message Type | Supported?
---|---
Developer | Yes
System | Yes
User | Yes, except for audio and file content types
Assistant | Yes, except for audio content type
Tool | Yes
Tool Choice
The `tool_choice` parameter is supported, except for custom tool choice. To learn more about using `tool_choice`, see the OpenAI docs.
Tools
Function type tools are supported. Custom tools are unsupported.
To learn more about using tools, see the OpenAI docs.
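A function-type tool definition follows the standard OpenAI schema, and `tool_choice` can force the model to call a specific function. The sketch below builds both structures locally; the `get_weather` tool is a made-up example, not a Heroku-provided tool. With a real client, you'd pass them as the `tools` and `tool_choice` arguments to `client.chat.completions.create`.

```python
import json

# Hypothetical function-type tool in the standard OpenAI schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Force the model to call get_weather; "auto" and "none" are also valid values.
tool_choice = {"type": "function", "function": {"name": "get_weather"}}

print(json.dumps(tool_choice))
```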
Response Parameter Support
Response Fields
Response Field | Supported?
---|---
choices | Yes
created | Yes
id | Yes
model | Yes
object | Yes
service_tier | No
system_fingerprint | Yes
usage | Yes
Choice Fields
Choice Field | Supported?
---|---
index | Yes
message | Yes, except for annotations and audio
logprobs | Yes
finish_reason | Yes
Usage Fields
Usage Field | Supported?
---|---
prompt_tokens | Yes
completion_tokens | Yes
total_tokens | Yes
Other usage field types are unsupported.
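The supported fields map directly onto the response object the SDK returns. The sketch below pulls them out of a raw response body; the dict and its values are a fabricated sample for illustration, not real API output.

```python
# Hypothetical sample of a chat completion response body (illustrative values only).
sample_response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1710000000,
    "model": "example-model",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}

# Extract the supported fields described in the tables above.
choice = sample_response["choices"][0]
content = choice["message"]["content"]
finish_reason = choice["finish_reason"]
total_tokens = sample_response["usage"]["total_tokens"]
print(content, finish_reason, total_tokens)  # Hello! stop 15
```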
Limitations
Extended Thinking
Claude 3.7 Sonnet and Claude 4 Sonnet support extended thinking. You can use the `extended_thinking` parameter we expose, or OpenAI's `reasoning_effort` parameter (`low`, `medium`, and `high` map to fixed reasoning budget tokens).

If you use the OpenAI client and don't want reasoning blocks in the response, set `include_reasoning` to `false`.
Python Extended Thinking Example
```python
response = client.chat.completions.create(
    model=model,  # Must be Claude 3.7 Sonnet or Claude 4 Sonnet
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"}
    ],
    extra_body={
        "extended_thinking": {
            "enabled": True,
            "budget_tokens": 2000,
            "include_reasoning": False
        }
    }
)
```
Allowing Ignored Parameters
The Managed Inference and Agents Add-on API returns an error for unrecognized parameters. To disable this error, set allow_ignored_params
to true. Ignoring parameters is useful if you’re using an older version of the SDK with parameters that aren’t fully supported by Heroku.
Python Allowing Ignored Parameters Example
```python
response = client.chat.completions.create(
    model=model,  # Add-on plan name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"}
    ],
    extra_body={
        "allow_ignored_params": True
    }
)
```
Limitations for Tool Calling
The `content` field must be populated to use tool calling. If `content` is empty, the API returns a validation error. To prevent empty tool responses, either ensure the tool always returns a response or populate empty `content` fields with default values in your application code.
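One way to guard against this validation error is a small helper that substitutes a default before appending the tool message to the conversation. A minimal sketch; the placeholder text is an arbitrary choice:

```python
from typing import Optional

def tool_message(tool_call_id: str, result: Optional[str]) -> dict:
    """Build a tool message, replacing empty content with a default
    so the API's non-empty content validation passes."""
    content = result if result else "The tool returned no output."
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}

print(tool_message("call_abc", "")["content"])            # The tool returned no output.
print(tool_message("call_abc", "42 degrees")["content"])  # 42 degrees
```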