Managed Inference and Agent API /v1/chat/completions
Last updated January 29, 2025
The Heroku Managed Inference and Agent add-on is currently in pilot. The products offered as part of the pilot aren't intended for production use, are considered a Beta Service, and are subject to the Beta Services terms at https://www.salesforce.com/company/legal/agreements.jsp.
The `/v1/chat/completions` endpoint generates conversational completions for a provided set of input messages. You can specify the model, adjust generation settings such as `temperature`, and opt to stream responses in real time. You can also specify `tools` the model can choose to call.
Request Body Parameters
Use parameters to manage how conversational completions are generated.
Required Parameters
Field | Type | Description | Example |
---|---|---|---|
`model` | string | model used for completion; typically you'll use your `INFERENCE_MODEL_ID` config variable for this value | `"claude-3-5-sonnet"` |
`messages` | array | an array of message objects (user-assistant conversational turns) that the model uses to generate the next response | `[{"role": "user", "content": "Why is Heroku so awesome?"}]` |
Optional Parameters
Field | Type | Description | Default | Example |
---|---|---|---|---|
`max_tokens` | integer | maximum number of tokens the model is allowed to generate before stopping (each token typically represents around 4 characters of text); max value: 4096 | 4096 | 100 |
`stop` | array | a list of strings; the model stops generating further tokens once any of the strings is encountered in the response (for example, `["foo"]` causes the model to stop generating output after, and if, it generates the string `"foo"`) | null | `["foo"]` |
`stream` | boolean | option to stream responses incrementally via server-sent events (useful for chat interfaces and for getting around timeout errors) | false | true |
`temperature` | float | controls the randomness of the response; values closer to 0 make the response more focused by favoring high-probability tokens, while values closer to 1.0 encourage more diverse responses by sampling from a broader range of possibilities for each generated token; range: 0.0 to 1.0 | 1.0 | 0.2 |
`tool_choice` | enum or object | option to force the model to use one or more of the tools listed in `tools` (see `tool_choice`) | `"required"` | `"auto"` |
`tools` | array | the tools that the model may call (see `tools`) | `[]` | see the JSON example in the `tools` section |
`top_p` | float | the proportion of tokens to consider when generating the next token, in terms of cumulative probability; range: 0 to 1.0 | 0.999 | 0.95 |
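For example, a request body combining the required fields with a few of these optional settings might look like the following (a minimal sketch; the prompt and parameter values are illustrative, and the model ID comes from your `INFERENCE_MODEL_ID` config variable):

```json
{
  "model": "claude-3-5-sonnet",
  "messages": [
    {"role": "user", "content": "Summarize the history of Heroku in two sentences."}
  ],
  "max_tokens": 256,
  "temperature": 0.2,
  "stream": false
}
```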
`tools` Array of Objects
`tools` lets you provide your model with an array of tools it may choose to call (use `tool_choice` to specify how the model calls tools). When provided, your model may send back `tool_calls` in the `role="assistant"` generated message, asking your system to run the specified tool and send back the result in a `role="tool"` message.
Note that these tools are given to the model in the form of an extended prompt, and no further validation is done. Models may make up tool names that don't exist in the `tools` array you gave them. To avoid this, we recommend performing tool validation on your end whenever a model sends back a `tool_calls` assistant message, as in the sketch below.
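A minimal validation sketch in shell, assuming an OpenAI-style response shape where the assistant message lives at `.choices[0].message` (that path, and the saved `response.json` file, are assumptions for illustration; adjust them to the actual response you receive):

```bash
# Names you actually listed in the "tools" array of your request.
KNOWN_TOOLS='["get_current_weather"]'

# Flag any tool_calls whose function name isn't in KNOWN_TOOLS.
# The .choices[0].message path is an assumption about the response shape.
jq --argjson known "$KNOWN_TOOLS" '
  .choices[0].message.tool_calls // []
  | map(select(.function.name as $n | ($known | index($n) | not)))
  | if length > 0
    then "unknown tools requested: \(map(.function.name))"
    else "all tool calls are valid"
    end
' response.json
```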
Field | Type | Description | Example |
---|---|---|---|
`type` | string | the type of tool; one of: `"function"` or `"heroku_tool"` | `"function"` |
`function` | object | details about the function to be called (see the `function` object below) | see the `function` field in the example below |
Heroku provides a variety of custom `heroku_tool` tools that Heroku executes automatically, rather than returning a tool call request to your system.
`function` Object
Field | Type | Description | Example |
---|---|---|---|
`description` | string | description of what the function does, used by the model to choose when and how to call the function | `"This function calculates X"` |
`name` | string | name of the function to be called | `"example_function"` |
`parameters` | object | parameters the function accepts, as a JSON Schema object | `{"type": "object", "properties": {}}` |
Example `tools` Array
```json
[
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. Portland, OR"
          }
        },
        "required": ["location"]
      }
    }
  }
]
```
`tool_choice` Object
The `tool_choice` option specifies how the model should use the provided `tools`. It can be either a string (`none`, `auto`, or `required`) or a `tool_choice` object. `none` means the model calls no tools, `auto` allows the model to call zero or more of the provided tools, and `required` forces the model to call at least one tool before responding to the user.
To force the model to call a specific tool, you can specify a single tool in the `tools` array and pass `"tool_choice": "required"`, or you can force the tool selection by passing a `tool_choice` object that specifies the required function, as in the example after the following table.
Field | Type | Description | Example |
---|---|---|---|
`type` | enum<string> | the type of the tool; one of: `"function"` or `"heroku_tool"` | `"function"` |
`function` | object | a JSON object containing the function's name | `{"name": "my_cool_function"}` |
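For example, to force the model to call the `get_current_weather` function from the `tools` example above, you could pass a `tool_choice` object like this in your request body:

```json
{
  "tool_choice": {
    "type": "function",
    "function": {"name": "get_current_weather"}
  }
}
```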
`messages` Array of Objects
The `messages` parameter is an array of message objects. Each message must specify a `role` field that determines the message's schema (see below). Currently, the supported roles are `user`, `assistant`, `system`, and `tool`. If the most recent message uses the `assistant` role, the model continues its answer starting from the content in that most recent message.
`role=user` message
`user` messages are the primary way to send queries to your model and prompt it to respond.
Field | Type | Description | Required | Example |
---|---|---|---|---|
`role` | string | role of the message (`user`) | yes | `"user"` |
`content` | string or array | contents of the user message | yes | `"What is the weather?"` |
`role=assistant` message
Typically, `assistant` messages are generated by the model; however, you can create your own, or pre-fill a partially completed `assistant` response to influence the content the model generates on its next turn (see the example after the following table).
Field | Type | Description | Required | Example |
---|---|---|---|---|
`role` | string | role of the message (`assistant`) | yes | `"assistant"` |
`content` | string or array | contents of the assistant message | yes, unless `tool_calls` is specified | `"Here is the information"` |
`refusal` | string or null | refusal message by the assistant | no | `"I cannot answer that"` |
`tool_calls` | array | tool calls generated by the model | no | `[{"id": "tool_call_12345", "type": "function", "function": {"name": "my_cool_tool", "arguments": {"some_input": 123}}}]` |
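For example, a minimal sketch of a `messages` array that pre-fills the start of the assistant's reply so the model continues from it (the content strings are illustrative):

```json
[
  {"role": "user", "content": "Write a haiku about Heroku."},
  {"role": "assistant", "content": "Dynos hum softly,"}
]
```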
`role=system` message
A `system` message acts as a prompt 'prefix' given to the model to help influence its responses.
Field | Type | Description | Required | Example |
---|---|---|---|---|
`role` | string | role of the message (`system`) | yes | `"system"` |
`content` | string or array | contents of the system message | yes | `"You are a helpful assistant. You favor brevity and avoid hedging. You readily admit when you don't know an answer."` |
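For example, a `messages` array that pairs a `system` prompt with a `user` query (the content strings are illustrative):

```json
[
  {"role": "system", "content": "You are a helpful assistant. You favor brevity."},
  {"role": "user", "content": "What is the weather?"}
]
```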
`role=tool` message
A `tool` message object lets you communicate a specified tool's result (output) to the model.
Field | Type | Description | Required | Example |
---|---|---|---|---|
`role` | string | role of the message (`tool`) | yes | `"tool"` |
`content` | string or array | contents of the tool message | yes | `"Rainy and 84º"` |
`tool_call_id` | string | the tool call that this message is responding to | yes | `"toolu_02F9GXvY5MZAq8Lw3PTNQyJK"` |
Example `tool_calls` Object
Here's an example of what a `tool_calls` object might look like after your model has decided to call a tool you've given it as an option via `tools`.
```json
[
  {
    "role": "assistant",
    "tool_calls": [
      {
        "id": "toolu_02F9GXvY5MZAq8Lw3PTNQyJK",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"Portland, OR\"}"
        }
      }
    ]
  }
]
```
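To complete the loop, your system runs the requested tool and appends a `role=tool` message that answers the tool call by its `tool_call_id`, then sends the whole conversation back to the model. A minimal sketch of the resulting `messages` array (the user prompt and tool output are illustrative):

```json
[
  {"role": "user", "content": "What's the weather in Portland?"},
  {
    "role": "assistant",
    "tool_calls": [
      {
        "id": "toolu_02F9GXvY5MZAq8Lw3PTNQyJK",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"Portland, OR\"}"
        }
      }
    ]
  },
  {
    "role": "tool",
    "tool_call_id": "toolu_02F9GXvY5MZAq8Lw3PTNQyJK",
    "content": "Rainy and 84º"
  }
]
```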
Request Headers
In the following example, we assume your model resource has an alias of `INFERENCE` (the default).
Header | Type | Description |
---|---|---|
`Authorization` | string | your AI add-on's `INFERENCE_KEY` value (API bearer token) |
All inference curl requests must include an `Authorization` header containing your Heroku Inference key. For example, all `/v1/chat/completions` requests should follow this pattern:
```bash
# If you're developing locally, run this to set your config vars as ENV variables.
eval $(heroku config -a $APP_NAME --shell | grep '^INFERENCE_' | sed 's/^/export /' | tee >(cat >&2))

curl $INFERENCE_URL/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_KEY" \
  -d @- <<EOF
{
  "model": "$INFERENCE_MODEL_ID",
  "messages": [{"role": "user", "content": "Hello"}]
}
EOF
```
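To stream the response instead, set `"stream": true` in the body; per the `stream` parameter described above, the completion then arrives incrementally as server-sent events. Curl's `-N` flag disables output buffering so chunks print as they arrive:

```bash
curl -N $INFERENCE_URL/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_KEY" \
  -d @- <<EOF
{
  "model": "$INFERENCE_MODEL_ID",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true
}
EOF
```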