Managed Inference and Agent API /v1/chat/completions

Last updated April 07, 2025

This article is a work in progress, or documents a feature that is not yet released to all users. This article is unlisted. Only those with the link can access it.

Table of Contents

  • Request Body Parameters
  • tools Array of Objects
  • tool_choice Object
  • messages Array of Objects
  • Request Headers

The Heroku Managed Inference and Agent add-on is currently in pilot. The products offered as part of the pilot aren’t intended for production use and are considered as a Beta Service and are subject to the Beta Services terms at https://www.salesforce.com/company/legal/agreements.jsp.

The /v1/chat/completions endpoint generates conversational completions for a provided set of input messages. You can specify the model, adjust generation settings such as temperature, and opt to stream the responses in real time. You can also specify tools the model can choose to call.

Request Body Parameters

Use parameters to manage how conversational completions are generated.

Required Parameters

Field | Type | Description | Example
model | string | model used for completion; typically you'll use your INFERENCE_MODEL_ID config variable for this value | "claude-3-5-sonnet"
messages | array | an array of message objects (user-assistant conversational turns) that the model uses to generate the next response | [{"role": "user", "content": "Why is Heroku so awesome?"}]
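
As a minimal sketch of the two required fields, the request body can be assembled as a plain dictionary before it is serialized to JSON. The helper name build_chat_request is hypothetical, and the fallback model name is just the example value from the table, not a recommendation.

```python
import json
import os


def build_chat_request(user_text, model=None):
    """Return a minimal /v1/chat/completions request body as a dict."""
    # INFERENCE_MODEL_ID is the config var named in the table above.
    # The fallback is only the table's example value (an assumption).
    model = model or os.environ.get("INFERENCE_MODEL_ID", "claude-3-5-sonnet")
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }


body = build_chat_request("Why is Heroku so awesome?", model="claude-3-5-sonnet")
print(json.dumps(body, indent=2))
```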

Optional Parameters

Field | Type | Description | Default | Example
max_tokens | integer | maximum number of tokens the model may generate before stopping (each token typically represents around 4 characters of text); max value: 4096 | 4096 | 100
stop | array | a list of strings; the model stops generating further tokens once any of them appears in the response (for example, ["foo"] stops output if and when the model generates the string "foo") | null | ["foo"]
stream | boolean | option to stream responses incrementally via server-sent events (useful for chat interfaces and for avoiding timeout errors) | false | true
temperature | float | controls the randomness of the response; values closer to 0 make the response more focused by favoring high-probability tokens, while values closer to 1.0 encourage more diverse responses by sampling from a broader range of possibilities for each generated token; range: 0.0 to 1.0 | 1.0 | 0.2
tool_choice | enum or object | option to force the model to use one or more of the tools listed in tools (see tool_choice Object) | "required" | "auto"
tools | array | the tools that the model may call (see tools Array of Objects) | [] | see the example tools array in the tools section
top_p | float | the proportion of tokens to consider when generating the next token, in terms of cumulative probability; range: 0 to 1.0 | 0.999 | 0.95
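
The documented ranges can be checked on the client before a request is sent. The following is a sketch only: the helper name with_options is hypothetical, the lower bound of 1 for max_tokens is an assumption (the table gives only the maximum), and the service may apply its own validation regardless.

```python
def with_options(body, *, max_tokens=None, temperature=None, top_p=None,
                 stop=None, stream=None):
    """Return a copy of `body` with the given optional fields added.

    Raises ValueError for values outside the ranges listed in the table.
    """
    out = dict(body)
    if max_tokens is not None:
        # Upper bound is from the table; the lower bound of 1 is an assumption.
        if not 1 <= max_tokens <= 4096:
            raise ValueError("max_tokens must be between 1 and 4096")
        out["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        out["temperature"] = temperature
    if top_p is not None:
        if not 0.0 <= top_p <= 1.0:
            raise ValueError("top_p must be between 0 and 1.0")
        out["top_p"] = top_p
    if stop is not None:
        out["stop"] = list(stop)
    if stream is not None:
        out["stream"] = bool(stream)
    return out


base = {
    "model": "claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Hello"}],
}
request_body = with_options(base, max_tokens=100, temperature=0.2, stop=["foo"])
```

Fields left as None are omitted from the body, so the service defaults listed in the table apply.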

tools Array of Objects

tools lets you provide your model with an array of tools it may choose to call (use tool_choice to specify how the model will call tools). When provided, your model may send back tool_calls in the role="assistant" generated message, asking your system to run the specified tool, and send back the result in a role="tool" message.

Note that these tools are given to the model in the form of an extended prompt and no further validation is done. Models may make up tool names that don’t exist in the tools array you have given them. To avoid this, we recommend you perform tool validation on your end when a model sends back a tool_calls assistant message.
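
One way to perform that validation is to compare each returned tool call against the names you declared. This is a sketch under assumptions: the helper name find_unknown_tool_calls and the call IDs are hypothetical, and only tools of type "function" are checked because the shape of heroku_tool entries isn't described in this section.

```python
def find_unknown_tool_calls(tools, assistant_message):
    """Return the tool calls whose function name is not declared in `tools`."""
    declared = {
        tool["function"]["name"]
        for tool in tools
        if tool.get("type") == "function"
    }
    calls = assistant_message.get("tool_calls") or []
    return [
        call for call in calls
        if call.get("function", {}).get("name") not in declared
    ]


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {"type": "object", "properties": {}},
        },
    }
]

# Hypothetical assistant message: the second call names a tool we never declared.
message = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_current_weather",
                      "arguments": "{\"location\":\"Portland, OR\"}"}},
        {"id": "call_2", "type": "function",
         "function": {"name": "get_stock_price", "arguments": "{}"}},
    ],
}

unknown = find_unknown_tool_calls(tools, message)
print([call["function"]["name"] for call in unknown])  # prints ['get_stock_price']
```

How to respond to an unknown call (reject it, or send an error back in a role="tool" message) is left to your application.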

Field | Type | Description | Example
type | string | the type of tool; one of: "function" or "heroku_tool" | "function"
function | object | details about the function to be called (see function Object below) | see the function field in the example below

Heroku provides a variety of custom Heroku tools that are executed automatically by Heroku, rather than returning a tool call request to your system.

function Object

Field | Type | Description | Example
description | string | description of what the function does, used by the model to choose when and how to call the function | "This function calculates X"
name | string | name of the function to be called | "example_function"
parameters | object | parameters the function accepts, as a JSON Schema object | {"type": "object", "properties": {}}

Example tools Array

[
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. Portland, OR"
          }
        },
        "required": ["location"]
      }
    }
  }
]

tool_choice Object

The tool_choice object specifies how the model should use the provided tools.

It can either be a string (none, auto, or required), or a tool_choice object. none means the model calls no tools, auto allows the model to call zero or more of the provided tools, and required forces the model to call at least one tool before responding to the user.

To force the model to call a specific tool, you can either list a single tool in the tools array and pass "tool_choice": "required", or pass a tool_choice object that names the required function.

Field | Type | Description | Example
type | enum<string> | the type of the tool; one of: function or heroku_tool | "function"
function | object | a JSON object containing the function's name | {"name": "my_cool_function"}
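
As an illustration of the object form, the sketch below builds a tool_choice value that names one function and places it in a request body. The helper name force_function is hypothetical; the field layout follows the table above.

```python
def force_function(name):
    """Build a tool_choice object that names a single function tool."""
    return {"type": "function", "function": {"name": name}}


request_body = {
    "model": "claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "What is the weather in Portland, OR?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            },
        }
    ],
    # Object form: names the function the model is required to call.
    "tool_choice": force_function("get_current_weather"),
}
```

The name passed to force_function should match a name declared in tools; this sketch does not check that for you.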

messages Array of Objects

The messages object is an array of message objects.

Each message must specify a role field that determines the message's schema (see below).

Currently, the supported roles are user, assistant, system, and tool.

If the most recent message uses the assistant role, the model will continue its answer starting from the content in that most recent message.
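
A short sketch of that pre-fill behavior: appending a partial assistant message as the last entry asks the model to continue from it. The helper name with_prefill and the prompt text are hypothetical.

```python
def with_prefill(messages, prefix):
    """Return a new message list that ends with a partial assistant message."""
    return list(messages) + [{"role": "assistant", "content": prefix}]


conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Name one language Heroku supports."},
]

# The model would continue its answer from the text "Heroku supports ".
messages = with_prefill(conversation, "Heroku supports ")
```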

role=user message

user messages are the primary way to send queries to your model and prompt it to respond.

Field | Type | Description | Required | Example
role | string | role of the message (user) | yes | "user"
content | string | contents of the user message | yes | "What is the weather?"

role=assistant message

Typically, assistant messages are generated only by the model. However, you can create your own, or pre-fill a partially completed assistant response, to influence the content that the model generates on its next turn.

Field | Type | Description | Required | Example
role | string | role of the message (assistant) | yes | "assistant"
content | string or array | contents of the assistant message | yes, unless tool_calls is specified | "Here is the information"
refusal | string or null | refusal message by the assistant | no | "I cannot answer that"
tool_calls | array | tool calls generated by the model | no | [{"id": "tool_call_12345", "type": "function", "function": {"name": "my_cool_tool", "arguments": {"some_input": 123}}}]

role=system message

A system message acts as a prompt prefix that is given to the model to influence its responses.

Field | Type | Description | Required | Example
role | string | role of the message (system) | yes | "system"
content | string or array | contents of the system message | yes | "You are a helpful assistant. You favor brevity and avoid hedging. You readily admit when you don't know an answer."

role=tool message

A tool message object lets you communicate a specified tool’s result (output) to the model.

Field | Type | Description | Required | Example
role | string | role of the message (tool) | yes | "tool"
content | string or array | contents of the tool message | yes | "Rainy and 84º"
tool_call_id | string | ID of the tool call that this message is responding to | yes | "toolu_02F9GXvY5MZAq8Lw3PTNQyJK"

Example tool_calls Object

Here’s an example of what a tool_calls object might look like, after your model has decided to call a tool you’ve given to it as an option via tools.

[
  {
    "role": "assistant",
    "tool_calls": [
      {
        "id": "toolu_02F9GXvY5MZAq8Lw3PTNQyJK",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"Portland, OR\"}"
        }
      }
    ]
  }
]
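
To close the loop, your system runs the requested tool and replies with a role="tool" message that carries the matching tool_call_id. The sketch below assumes a hypothetical local get_weather handler that returns canned data. It decodes arguments when they arrive as a JSON string, as in the example above, and also accepts an already-decoded object.

```python
import json


def get_weather(location):
    # Hypothetical local implementation; returns canned data for illustration.
    return "Rainy and 84º"


# Maps tool names you declared in `tools` to local callables (an assumption).
HANDLERS = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Run each requested tool and return the role="tool" reply messages."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        handler = HANDLERS.get(function["name"])
        if handler is None:
            # The model may invent tool names, so report that back to it.
            content = "Unknown tool: " + function["name"]
        else:
            arguments = function["arguments"]
            if isinstance(arguments, str):
                arguments = json.loads(arguments)
            content = handler(**arguments)
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": content,
        })
    return replies


assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "toolu_02F9GXvY5MZAq8Lw3PTNQyJK",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\":\"Portland, OR\"}",
            },
        }
    ],
}

tool_messages = run_tool_calls(assistant_message)
```

Appending assistant_message and then tool_messages to the messages array of the next request gives the model the tool output to work from.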

Request Headers

In the following example, we assume your model resource has an alias of “INFERENCE” (the default).

Header | Type | Description
Authorization | string | your AI add-on's INFERENCE_KEY value (API bearer token)

All inference requests must include an Authorization header containing your Heroku Inference key.

For example, all /v1/chat/completions requests should follow this pattern:

# If you're developing locally, run this to set your config vars as ENV variables.
eval $(heroku config -a $APP_NAME --shell | grep '^INFERENCE_' | sed 's/^/export /' | tee >(cat >&2))

curl $INFERENCE_URL/v1/chat/completions \
 -H "Authorization: Bearer $INFERENCE_KEY" \
 -d @- <<EOF
{
  "model": "$INFERENCE_MODEL_ID",
  "messages": [{"role": "user", "content": "Hello"}]
}
EOF
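
The same request can be prepared from Python with only the standard library. This is a sketch, not an official client: the function name build_request is hypothetical, it reads the same INFERENCE_URL, INFERENCE_KEY, and INFERENCE_MODEL_ID variables used in the curl example, and the Content-Type header is an assumption that does not appear in the curl example.

```python
import json
import os
import urllib.request


def build_request(user_text):
    """Prepare (but do not send) a POST to /v1/chat/completions."""
    url = os.environ["INFERENCE_URL"].rstrip("/") + "/v1/chat/completions"
    body = {
        "model": os.environ["INFERENCE_MODEL_ID"],
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Bearer token from the add-on's INFERENCE_KEY config var.
            "Authorization": "Bearer " + os.environ["INFERENCE_KEY"],
            # Assumption: not shown in the curl example above.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result of build_request("Hello") to urllib.request.urlopen sends the request. The response format isn't covered in this section, so parsing it is left out of the sketch.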
