AI Models

The Heroku Managed Inference and Agent add-on supports the following models. The add-on is hosted in two regions: us and eu. However, the add-on can be provisioned and accessed from apps in any Heroku region. Select a model to view information on rate limits, prompt caching, and implementation.

Model Documentation	Model ID	Region	Supported Inputs	Supported Outputs	API Endpoint	Model Source	Description
Amazon Rerank 1.0	amazon-rerank-1-0	US, EU	`text`	`score`	/v1/rerank	Amazon	A reliable, high-performing reranking model backed by AWS infrastructure.
Nova Lite	nova-lite	US, EU	`text`, `image`, `video`	`text`	/v1/chat/completions	Amazon	A fast and cost-effective LLM.
Nova 2 Lite	nova-2-lite	US, EU	`text`, `image`, `video`	`text`	/v1/chat/completions	Amazon	A fast and cost-effective LLM that supports conversational chat, tool-calling, and advanced reasoning with extended context.
Nova Pro	nova-pro	US, EU	`text`, `image`, `video`	`text`	/v1/chat/completions	Amazon	A high-performance LLM designed for complex tasks.
Claude 3 Haiku	claude-3-haiku	EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A fast and affordable LLM that supports chat and tool-calling.
Claude 3.5 Haiku	claude-3-5-haiku	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	An affordable and straightforward LLM that supports chat and tool-calling.
Claude 3.5 Sonnet Latest	claude-3-5-sonnet-latest	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A fast and affordable LLM that supports chat and tool-calling.
Claude 3.7 Sonnet	claude-3-7-sonnet	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	An intelligent and detail-oriented LLM that supports chat, tool-calling, and enhanced reasoning.
Claude 4 Sonnet	claude-4-sonnet	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	An intelligent and detail-oriented LLM that supports chat, tool-calling, and enhanced reasoning.
Claude 4.5 Haiku	claude-4-5-haiku	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A state-of-the-art LLM that supports chat, tool-calling, and enhanced reasoning.
Claude 4.5 Sonnet	claude-4-5-sonnet	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A state-of-the-art LLM optimized for enterprise apps that supports chat, tool-calling, and enhanced reasoning.
Claude Sonnet 4.6	claude-sonnet-4-6	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A state-of-the-art LLM designed for complex tasks including data processing, sales forecasting, and content generation.
Claude Opus 4.5	claude-opus-4-5	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A next-generation, frontier LLM that supports chat, tool-calling, autonomous coding, effort control, and enhanced reasoning.
Claude Opus 4.6	claude-opus-4-6	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A next-generation, frontier LLM that supports chat, tool-calling, autonomous coding, effort control, and enhanced reasoning.
Cohere Embed Multilingual	cohere-embed-multilingual	US, EU	`text`	`embedding`	/v1/embeddings	Cohere	A state-of-the-art embedding model that supports multiple languages and can be helpful for developing RAG search.
Cohere Embed V4	cohere-embed-v4	US, EU	`text`	`embedding`	/v1/embeddings	Cohere	A state-of-the-art embedding model that supports over 100 languages and can be helpful for developing RAG search.
Cohere Rerank 3.5	cohere-rerank-3-5	US, EU	`text`	`score`	/v1/rerank	Cohere	A reranking model that offers enhanced reasoning, broad data compatibility, and multilingual support.
DeepSeek V3.2	deepseek-v3-2	US	`text`	`text`	/v1/chat/completions	DeepSeek	An open-weight LLM that supports conversational chat, tool-calling, and high-efficiency reasoning.
MiniMax M2	minimax-m2	US	`text`	`text`	/v1/chat/completions	MiniMax	An open-weight LLM that supports conversational chat, tool-calling, and programming tasks.
MiniMax M2.1	minimax-m2-1	US	`text`	`text`	/v1/chat/completions	MiniMax	An open-weight LLM that supports conversational chat, tool-calling, and long-horizon reasoning.
Kimi K2 Thinking	kimi-k2-thinking	US	`text`	`text`	/v1/chat/completions	Moonshot AI	An open-weight LLM that supports conversational chat, tool-calling, and chain-of-thought processing.
Kimi K2.5	kimi-k2-5	US	`text`	`text`	/v1/chat/completions	Moonshot AI	An open-weight LLM that supports conversational chat, tool-calling, and multimodal agentic workflows.
OpenAI gpt-oss-120b	gpt-oss-120b	US, EU	`text`	`text`	/v1/chat/completions	OpenAI	An open-weight LLM that supports chat and tool-calling.
Qwen3 235B	qwen3-235b	US	`text`	`text`	/v1/chat/completions	Qwen	An open-weight LLM that supports conversational chat, tool-calling, complex reasoning, and agentic coding.
Qwen3 Coder 480B	qwen3-coder-480b	US	`text`	`text`	/v1/chat/completions	Qwen	An open-weight LLM that supports conversational chat, tool-calling, and agentic coding.
Stable Image Ultra	stable-image-ultra	US	`text`	`image`	/v1/images/generations	Stability AI	A state-of-the-art diffusion (image generation) model.
GLM 4.7	glm-4-7	US	`text`	`text`	/v1/chat/completions	Z.ai	An open-weight LLM that supports conversational chat, tool-calling, and stable multi-step reasoning.
GLM 4.7 Flash	glm-4-7-flash	US	`text`	`text`	/v1/chat/completions	Z.ai	An open-weight LLM that supports conversational chat, tool-calling, and low-latency agentic tasks.

Deprecated Models

The following models are being deprecated and will reach end-of-life on the dates listed below. During the deprecation period, requests to these models return a warning header. Prior to the EOL date, model-specific plans for deprecated models will be converted to the standard plan. After the EOL date, requests to these models return HTTP 410.

Model	Model ID	Deprecation Date	EOL Date	Replacement
Claude 3.5 Sonnet Latest	claude-3-5-sonnet-latest	January 22, 2026	February 22, 2026	claude-4-6-sonnet
Claude 3.7 Sonnet	claude-3-7-sonnet	March 21, 2026	April 21, 2026	claude-4-6-sonnet
Claude 3.5 Haiku	claude-3-5-haiku	May 12, 2026	June 12, 2026	claude-4-5-haiku

Categories

AI Models

Deprecated Models