Managed Inference and Agent API /v1/embeddings
Last updated January 15, 2025
The Heroku Managed Inference and Agent add-on is currently in pilot. The products offered as part of the pilot aren't intended for production use, are considered a Beta Service, and are subject to the Beta Services terms at https://www.salesforce.com/company/legal/agreements.jsp.
The /v1/embeddings endpoint generates vector embeddings (basically, lists of numbers) for a provided set of input texts. These embeddings are optimized for various use cases, such as search, classification, and clustering. You can customize how inputs are processed and choose different embedding types to suit your needs.
Request Body Parameters
Required Parameters
Field | Type | Description | Example |
---|---|---|---|
model | string | ID of the embedding model to use | "cohere-embed-multilingual" |
input | array | an array of (up to 96) strings for the model to embed (recommended length is less than 512 tokens per string) | ["example string 1", "example string 2"] |
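Because input accepts at most 96 strings per request, larger corpora need to be batched across multiple requests. One way to do this, assuming a newline-delimited file with one document per line (a sketch, not part of the API):

```shell
# Generate a 200-line stand-in corpus, then split it into files of at
# most 96 lines each (batch_aa, batch_ab, ...), one file per request.
printf 'document %s\n' $(seq 1 200) > corpus.txt
split -l 96 corpus.txt batch_
ls batch_*
```

Each resulting batch file can then be turned into one input array for a single /v1/embeddings call.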
Optional Parameters
Field | Type | Description | Default | Example |
---|---|---|---|---|
input_type | enum&lt;string&gt; | specifies the type of input passed to the model (prepends special tokens to the input); one of: search_document, search_query, classification, clustering | "search_document" | "search_query" |
encoding_format | enum&lt;string&gt; | determines the encoding format of the output; one of: raw or base64 | "raw" | "base64" |
embedding_type | enum&lt;string&gt; | specifies the type(s) of embeddings to return; one of: float, int8, uint8, binary, ubinary | "float" | "int8" |
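Putting the optional parameters together, a complete request body might look like the following. This is an illustrative sketch: the model ID and input strings are placeholders, and only model and input are required.

```shell
# An illustrative /v1/embeddings request body using the optional
# parameters. All values here are placeholders.
PAYLOAD='{
  "model": "cohere-embed-multilingual",
  "input": ["first document", "second document"],
  "input_type": "search_document",
  "encoding_format": "base64",
  "embedding_type": "int8"
}'
printf '%s\n' "$PAYLOAD"
```

A body like this can be passed to curl with -d "$PAYLOAD" in place of the heredoc shown later in this article.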
Request Headers
In the following example, we assume your model resource has an alias of EMBEDDING (meaning you created the model resource with an --as EMBEDDING flag).
Header | Type | Description |
---|---|---|
Authorization | string | your AI add-on's EMBEDDING_KEY value (API bearer token) |
Inference curl requests must include an Authorization header containing your Heroku Inference key for the specified model. For example, all /v1/embeddings curl requests should follow this pattern:
```bash
# If you're developing locally, run this to set your config vars as ENV variables.
eval $(heroku config -a $APP_NAME --shell | grep '^EMBEDDING_' | sed 's/^/export /' | tee >(cat >&2))

curl $EMBEDDING_URL/v1/embeddings \
  -H "Authorization: Bearer $EMBEDDING_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "model": "$EMBEDDING_MODEL_ID",
  "input": ["Hello, I am a long string (document) and I want to be turned into a searchable embedding vector! What fun!"]
}
EOF
```
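If you request encoding_format of base64, each embedding arrives as a base64 string of packed binary values rather than a JSON array of numbers. Assuming float embeddings are packed as little-endian float32 (an assumption about the wire format, not confirmed by this article), the decoding step can be sketched with standard tools. The sample string below is a hand-made stand-in for a real response value:

```shell
# Decode a base64-encoded float32 vector and print the values with od.
# "zczMPc3MTD6amZk+" encodes [0.1, 0.2, 0.3] as little-endian float32;
# a real embedding string would be far longer (one float per dimension).
# Note: BSD/macOS base64 uses -D instead of -d to decode.
B64="zczMPc3MTD6amZk+"
echo "$B64" | base64 -d | od -An -f
```

In an application you would more likely decode with your language's base64 and binary-unpacking facilities than with od, which is used here only for inspection.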