Last updated January 15, 2026
The /v1/rerank endpoint ranks documents by their semantic relevance to a query. You can use this endpoint to improve response quality in retrieval-augmented generation (RAG) systems, semantic search, and question-answering applications.
View our available rerank models.
Request Body Parameters
Use parameters to control how documents are ranked.
Required Parameters
| Field | Type | Description | Example |
|---|---|---|---|
| model | string | ID of the rerank model to use | "cohere-rerank-3-5" |
| query | string | search query or question used to rank documents | "How do you create a Heroku App?" |
| documents | array of strings | list of text documents to rank (maximum 1000 documents per array) | ["doc1", "doc2", "doc3"] |
Optional Parameters
| Field | Type | Description | Default | Example |
|---|---|---|---|---|
| top_n | integer | number of top-ranked results to return | all documents | 10 |
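The parameters above can be assembled and checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: `build_rerank_payload` is a hypothetical helper name, and the rule that `top_n` must be a positive integer is an assumption (the table only states that it is an integer).

```python
MAX_DOCUMENTS = 1000  # per-request limit stated in the parameter table


def build_rerank_payload(model, query, documents, top_n=None):
    """Validate inputs client-side and return a JSON-serializable request body."""
    if not model:
        raise ValueError("model is required")
    if not query:
        raise ValueError("query is required")
    if not documents:
        raise ValueError("documents array is required and cannot be empty")
    if len(documents) > MAX_DOCUMENTS:
        raise ValueError(
            f"documents array exceeds maximum of {MAX_DOCUMENTS} items "
            f"(received {len(documents)})"
        )

    payload = {"model": model, "query": query, "documents": list(documents)}
    if top_n is not None:
        # Assumption: only positive integers make sense for top_n.
        if not isinstance(top_n, int) or top_n < 1:
            raise ValueError("top_n must be a positive integer")
        payload["top_n"] = top_n
    return payload
```

Omitting `top_n` leaves it out of the body, so the endpoint's default (all documents) applies.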
Request Headers
The examples in this article assume your model resource has the alias "RERANK" (that is, you created the model resource with the --as RERANK flag).
| Header | Type | Description |
|---|---|---|
| Authorization | string | your AI add-on's RERANK_KEY value (API bearer token) |
All inference curl requests must include an Authorization header containing your Heroku Inference key.
Response Format
When a request is successful, the API returns a JSON object with the following structure:
| Field | Type | Description |
|---|---|---|
| id | string | unique identifier for this response (UUID format) |
| results | array of objects | ranked documents, ordered by relevance (highest first) |
| meta | object | response metadata including API version and billing information |
Results Object
Each object inside the results array includes:
| Field | Type | Description |
|---|---|---|
| index | integer | original position of the document in the input array (0-indexed) |
| relevance_score | float | semantic relevance score (higher value = more relevant to query) |
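Because each result carries only the `index` of a document, not its text, a client typically joins the results back to its own input list. A minimal sketch of that join follows; `attach_documents` is a hypothetical helper name, and the sample data is illustrative.

```python
def attach_documents(documents, response):
    """Pair each ranked result with the input document it refers to."""
    ranked = []
    for result in response["results"]:
        ranked.append(
            {
                "index": result["index"],
                "relevance_score": result["relevance_score"],
                # index is the 0-based position in the original documents array
                "document": documents[result["index"]],
            }
        )
    return ranked


# Illustrative input list and a response shaped like the one described above.
documents = ["pooling overview", "monitoring tips", "pool sizing", "backup planning"]
response = {
    "results": [
        {"index": 0, "relevance_score": 0.6740},
        {"index": 2, "relevance_score": 0.5308},
    ]
}

for item in attach_documents(documents, response):
    print(f'{item["relevance_score"]:.4f}  {item["document"]}')
```

The results already arrive ordered by relevance, so the helper preserves their order rather than sorting again.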
Meta Object
The meta object includes:
| Field | Type | Description |
|---|---|---|
| api_version | object | API version information (the version is always "2") |
| billed_units | object | billing information for request |
| billed_units.search_units | integer | number of search units consumed by request |
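To track usage across several requests, the `search_units` value can be read from each response's `meta` object. The sketch below assumes responses have already been parsed into dictionaries; `total_search_units` is a hypothetical helper name, and treating a missing field as zero is a defensive choice, not documented behavior.

```python
def total_search_units(responses):
    """Sum billed search units across parsed /v1/rerank responses."""
    total = 0
    for response in responses:
        billed = response.get("meta", {}).get("billed_units", {})
        # Defensive default: count a missing field as zero units.
        total += billed.get("search_units", 0)
    return total
```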
Error Responses
| Status Code | Description | Example Message |
|---|---|---|
| 400 | validation errors | "model is required"; "query is required"; "documents array is required and cannot be empty"; "documents array exceeds maximum of 1000 items (received X). Please reduce the number of documents per request" |
| 401 | missing or invalid authorization token | authentication errors |
| 403 | you don’t have access to the requested model | authorization errors |
| 404 | invalid model ID | model not found errors |
| 429 | rate limit exceeded | exceeded 250 RPM (Cohere) or 200 RPM (Amazon) |
| 500 | internal server error | backend service errors |
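A client usually handles these status codes in two groups: errors that need a corrected request (400, 401, 403, 404) and errors that may succeed on a later attempt. The sketch below treats 429 and 500 as retryable and waits with exponential backoff plus jitter. That grouping and the delay values are common conventions assumed here, not behavior documented for this endpoint, and the helper names are hypothetical.

```python
import random

# Assumption: rate-limit and server errors are worth retrying; 4xx
# validation and authorization errors need a corrected request instead.
RETRYABLE_STATUS = {429, 500}


def is_retryable(status_code):
    """Return True if a later attempt could plausibly succeed."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(max_attempts=5, base=1.0, cap=30.0, rng=random.random):
    """Yield one wait time (seconds) per retry: capped exponential with full jitter."""
    for attempt in range(max_attempts):
        # rng() is in [0, 1), so each delay falls between 0 and the capped value.
        yield min(cap, base * 2 ** attempt) * rng()
```

A caller would sleep for each yielded delay between attempts and stop as soon as a request succeeds or returns a non-retryable status.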
Example Request
Let’s walk through an example /v1/rerank curl request.
First, use these commands to export your Heroku config vars as local environment variables.
export RERANK_MODEL_ID=$(heroku config:get -a $APP_NAME RERANK_MODEL_ID)
export RERANK_KEY=$(heroku config:get -a $APP_NAME RERANK_KEY)
export RERANK_URL=$(heroku config:get -a $APP_NAME RERANK_URL)
Next, send the curl request:
curl $RERANK_URL/v1/rerank \
  -H "Authorization: Bearer $RERANK_KEY" \
  -d @- <<EOF
{
  "model": "$RERANK_MODEL_ID",
  "query": "How do I optimize database connection pooling?",
  "documents": [
    "Connection pooling reduces overhead by reusing existing database connections instead of creating new ones for each request.",
    "You can monitor application performance using built-in metrics and logging tools.",
    "Set max pool size based on your dyno count and expected concurrent queries to prevent connection exhaustion.",
    "Regular database backups are essential for disaster recovery planning."
  ],
  "top_n": 2
}
EOF
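The same request can be made from application code. The following is a rough Python equivalent using only the standard library, assuming the `RERANK_URL` and `RERANK_KEY` environment variables exported above. The function names are hypothetical, and the `Content-Type` header is an assumption: the curl example does not set one.

```python
import json
import os
import urllib.request


def build_rerank_request(base_url, api_key, payload):
    """Build (but do not send) a POST request to /v1/rerank."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: not set in the curl example; a common convention for JSON bodies.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rerank(payload, timeout=30):
    """Send the request using RERANK_URL and RERANK_KEY from the environment."""
    request = build_rerank_request(
        os.environ["RERANK_URL"], os.environ["RERANK_KEY"], payload
    )
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the headers and body easy to inspect in tests without network access.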
Example Response
{
  "id": "f844c7c3-c357-4476-9a9d-d2de06f2106f",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.6740
    },
    {
      "index": 2,
      "relevance_score": 0.5308
    }
  ],
  "meta": {
    "api_version": {
      "version": "2",
      "is_experimental": false
    },
    "billed_units": {
      "search_units": 1
    }
  }
}