Unified Multi Model Endpoint call

Use /v1/llm/chat as Odock's provider-neutral, OpenAI-compatible chat gateway endpoint when you want a single endpoint for chat requests across models backed by the currently available or compatible providers.

This endpoint accepts an OpenAI-compatible chat request shape:

  • model
  • messages
  • temperature
  • max_tokens
  • top_p
  • tools
  • tool_choice
  • response_format
  • metadata
  • stream

The endpoint is provider-neutral. If provider is omitted, Odock resolves the provider from the configured model and any active routing policy. If provider is present, the request is pinned to openai, anthropic, gemini, or vllm, provided that value matches the selected model configuration.

For streaming, /v1/llm/chat returns OpenAI-compatible chat.completion.chunk SSE events, regardless of the upstream provider. For unary responses, it returns Odock's normalized JSON response.

Endpoint

| Method | Path | Request shape | Unary response shape | Streaming response shape |
| --- | --- | --- | --- | --- |
| POST | /v1/llm/chat | OpenAI-compatible chat request fields | Odock normalized response | OpenAI-compatible SSE chunks |

Minimal Request

curl:

curl "$ODOCK_BASE_URL/llm/chat" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${ODOCK_MODEL:-claude-sonnet-4-5}"'",
    "messages": [
      {"role": "user", "content": "Explain budget enforcement."}
    ],
    "temperature": 0.2,
    "max_tokens": 200
  }'

Python:

import os
import httpx

base_url = os.environ.get("ODOCK_BASE_URL", "https://api.odock.ai/v1")
api_key = os.environ["ODOCK_API_KEY"]
model = os.environ.get("ODOCK_MODEL", "claude-sonnet-4-5")

response = httpx.post(
    f"{base_url}/llm/chat",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json={
        "model": model,
        "messages": [
            {"role": "user", "content": "Explain budget enforcement."}
        ],
        "temperature": 0.2,
        "max_tokens": 200,
    },
    timeout=60.0,
)

response.raise_for_status()
data = response.json()
print(data["content"])

Request body (JSON):

{
  "model": "claude-sonnet-4-5",
  "messages": [
    {
      "role": "user",
      "content": "Explain budget enforcement."
    }
  ],
  "temperature": 0.2,
  "max_tokens": 200
}

Normalized Unary Response

A successful non-streaming response uses Odock's normalized provider response shape:

{
  "provider": "anthropic",
  "model": "claude-sonnet-4-5",
  "content": "Budget enforcement reserves expected usage before the upstream call...",
  "stop_reason": "end_turn",
  "input_tokens": 24,
  "output_tokens": 48
}

Optional fields can include:

| Field | Meaning |
| --- | --- |
| content_blocks | Structured text, image, audio, refusal, or image-output blocks when the provider returns them. |
| tool_calls | Normalized tool/function calls. |
| input_tokens | Input token count reported by the provider or stream adapter. |
| output_tokens | Output token count reported by the provider or stream adapter. |
| stop_reason | Provider stop reason normalized into Odock's response object. |
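
As a minimal sketch of consuming this shape on the client, the snippet below maps the documented fields onto a small dataclass. The class name UnaryResponse and the helper parse_unary_response are hypothetical, not part of any Odock SDK. It also assumes that provider, model, and content are always present and that the fields listed in the table may be absent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class UnaryResponse:
    """Hypothetical client-side view of the normalized unary response."""

    provider: str
    model: str
    content: str
    stop_reason: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    content_blocks: List[Any] = field(default_factory=list)
    tool_calls: List[Any] = field(default_factory=list)


def parse_unary_response(data: Dict[str, Any]) -> UnaryResponse:
    """Read the documented fields, tolerating absent optional ones."""
    return UnaryResponse(
        provider=data["provider"],
        model=data["model"],
        content=data.get("content", ""),
        stop_reason=data.get("stop_reason"),
        input_tokens=data.get("input_tokens"),
        output_tokens=data.get("output_tokens"),
        content_blocks=data.get("content_blocks") or [],
        tool_calls=data.get("tool_calls") or [],
    )


# The example body shown earlier in this section.
example = {
    "provider": "anthropic",
    "model": "claude-sonnet-4-5",
    "content": "Budget enforcement reserves expected usage before the upstream call...",
    "stop_reason": "end_turn",
    "input_tokens": 24,
    "output_tokens": 48,
}

parsed = parse_unary_response(example)
total_tokens = (parsed.input_tokens or 0) + (parsed.output_tokens or 0)
print(parsed.provider, parsed.stop_reason, total_tokens)
```

Treating the token counts as optional keeps the caller from failing when a provider or stream adapter does not report usage.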

Streaming

For streaming, Odock writes server-sent events compatible with OpenAI chat completion chunks. This lets clients consume Anthropic or Gemini-backed streams with the same SSE loop they use for OpenAI chat streams.

curl:

curl -N "$ODOCK_BASE_URL/llm/chat" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${ODOCK_MODEL:-gemini-2.5-flash}"'",
    "messages": [
      {"role": "user", "content": "Give three routing examples."}
    ],
    "stream": true,
    "include_usage": true,
    "max_tokens": 200
  }'

Python:

import json
import os
import httpx

base_url = os.environ.get("ODOCK_BASE_URL", "https://api.odock.ai/v1")
api_key = os.environ["ODOCK_API_KEY"]
model = os.environ.get("ODOCK_MODEL", "gemini-2.5-flash")

with httpx.stream(
    "POST",
    f"{base_url}/llm/chat",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json={
        "model": model,
        "messages": [
            {"role": "user", "content": "Give three routing examples."}
        ],
        "stream": True,
        "include_usage": True,
        "max_tokens": 200,
    },
    timeout=60.0,
) as response:
    response.raise_for_status()

    for line in response.iter_lines():
        if not line or not line.startswith("data: "):
            continue

        data = line[6:].strip()
        if data == "[DONE]":
            break

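        chunk = json.loads(data)
        # A usage-only chunk (requested via include_usage) may carry an empty choices list.
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        if delta.get("content"):
            print(delta["content"], end="", flush=True)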

print()
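
The parsing part of that loop can be separated from the network call so it is easy to test offline. The sketch below is one way to do that. The helper name iter_content_deltas is hypothetical, and the sample lines are hand-written to resemble OpenAI-style chunks rather than captured from Odock. The usage field names in the sample follow OpenAI's convention and are an assumption here.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and non-data fields
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a usage-only chunk with no choices
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# Hand-written sample lines (assumed shape, not real Odock output).
sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 2}}',
    "data: [DONE]",
]

text = "".join(iter_content_deltas(sample_lines))
print(text)
```

In a live client, the same function could be fed response.iter_lines() from the httpx stream shown above.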

Provider Selection

By default, send only the model and let Odock resolve the provider from the model record:

{
  "model": "claude-sonnet-4-5",
  "messages": [{"role": "user", "content": "Hello"}]
}

If you need to pin the provider explicitly, include provider:

{
  "provider": "anthropic",
  "model": "claude-sonnet-4-5",
  "messages": [{"role": "user", "content": "Hello"}]
}

Use provider pinning sparingly. In most applications, the model configuration and routing policy should decide the provider.
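
A small request builder can make the default (no pin) path the easy one. The following is a sketch only: build_chat_payload and PINNABLE_PROVIDERS are hypothetical names, and the client-side check simply mirrors the four provider values listed earlier on this page. Odock itself decides whether a pin matches the model configuration.

```python
from typing import Any, Dict, List, Optional

# Provider values named in this page; validating them client-side is an assumption.
PINNABLE_PROVIDERS = {"openai", "anthropic", "gemini", "vllm"}


def build_chat_payload(
    model: str,
    messages: List[Dict[str, Any]],
    provider: Optional[str] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Build a /v1/llm/chat body, adding provider only when explicitly pinned."""
    payload: Dict[str, Any] = {"model": model, "messages": messages, **options}
    if provider is not None:
        if provider not in PINNABLE_PROVIDERS:
            raise ValueError(f"unsupported provider pin: {provider!r}")
        payload["provider"] = provider
    return payload


messages = [{"role": "user", "content": "Hello"}]

# Default: let Odock resolve the provider from the model record.
default_payload = build_chat_payload("claude-sonnet-4-5", messages)

# Explicit pin, with an extra documented option passed through.
pinned_payload = build_chat_payload(
    "claude-sonnet-4-5", messages, provider="anthropic", max_tokens=200
)

print(default_payload)
print(pinned_payload)
```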

Smart Routing

When organisation routing is enabled and the virtual API key has a routing policy, /v1/llm/chat can try candidate models across provider families. Candidate models must be configured in the same organisation and must be accessible to the API key.

Typical routing policies use:

| Strategy | Behavior |
| --- | --- |
| failover or next candidate | Try the next candidate when the current one fails with a configured failover trigger. |
| priority | Try lower-priority-number candidates first. |
| round_robin | Rotate traffic across equivalent candidates per API key. |

Default failover triggers are 5xx and timeout. Policies can also include rate_limit or any.
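
To make the interaction between priority ordering and failover triggers concrete, here is an illustrative model written as plain Python. It is a mental model only and does not describe Odock's server-side implementation. Candidate, should_fail_over, and route are hypothetical names, and the attempt callback stands in for an upstream call that returns None on success or a failure kind such as "5xx", "timeout", or "rate_limit".

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

# Default triggers as stated in the text above.
DEFAULT_TRIGGERS: FrozenSet[str] = frozenset({"5xx", "timeout"})


@dataclass(frozen=True)
class Candidate:
    model: str
    priority: int  # lower number is tried first


def should_fail_over(failure: str, triggers: FrozenSet[str] = DEFAULT_TRIGGERS) -> bool:
    """Return True when this failure kind allows moving to the next candidate."""
    return "any" in triggers or failure in triggers


def route(
    candidates: Sequence[Candidate],
    attempt: Callable[[str], Optional[str]],
    triggers: FrozenSet[str] = DEFAULT_TRIGGERS,
) -> Tuple[Optional[str], List[str]]:
    """Try candidates in priority order; return (winning model, models tried)."""
    tried: List[str] = []
    for candidate in sorted(candidates, key=lambda c: c.priority):
        tried.append(candidate.model)
        failure = attempt(candidate.model)
        if failure is None:
            return candidate.model, tried
        if not should_fail_over(failure, triggers):
            break  # failure kind is not a configured trigger: stop here
    return None, tried


candidates = [Candidate("gpt-4.1-mini", 2), Candidate("claude-sonnet-4-5", 1)]

# Simulated outcomes: the first-priority model times out, the second succeeds.
timeout_outcomes = {"claude-sonnet-4-5": "timeout", "gpt-4.1-mini": None}
winner, tried = route(candidates, timeout_outcomes.get)
print(winner, tried)

# A rate limit is not a default trigger, so routing stops after one attempt.
limited_outcomes = {"claude-sonnet-4-5": "rate_limit", "gpt-4.1-mini": None}
limited_winner, limited_tried = route(candidates, limited_outcomes.get)
print(limited_winner, limited_tried)
```

Passing triggers=frozenset({"any"}) to the second call would let it continue to the next candidate instead of stopping.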

For policy setup, see Routing.

Tools and Structured Content

The unified request supports normalized tool fields:

{
  "model": "gpt-4.1-mini",
  "messages": [
    {"role": "user", "content": "Call the weather tool for Paris."}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string"}
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": {
    "type": "function",
    "function": {"name": "get_weather"}
  }
}
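
Tool definitions like the one above are verbose to write by hand. The helpers below are a sketch that produces the same structure from a few arguments. function_tool and force_tool are hypothetical names, not Odock functions.

```python
from typing import Any, Dict, List


def function_tool(
    name: str,
    description: str,
    properties: Dict[str, Any],
    required: List[str],
) -> Dict[str, Any]:
    """Build one entry for the tools array in the shape shown above."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def force_tool(name: str) -> Dict[str, Any]:
    """Build a tool_choice value that selects one named function."""
    return {"type": "function", "function": {"name": name}}


weather_tool = function_tool(
    "get_weather",
    "Get the weather for a city.",
    {"city": {"type": "string"}},
    ["city"],
)

payload = {
    "model": "gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Call the weather tool for Paris."}],
    "tools": [weather_tool],
    "tool_choice": force_tool("get_weather"),
}
print(payload["tool_choice"])
```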

For multimodal inputs, use content_blocks on messages when you need structured text, image, or audio blocks:

{
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": "Describe this image.",
      "content_blocks": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,..."
          }
        }
      ]
    }
  ]
}

Provider support depends on the upstream model and configured provider.
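
The data URL in the multimodal example above is elided. One way to produce such a value from raw bytes is sketched below. image_block and user_message are hypothetical helper names, and the default MIME type is an assumption taken from the image/png prefix in the example.

```python
import base64
from typing import Any, Dict, List


def image_block(image_bytes: bytes, mime_type: str = "image/png") -> Dict[str, Any]:
    """Wrap raw image bytes as an image_url block carrying a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def user_message(text: str, blocks: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a user message with both plain content and content_blocks."""
    return {"role": "user", "content": text, "content_blocks": blocks}


# Placeholder bytes stand in for a real image file read with open(path, "rb").
block = image_block(b"abc")
message = user_message("Describe this image.", [block])
print(block["image_url"]["url"])
```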

Unified Errors

The unified endpoint can return model lookup, model access, provider configuration, budget, quota, rate-limit, SafetySec, plugin, and upstream provider errors. Gateway-controlled errors usually use a JSON error.code and error.message body.

For the full format and status-code tables, see Gateway Errors.
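
Because gateway-controlled errors usually, but not always, carry error.code and error.message, a client should read them defensively. The sketch below shows one approach. extract_gateway_error is a hypothetical helper, and the code value in the sample body is invented for illustration. See Gateway Errors for the real codes.

```python
import json
from typing import Optional, Tuple


def extract_gateway_error(body: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (code, message) from an error body, or (None, None) if absent."""
    try:
        parsed = json.loads(body)
    except ValueError:
        return None, None  # not JSON, e.g. a raw upstream error page
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        return None, None
    return error.get("code"), error.get("message")


# Hypothetical error body; the code string is illustrative only.
sample_body = '{"error": {"code": "example_code", "message": "Example message."}}'
code, message = extract_gateway_error(sample_body)
print(code, message)

# Bodies without the expected shape fall back to (None, None).
print(extract_gateway_error("upstream unavailable"))
```

With httpx, this could be called as extract_gateway_error(response.text) before or instead of response.raise_for_status(), so the gateway's code is available for logging or retry decisions.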

Use /v1/llm/chat when you want one OpenAI-compatible chat request surface with Odock-managed provider resolution. Use the Native Models call when a client must preserve a provider-specific response shape.
