ODOCK.AI
Usage

How to call the Odock gateway, choose endpoint families, authenticate requests, and handle gateway errors.

Odock is a runtime gateway for LLM and MCP traffic. Applications send requests to Odock with a virtual API key. Odock authenticates that key, checks model or MCP access, resolves the configured upstream provider, and injects the encrypted provider credential. It then applies routing, budgets, quotas, rate limits, SafetySec, plugins, observability, and usage recording, and returns the response in the endpoint shape the client selected.

Use the gateway in one of two ways:

  • Native provider endpoints keep the request and response shape of a provider family. The examples below cover the currently available or compatible provider shapes, such as OpenAI, Anthropic, Gemini, and vLLM.
  • The unified endpoint uses Odock's provider-neutral /v1/llm/chat surface. It accepts an OpenAI-compatible chat request shape and can route across configured providers when routing is enabled.

The Quick Start shows the shortest OpenAI-compatible migration. This page is the detailed reference for gateway usage.

Base URL

| Deployment | Base URL |
| --- | --- |
| Odock Cloud | https://api.odock.ai/v1 |
| Local self-hosted gateway | http://localhost:8080/v1 |
| Custom self-hosted domain | your own /v1 gateway URL, for example https://ai-gateway.example.com/v1 |

Some native clients expect the base URL without /v1:

| Client family | Recommended base URL |
| --- | --- |
| OpenAI SDK | https://api.odock.ai/v1 |
| Anthropic SDK | https://api.odock.ai |
| Gemini HTTP clients | https://api.odock.ai |
| vLLM HTTP clients | https://api.odock.ai plus /v1/vllm/... paths |
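
As a minimal sketch of the table above, the helper below returns the base URL to configure for each client family. The function name and the family keys ("openai", "anthropic", "gemini", "vllm") are illustrative choices, not part of Odock.

```python
# Host taken from the Odock Cloud row; pass your own host for self-hosted gateways.
ODOCK_HOST = "https://api.odock.ai"


def base_url_for(client_family: str, host: str = ODOCK_HOST) -> str:
    """Return the base URL a client family should be configured with."""
    host = host.rstrip("/")
    # The OpenAI SDK appends paths such as /chat/completions, so it needs /v1.
    if client_family == "openai":
        return f"{host}/v1"
    # These clients add the versioned path themselves (/v1/messages, /v1beta/..., /v1/vllm/...).
    if client_family in {"anthropic", "gemini", "vllm"}:
        return host
    raise ValueError(f"unknown client family: {client_family!r}")
```

For a local gateway, `base_url_for("anthropic", "http://localhost:8080")` returns the host without the `/v1` suffix.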

Authentication

Use a virtual API key created in Odock, not an upstream provider key.

Odock accepts these credential forms:

| Credential form | Typical use |
| --- | --- |
| Authorization: Bearer sk_your_dock_virtual_key | Recommended for OpenAI, Anthropic, vLLM, and direct HTTP callers. |
| X-API-Key: sk_your_dock_virtual_key | Generic API-key callers. |
| x-goog-api-key: sk_your_dock_virtual_key | Gemini-compatible clients. |
| ?key=sk_your_dock_virtual_key | Gemini-compatible query parameter support. Prefer headers for server-side applications. |

The virtual key must be active, unexpired, and explicitly granted access to the requested model or MCP server.
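
The credential forms above can be sketched as a small helper that returns the headers and query parameters for a request. The style names ("bearer", "x-api-key", "goog-header", "goog-query") are hypothetical labels chosen for this example; only the header and parameter names come from the table.

```python
from typing import Dict, Tuple


def auth_for(style: str, virtual_key: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (headers, query_params) for one of the accepted credential forms."""
    if style == "bearer":
        return {"Authorization": f"Bearer {virtual_key}"}, {}
    if style == "x-api-key":
        return {"X-API-Key": virtual_key}, {}
    if style == "goog-header":
        return {"x-goog-api-key": virtual_key}, {}
    if style == "goog-query":
        # Query-string form: the key may end up in logs, so prefer headers server-side.
        return {}, {"key": virtual_key}
    raise ValueError(f"unknown credential style: {style!r}")
```

The returned pair can be passed to an HTTP client, for example `httpx.post(url, headers=headers, params=params, json=payload)`.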

Runtime Resolution

For LLM endpoints, the gateway resolves the requested model against the organisation model table. The configured model record supplies:

  • the client-facing model name
  • the upstream provider model slug
  • the provider type
  • the provider base URL
  • the provider timeout
  • the encrypted provider API key
  • pricing, capabilities, policies, budgets, quotas, and routing metadata

The model name in the request is the Odock model name. Odock rewrites it to the upstream slug before calling the provider when the configured slug differs.
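
To illustrate the rewrite step, here is a simplified sketch, not Odock's implementation. The `ModelRecord` class, its field names, and the example model name `fast-chat` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical, trimmed-down view of a configured model record."""
    name: str            # client-facing Odock model name
    upstream_slug: str   # model slug sent to the provider
    provider_type: str   # provider family, e.g. "openai"


def rewrite_model(payload: dict, record: ModelRecord) -> dict:
    """Return a copy of the payload with the Odock name replaced by the upstream slug."""
    if payload.get("model") != record.name:
        raise ValueError("payload model does not match the resolved record")
    upstream = dict(payload)  # shallow copy so the caller's payload is untouched
    upstream["model"] = record.upstream_slug
    return upstream


record = ModelRecord(name="fast-chat", upstream_slug="gpt-4.1-mini", provider_type="openai")
request = {"model": "fast-chat", "messages": [{"role": "user", "content": "Hi"}]}
upstream_request = rewrite_model(request, record)
```

The client only ever sees `fast-chat`; the provider receives the slug stored on the record.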

For native endpoints, the route pins the provider family. For example, /v1/chat/completions only allows OpenAI-backed models. If the model record resolves to another provider family, the gateway rejects the request with provider_not_allowed.
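
The pinning rule can be pictured as a lookup from route to provider family. This is a sketch under assumptions: the `ROUTE_PROVIDER` table, the provider type strings, and the `GatewayError` class are hypothetical; only the routes and the `provider_not_allowed` code come from this page.

```python
# Native routes pin a provider family; the unified route is absent on purpose.
ROUTE_PROVIDER = {
    "/v1/chat/completions": "openai",
    "/v1/messages": "anthropic",
}


class GatewayError(Exception):
    """Illustrative error carrying a gateway error code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def check_provider(route: str, provider_type: str) -> None:
    """Raise provider_not_allowed when the model's provider does not match the route."""
    pinned = ROUTE_PROVIDER.get(route)
    if pinned is not None and pinned != provider_type:
        raise GatewayError(
            "provider_not_allowed",
            f"{route} only allows {pinned}-backed models, got {provider_type}",
        )
```

In this sketch, `check_provider("/v1/llm/chat", "anthropic")` passes because the unified route is not pinned.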

For /v1/llm/chat, the endpoint is provider-neutral. It accepts OpenAI-compatible chat fields such as model, messages, temperature, max_tokens, tools, tool_choice, response_format, and stream. Streaming responses are emitted as OpenAI-compatible chat.completion.chunk SSE events. Unary responses use Odock's normalized response object with fields such as provider, model, content, stop_reason, content_blocks, tool_calls, input_tokens, and output_tokens.
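
A client that calls /v1/llm/chat without streaming has to read the normalized object rather than `choices[0].message`. The sketch below parses such a body. The field names come from the list above, but the value types and the sample body are assumptions, so check them against a real response.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class UnifiedReply:
    provider: str
    model: str
    content: str
    stop_reason: Optional[str]
    input_tokens: int
    output_tokens: int
    tool_calls: List[Any] = field(default_factory=list)


def parse_unified_reply(body: Dict[str, Any]) -> UnifiedReply:
    """Map a unary /v1/llm/chat response body onto a typed object."""
    return UnifiedReply(
        provider=body.get("provider", ""),
        model=body.get("model", ""),
        content=body.get("content") or "",
        stop_reason=body.get("stop_reason"),
        input_tokens=int(body.get("input_tokens") or 0),
        output_tokens=int(body.get("output_tokens") or 0),
        tool_calls=list(body.get("tool_calls") or []),
    )


# Hypothetical body, written by hand for illustration only.
sample = {
    "provider": "anthropic",
    "model": "claude-sonnet-4-5",
    "content": "Odock is a gateway for LLM and MCP traffic.",
    "stop_reason": "end_turn",
    "input_tokens": 12,
    "output_tokens": 11,
}
reply = parse_unified_reply(sample)
```

With httpx, the same function would be applied to `response.json()` after `response.raise_for_status()`.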

Choose an Endpoint

The tables below document the endpoint shapes currently available or compatible in the gateway, grouped by provider family. They are examples of supported provider families, not a permanent limit on future providers.

| Method | Endpoint | Request shape | Response shape | Streaming |
| --- | --- | --- | --- | --- |
| POST | /v1/llm/chat | OpenAI-compatible chat request fields plus optional provider | Odock normalized JSON for unary; OpenAI-compatible SSE chunks for streaming | Yes |

Use /v1/llm/chat when you want one chat endpoint that can call models configured in Odock across available or compatible providers. It is the main endpoint for provider-neutral routing and multi-model failover.

| Method | Endpoint | Request shape | Response shape | Streaming |
| --- | --- | --- | --- | --- |
| POST | /v1/chat/completions | OpenAI Chat Completions | OpenAI Chat Completions | Yes |
| POST | /v1/responses | OpenAI Responses | OpenAI Responses | Yes |
| POST | /v1/embeddings | OpenAI Embeddings | OpenAI Embeddings | No |
| POST | /v1/images/generations | OpenAI Images | OpenAI Images | No |
| POST | /v1/images/edits | OpenAI Images multipart edit | OpenAI Images | No |
| POST | /v1/images/variations | OpenAI Images multipart variation | OpenAI Images | No |

Use these endpoints when your application already uses an OpenAI-compatible SDK or HTTP client.

| Method | Endpoint | Request shape | Response shape | Streaming |
| --- | --- | --- | --- | --- |
| POST | /v1/messages | Anthropic Messages | Anthropic Messages | Yes |

Use this endpoint with Anthropic SDKs or clients that expect Anthropic's native messages API.

| Method | Endpoint | Request shape | Response shape | Streaming |
| --- | --- | --- | --- | --- |
| POST | /v1beta/models/{model}:generateContent | Gemini generateContent | Gemini generateContent | No |
| POST | /v1beta/models/{model}:streamGenerateContent | Gemini streamGenerateContent | Gemini SSE | Yes |

Use these endpoints for Gemini-compatible HTTP clients. The {model} path segment is still the Odock model name, not necessarily the upstream Gemini slug.
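
Because the model name sits in the path for these endpoints, a small URL builder helps avoid mistakes. This is a sketch: the function name and the example model name `gemini-flash` are hypothetical, and percent-encoding the model segment is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote


def gemini_url(host: str, model: str, stream: bool = False) -> str:
    """Build a Gemini-compatible gateway URL for an Odock model name."""
    action = "streamGenerateContent" if stream else "generateContent"
    # host is the gateway base URL without /v1, as recommended for Gemini clients.
    return f"{host.rstrip('/')}/v1beta/models/{quote(model, safe='')}:{action}"


unary_url = gemini_url("https://api.odock.ai", "gemini-flash")
stream_url = gemini_url("https://api.odock.ai", "gemini-flash", stream=True)
```

The request would then carry the virtual key in the `x-goog-api-key` header.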

| Method | Endpoint | Request shape | Response shape | Streaming |
| --- | --- | --- | --- | --- |
| GET | /v1/vllm/models | vLLM models | vLLM models | No |
| POST | /v1/vllm/chat/completions | vLLM chat completions | vLLM raw response | Yes |
| POST | /v1/vllm/completions | vLLM completions | vLLM raw response | Yes |
| POST | /v1/vllm/responses | vLLM responses | vLLM raw response | Yes |
| POST | /v1/vllm/embeddings | vLLM embeddings | vLLM raw response | No |
| POST | /v1/vllm/audio/transcriptions | vLLM audio transcriptions | vLLM raw response | No |
| POST | /v1/vllm/audio/translations | vLLM audio translations | vLLM raw response | No |
| POST | /v1/vllm/tokenize | vLLM tokenize | vLLM raw response | No |
| POST | /v1/vllm/detokenize | vLLM detokenize | vLLM raw response | No |
| POST | /v1/vllm/pooling | vLLM pooling | vLLM raw response | No |
| POST | /v1/vllm/classify | vLLM classify | vLLM raw response | No |
| POST | /v1/vllm/score | vLLM score | vLLM raw response | No |
| POST | /v1/vllm/rerank | vLLM rerank | vLLM raw response | No |

Use these endpoints when the upstream model is configured with a vLLM provider and the client expects vLLM-compatible payloads.

| Method | Endpoint | Request shape | Response shape | Streaming |
| --- | --- | --- | --- | --- |
| GET or POST | /v1/mcp/{slug} | MCP transport payload | Proxied MCP response | Depends on server transport |
| GET or POST | /v1/mcp/{id} | MCP transport payload | Proxied MCP response | Depends on server transport |
| GET or POST | /v1/mcp/{slug}/{path} | MCP transport payload with additional path | Proxied MCP response | Depends on server transport |

MCP calls resolve an MCP server by slug or id, verify ApiKeyMcpAccess, apply MCP guardrails and budgets, then proxy to the configured STREAMABLE_HTTP, SSE, or STDIO transport.
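
As a rough sketch of addressing an MCP server through the gateway, the helper below builds the three URL forms from the table. The function name and the example slug `github` are hypothetical. The request body shown is a standard MCP JSON-RPC 2.0 `tools/list` call; whether a given server accepts it depends on its transport.

```python
from typing import Optional
from urllib.parse import quote


def mcp_url(base_url: str, server: str, path: Optional[str] = None) -> str:
    """Build /mcp/{slug-or-id}[/{path}] under a base URL that already ends in /v1."""
    url = f"{base_url.rstrip('/')}/mcp/{quote(server, safe='')}"
    if path:
        url += "/" + path.lstrip("/")
    return url


# JSON-RPC 2.0 body asking an MCP server to list its tools.
TOOLS_LIST_REQUEST = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

server_url = mcp_url("https://api.odock.ai/v1", "github")
```

The virtual key used for the call must have access to that MCP server, as described above.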

Calling Methods

Use the method that fits your application. Each example uses the same virtual API key and the same configured Odock model.

```python
import os
from openai import OpenAI

# Point the OpenAI SDK at the Odock gateway and authenticate with the virtual key.
client = OpenAI(
    api_key=os.environ["ODOCK_API_KEY"],
    base_url=os.environ.get("ODOCK_BASE_URL", "https://api.odock.ai/v1"),
)

response = client.chat.completions.create(
    model=os.environ.get("ODOCK_MODEL", "gpt-4.1-mini"),
    messages=[{"role": "user", "content": "Explain Odock in one sentence."}],
    temperature=0.2,
    max_tokens=120,
)

print(response.choices[0].message.content)
```

```shell
curl "$ODOCK_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"$ODOCK_MODEL"'",
    "messages": [
      {"role": "user", "content": "Explain Odock in one sentence."}
    ],
    "temperature": 0.2,
    "max_tokens": 120
  }'
```

```python
import os
import httpx

base_url = os.environ.get("ODOCK_BASE_URL", "https://api.odock.ai/v1")
api_key = os.environ["ODOCK_API_KEY"]
model = os.environ.get("ODOCK_MODEL", "gpt-4.1-mini")

# Plain HTTP call to the OpenAI-compatible chat completions endpoint.
response = httpx.post(
    f"{base_url}/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json={
        "model": model,
        "messages": [
            {"role": "user", "content": "Explain Odock in one sentence."}
        ],
        "temperature": 0.2,
        "max_tokens": 120,
    },
    timeout=60.0,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Streaming

Streaming is available on the chat-style OpenAI, Anthropic, Gemini, vLLM, and unified endpoints. Use streaming when your client can consume server-sent events.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ODOCK_API_KEY"],
    base_url=os.environ.get("ODOCK_BASE_URL", "https://api.odock.ai/v1"),
)

stream = client.chat.completions.create(
    model=os.environ.get("ODOCK_MODEL", "gpt-4.1-mini"),
    messages=[{"role": "user", "content": "Give three gateway benefits."}],
    stream=True,
    max_tokens=180,
    extra_body={"include_usage": True},
)

for chunk in stream:
    # A usage-only chunk may arrive with an empty choices list; skip it.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

print()
```

```shell
curl -N "$ODOCK_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"$ODOCK_MODEL"'",
    "messages": [
      {"role": "user", "content": "Give three gateway benefits."}
    ],
    "stream": true,
    "include_usage": true,
    "max_tokens": 180
  }'
```

```python
import json
import os
import httpx

base_url = os.environ.get("ODOCK_BASE_URL", "https://api.odock.ai/v1")
api_key = os.environ["ODOCK_API_KEY"]
model = os.environ.get("ODOCK_MODEL", "claude-sonnet-4-5")

payload = {
    "model": model,
    "messages": [{"role": "user", "content": "Give three gateway benefits."}],
    "stream": True,
    "include_usage": True,
    "max_tokens": 180,
}

# Stream from the unified endpoint and print content deltas as they arrive.
with httpx.stream(
    "POST",
    f"{base_url}/llm/chat",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=60.0,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        # Only "data: ..." lines carry chunk payloads.
        if not line or not line.startswith("data: "):
            continue
        data = line[6:].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # A usage-only chunk may arrive with an empty choices list; skip it.
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        if delta.get("content"):
            print(delta["content"], end="", flush=True)

print()
```
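
The line handling in the httpx example can also be factored into a pure function, which makes it testable without a network call. This is a sketch: the function name is hypothetical, and the sample lines are hand-written in the OpenAI chat.completion.chunk shape, including a usage-only chunk with an empty `choices` list as an assumed edge case.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-compatible SSE lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines, comments, and other SSE fields
        data = line[6:].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a chunk that only carries usage
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: {"choices": [], "usage": {"total_tokens": 5}}',
    "data: [DONE]",
]
text = "".join(iter_content(sample_lines))
```

In a live call, `response.iter_lines()` from httpx can be passed in place of `sample_lines`.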

Errors

Gateway-controlled errors usually return a JSON body with error.code and error.message fields. Rate-limit responses may also include scope, limit, and retry metadata. Native provider endpoints can return provider-native upstream errors after Odock accepts the request.
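
A client can treat the gateway error shape defensively, since provider-native errors may look different. The sketch below extracts the code and message when they are present. The class and function names are hypothetical, and because the exact names of the rate-limit metadata fields are not specified here, any additional keys are kept as-is in `extra`.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ParsedError:
    status: int
    code: str
    message: str
    extra: Dict[str, Any] = field(default_factory=dict)


def parse_gateway_error(status: int, body_text: str) -> Optional[ParsedError]:
    """Return a ParsedError for gateway-shaped bodies, or None for anything else."""
    try:
        body = json.loads(body_text)
    except ValueError:
        return None  # not JSON, e.g. an upstream HTML error page
    error = body.get("error") if isinstance(body, dict) else None
    if not isinstance(error, dict) or "code" not in error:
        return None  # possibly a provider-native error shape
    extra = {k: v for k, v in error.items() if k not in ("code", "message")}
    return ParsedError(status, str(error["code"]), str(error.get("message", "")), extra)


# Hypothetical body for illustration.
parsed = parse_gateway_error(
    403,
    '{"error": {"code": "provider_not_allowed", "message": "model is not OpenAI-backed"}}',
)
```

Callers can branch on `parsed.code` and fall back to the raw response when the function returns `None`.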

See Gateway Errors for the complete error format, LLM error table, and MCP error table.

Operational Checklist

Before sending production traffic through Odock:

  1. Create the provider and provider key.
  2. Create models with the client-facing name and upstream slug.
  3. Grant the virtual API key access to each model it can call.
  4. Configure budgets, quotas, rate limits, and routing policies where needed.
  5. Run one unary call and one streaming call.
  6. Confirm the request appears in usage and observability views with provider, model, status, latency, tokens, cost, and routing attempts.

For provider-specific examples, continue with Native Models call. For the provider-neutral chat endpoint, continue with Unified Multi Model Endpoint call.
