Native Models call

Native endpoints keep the provider family's API shape. Use them when you already have a provider-compatible client and want Odock to sit between that client and the upstream provider.

The native route determines the provider family. The model field in the request still names an Odock model configured in your organisation. At runtime, Odock resolves that model to its upstream slug, provider base URL, timeout, and encrypted provider key.

Native endpoint rules:

The request uses an Odock virtual API key.
The requested model must exist in the organisation.
The virtual API key must have access to the model.
The model's provider family must match the native endpoint.
Provider-specific payloads are preserved where the endpoint supports passthrough.
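
The family-matching rule can be pictured as a lookup from request path to provider family. The sketch below is only an illustration built from the endpoint tables on this page; the function names, the family labels, and the "unknown_route" value are hypothetical and do not describe Odock's actual implementation.

```python
from typing import Optional

# Path prefixes copied from the endpoint tables on this page.
# The family labels are illustrative names, not Odock identifiers.
_ROUTE_PREFIXES = [
    ("/v1/vllm/", "vllm"),
    ("/v1/mistral/", "mistral"),
    ("/v1beta/models/", "gemini"),
    ("/v1/messages", "anthropic"),
    ("/v1/", "openai"),  # checked last because it prefixes the routes above
]


def provider_family_for_path(path: str) -> Optional[str]:
    """Return the provider family implied by a native route, or None."""
    for prefix, family in _ROUTE_PREFIXES:
        if path.startswith(prefix):
            return family
    return None


def check_route(path: str, model_family: str) -> Optional[str]:
    """Return None when the model fits the route, otherwise an error label."""
    route_family = provider_family_for_path(path)
    if route_family is None:
        return "unknown_route"  # hypothetical label, not an Odock error code
    if route_family != model_family:
        return "provider_not_allowed"  # the mismatch case named on this page
    return None
```

Under these assumptions, a model configured for the OpenAI family sent to /v1/mistral/ocr yields "provider_not_allowed", while the same model sent to /v1/embeddings passes the check.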

Endpoints by Provider

The tables below list the currently available or compatible native endpoint shapes, one table per provider family. They are examples of supported native integrations, and the list can grow as new providers are added.

OpenAI

Method	Endpoint	Use for	SDK base URL	Streaming
`POST`	`/v1/chat/completions`	Chat Completions	`https://api.odock.ai/v1`	Yes
`POST`	`/v1/responses`	Responses API	`https://api.odock.ai/v1`	Yes
`POST`	`/v1/embeddings`	Embeddings	`https://api.odock.ai/v1`	No
`POST`	`/v1/images/generations`	Image generation	`https://api.odock.ai/v1`	No
`POST`	`/v1/images/edits`	Image editing multipart requests	`https://api.odock.ai/v1`	No
`POST`	`/v1/images/variations`	Image variation multipart requests	`https://api.odock.ai/v1`	No

OpenAI-compatible endpoints are the best fit when you want the smallest migration: set the OpenAI SDK's base_url to https://api.odock.ai/v1 and replace the upstream provider key with your Odock virtual API key.

Anthropic

Method	Endpoint	Use for	SDK base URL	Streaming
`POST`	`/v1/messages`	Anthropic Messages	`https://api.odock.ai`	Yes

The Anthropic SDK appends /v1/messages itself, so its base URL should not include /v1.
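
Because the two SDKs expect different base URLs, a small helper can derive both from one gateway root. This is a minimal sketch; the function name and the sdk labels are hypothetical, and it covers only the OpenAI and Anthropic cases described above.

```python
def sdk_base_url(gateway_root: str, sdk: str) -> str:
    """Return the base URL to hand to an SDK, given the gateway root.

    The OpenAI SDK expects the /v1 suffix in its base URL. The Anthropic SDK
    appends /v1/messages itself, so it needs the bare gateway root.
    """
    root = gateway_root.rstrip("/")
    # Tolerate a root that was already configured with the /v1 suffix.
    if root.endswith("/v1"):
        root = root[: -len("/v1")]
    if sdk == "openai":
        return f"{root}/v1"
    if sdk == "anthropic":
        return root
    raise ValueError(f"unsupported sdk: {sdk!r}")
```

For example, sdk_base_url("https://api.odock.ai/v1", "anthropic") returns the root without /v1, which avoids a doubled /v1/v1/messages path.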

Gemini

Method	Endpoint	Use for	Client base URL	Streaming
`POST`	`/v1beta/models/{model}:generateContent`	Gemini content generation	`https://api.odock.ai`	No
`POST`	`/v1beta/models/{model}:streamGenerateContent`	Gemini streaming generation	`https://api.odock.ai`	Yes

Gemini-compatible callers can authenticate with Authorization: Bearer ..., x-goog-api-key, or the ?key= query parameter. Headers are preferred for server-side applications.
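
The three authentication options can be compared side by side with a helper that builds the URL and headers for each. This is a sketch only; the function name and the auth labels are hypothetical, and it sends no request.

```python
from urllib.parse import urlencode


def gemini_request_parts(
    base_url: str,
    model: str,
    api_key: str,
    auth: str = "bearer",
    stream: bool = False,
) -> tuple[str, dict[str, str]]:
    """Build the URL and headers for a Gemini-shaped call through the gateway."""
    action = "streamGenerateContent" if stream else "generateContent"
    url = f"{base_url.rstrip('/')}/v1beta/models/{model}:{action}"
    headers = {"Content-Type": "application/json"}

    if auth == "bearer":
        headers["Authorization"] = f"Bearer {api_key}"
    elif auth == "x-goog-api-key":
        headers["x-goog-api-key"] = api_key
    elif auth == "query":
        # The key ends up in the URL, which is why headers are preferred
        # for server-side applications.
        url = f"{url}?{urlencode({'key': api_key})}"
    else:
        raise ValueError(f"unknown auth style: {auth!r}")
    return url, headers
```

With the header styles the key stays out of the URL, so it is less likely to appear in access logs or proxy traces.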

vLLM

Method	Endpoint	Use for	Client base URL	Streaming
`GET`	`/v1/vllm/models`	vLLM model listing	`https://api.odock.ai`	No
`POST`	`/v1/vllm/chat/completions`	vLLM chat completions	`https://api.odock.ai`	Yes
`POST`	`/v1/vllm/completions`	vLLM completions	`https://api.odock.ai`	Yes
`POST`	`/v1/vllm/responses`	vLLM responses	`https://api.odock.ai`	Yes
`POST`	`/v1/vllm/embeddings`	vLLM embeddings	`https://api.odock.ai`	No
`POST`	`/v1/vllm/audio/transcriptions`	vLLM audio transcriptions	`https://api.odock.ai`	No
`POST`	`/v1/vllm/audio/translations`	vLLM audio translations	`https://api.odock.ai`	No
`POST`	`/v1/vllm/tokenize`	vLLM tokenize	`https://api.odock.ai`	No
`POST`	`/v1/vllm/detokenize`	vLLM detokenize	`https://api.odock.ai`	No
`POST`	`/v1/vllm/pooling`	vLLM pooling	`https://api.odock.ai`	No
`POST`	`/v1/vllm/classify`	vLLM classify	`https://api.odock.ai`	No
`POST`	`/v1/vllm/score`	vLLM score	`https://api.odock.ai`	No
`POST`	`/v1/vllm/rerank`	vLLM rerank	`https://api.odock.ai`	No

vLLM endpoints forward raw vLLM-compatible payloads after Odock resolves the model and governance context.
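
Since the payload is forwarded as-is, a client only needs to prefix the vLLM route with /v1/vllm and attach the virtual API key. The sketch below builds such a request with the standard library without sending it. The helper name and the "sk-example" key are placeholders, and the tokenize body (model plus prompt) follows vLLM's tokenize endpoint as an assumption rather than something this page specifies.

```python
import json
import urllib.request


def build_vllm_request(
    base_url: str, api_key: str, route: str, payload: dict
) -> urllib.request.Request:
    """Prepare a POST to a vLLM-shaped route behind the gateway."""
    url = f"{base_url.rstrip('/')}/v1/vllm/{route.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_vllm_request(
    "https://api.odock.ai",
    "sk-example",  # placeholder, not a real key format
    "tokenize",
    {"model": "llama-3.1-8b-instruct", "prompt": "Explain quota enforcement."},
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

The same helper works for the other POST routes in the table, such as rerank or score, by changing the route and payload.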

Mistral

Method	Endpoint	Use for	Client base URL	Streaming
`GET`	`/v1/mistral/models`	Mistral model listing	`https://api.odock.ai`	No
`POST`	`/v1/mistral/chat/completions`	Mistral chat completions	`https://api.odock.ai`	Yes
`POST`	`/v1/mistral/fim/completions`	Codestral fill-in-the-middle (code) completions	`https://api.odock.ai`	Yes
`POST`	`/v1/mistral/embeddings`	Mistral embeddings	`https://api.odock.ai`	No
`POST`	`/v1/mistral/moderations`	Mistral moderation / safety classification	`https://api.odock.ai`	No
`POST`	`/v1/mistral/ocr`	Mistral OCR: document or image to per-page markdown	`https://api.odock.ai`	No

Mistral chat completions and embeddings are OpenAI-shaped, so the OpenAI SDK works by pointing its base_url at https://api.odock.ai/v1/mistral. FIM, moderation, and OCR are Mistral-native: send raw JSON with the virtual API key as a bearer token. There is no Mistral Responses API, so send chat traffic to chat completions rather than to a Responses endpoint.

OCR is billed per page from the usage_info.pages_processed count the upstream returns. See Model Pricing.

OpenAI-Compatible Calls

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ODOCK_API_KEY"],
    base_url=os.environ.get("ODOCK_BASE_URL", "https://api.odock.ai/v1"),
)

response = client.chat.completions.create(
    model=os.environ.get("ODOCK_MODEL", "gpt-4.1-mini"),
    messages=[{"role": "user", "content": "Write a short status update."}],
    temperature=0.3,
    max_tokens=120,
)

print(response.choices[0].message.content)
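
The endpoint table marks chat completions as streaming-capable. A minimal streaming sketch follows, assuming the gateway relays the OpenAI SDK's standard chunk stream unchanged. The helper name is hypothetical; it takes the client as a parameter, so the client built in the example above can be passed in directly.

```python
from typing import Any


def stream_chat_text(client: Any, model: str, prompt: str) -> str:
    """Stream a chat completion, print tokens as they arrive, return the text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts: list[str] = []
    for chunk in stream:
        # Some chunks carry no choices (for example a trailing usage chunk).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)
```

Calling stream_chat_text(client, "gpt-4.1-mini", "Write a short status update.") prints the reply incrementally and returns the full string once the stream ends.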

curl "${ODOCK_BASE_URL:-https://api.odock.ai/v1}/chat/completions" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${ODOCK_MODEL:-gpt-4.1-mini}"'",
    "messages": [
      {"role": "user", "content": "Write a short status update."}
    ],
    "temperature": 0.3,
    "max_tokens": 120
  }'

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ODOCK_API_KEY"],
    base_url=os.environ.get("ODOCK_BASE_URL", "https://api.odock.ai/v1"),
)

response = client.responses.create(
    model=os.environ.get("ODOCK_MODEL", "gpt-4.1-mini"),
    input=[{"role": "user", "content": "Summarize the gateway flow."}],
    temperature=0.3,
    max_output_tokens=160,
)

print(response.output_text)

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ODOCK_API_KEY"],
    base_url=os.environ.get("ODOCK_BASE_URL", "https://api.odock.ai/v1"),
)

response = client.embeddings.create(
    model=os.environ.get("ODOCK_EMBEDDING_MODEL", "text-embedding-3-small"),
    input="Odock records usage for model traffic.",
    encoding_format="float",
)

print(len(response.data[0].embedding))

Anthropic-Compatible Calls

import os
from anthropic import Anthropic

client = Anthropic(
    api_key=os.environ["ODOCK_API_KEY"],
    base_url=os.environ.get("ODOCK_GATEWAY_URL", "https://api.odock.ai"),
)

message = client.messages.create(
    model=os.environ.get("ODOCK_MODEL", "claude-sonnet-4-5"),
    max_tokens=160,
    temperature=0.3,
    messages=[
        {"role": "user", "content": "Explain model access in Odock."}
    ],
)

print(message.content[0].text)

curl "${ODOCK_GATEWAY_URL:-https://api.odock.ai}/v1/messages" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${ODOCK_MODEL:-claude-sonnet-4-5}"'",
    "max_tokens": 160,
    "temperature": 0.3,
    "messages": [
      {"role": "user", "content": "Explain model access in Odock."}
    ]
  }'

Gemini-Compatible Calls

import os

from google import genai
from google.genai import types

client = genai.Client(
    api_key=os.environ["ODOCK_API_KEY"],
    http_options=types.HttpOptions(
        # The Google GenAI SDK appends /v1beta, so use the gateway root here.
        base_url=os.environ.get("ODOCK_GATEWAY_URL", "https://api.odock.ai"),
        api_version="v1beta",
    ),
)

response = client.models.generate_content(
    model=os.environ.get("ODOCK_MODEL", "gemini-2.5-flash"),
    contents="Hello from Gemini through my gateway",
)

print(response.text)

import json
import os
from typing import Any

import httpx


BASE_URL = os.getenv("ODOCK_GATEWAY_URL", "https://api.odock.ai").rstrip("/")
API_KEY = os.environ["ODOCK_API_KEY"]
MODEL = os.getenv("ODOCK_MODEL", "gemini-2.5-flash")
PROMPT = os.getenv("ODOCK_PROMPT", "Explain provider credentials in Odock.")


def gemini_payload(prompt: str) -> dict[str, Any]:
    return {
        "contents": [
            {
                "role": "user",
                "parts": [{"text": prompt}],
            }
        ],
        "generationConfig": {
            "temperature": 0.3,
            "maxOutputTokens": 160,
        },
    }


headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

url = f"{BASE_URL}/v1beta/models/{MODEL}:generateContent"

with httpx.Client(timeout=30.0) as client:
    response = client.post(url, headers=headers, json=gemini_payload(PROMPT))

response.raise_for_status()
data = response.json()

text_parts: list[str] = []
for candidate in data.get("candidates", []):
    content = candidate.get("content", {})
    for part in content.get("parts", []):
        text = part.get("text")
        if text:
            text_parts.append(text)

print("".join(text_parts).strip() or "<no text>")

usage = data.get("usageMetadata")
if usage:
    print("usage:", json.dumps(usage))

curl "${ODOCK_GATEWAY_URL:-https://api.odock.ai}/v1beta/models/${ODOCK_MODEL:-gemini-2.5-flash}:generateContent" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Explain provider credentials in Odock."}]
      }
    ],
    "generationConfig": {
      "temperature": 0.3,
      "maxOutputTokens": 160
    }
  }'

curl -N "${ODOCK_GATEWAY_URL:-https://api.odock.ai}/v1beta/models/${ODOCK_MODEL:-gemini-2.5-flash}:streamGenerateContent" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Give two examples of gateway routing."}]
      }
    ],
    "generationConfig": {
      "temperature": 0.3,
      "maxOutputTokens": 160
    }
  }'

vLLM-Compatible Calls

curl "${ODOCK_GATEWAY_URL:-https://api.odock.ai}/v1/vllm/chat/completions" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${ODOCK_MODEL:-llama-3.1-8b-instruct}"'",
    "messages": [
      {"role": "user", "content": "Explain quota enforcement."}
    ],
    "max_tokens": 160,
    "stream": false
  }'

curl "${ODOCK_GATEWAY_URL:-https://api.odock.ai}/v1/vllm/models" \
  -H "Authorization: Bearer $ODOCK_API_KEY"

curl "${ODOCK_GATEWAY_URL:-https://api.odock.ai}/v1/vllm/embeddings" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${ODOCK_EMBEDDING_MODEL:-bge-small-en}"'",
    "input": "Embeddings traffic is governed by Odock."
  }'

Mistral-Compatible Calls

Chat and embeddings are OpenAI-shaped, so the OpenAI SDK works by pointing base_url at /v1/mistral. FIM, moderation, and OCR are Mistral-native and are called with raw JSON.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ODOCK_API_KEY"],
    base_url=os.environ.get("ODOCK_MISTRAL_BASE_URL", "https://api.odock.ai/v1/mistral"),
)

# Bind to chat completions; Mistral has no Responses API.
response = client.chat.completions.create(
    model=os.environ.get("ODOCK_MODEL", "mistral-small-latest"),
    messages=[{"role": "user", "content": "Summarize the gateway in one sentence."}],
    max_tokens=120,
    stream=False,
)

print(response.choices[0].message.content)

curl "${ODOCK_GATEWAY_URL:-https://api.odock.ai}/v1/mistral/fim/completions" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${ODOCK_CODE_MODEL:-codestral-latest}"'",
    "prompt": "def fibonacci(n):\n    ",
    "suffix": "\n    return result",
    "max_tokens": 128
  }'

curl "${ODOCK_GATEWAY_URL:-https://api.odock.ai}/v1/mistral/moderations" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${ODOCK_MODERATION_MODEL:-mistral-moderation-latest}"'",
    "input": "Text to classify for safety categories."
  }'

curl "${ODOCK_GATEWAY_URL:-https://api.odock.ai}/v1/mistral/ocr" \
  -H "Authorization: Bearer $ODOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${ODOCK_OCR_MODEL:-mistral-ocr-latest}"'",
    "document": {
      "type": "document_url",
      "document_url": "https://arxiv.org/pdf/2201.04234"
    }
  }'

The OCR response returns extracted markdown per page plus usage_info.pages_processed. To OCR an image instead of a document, send "type": "image_url" with an "image_url". OCR is billed per page, not per token.
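
To make the per-page billing concrete, the sketch below reads a parsed OCR response, joins the per-page markdown, and estimates a cost. It assumes the pages live in a "pages" list whose items carry a "markdown" field, as in Mistral's OCR API; the function names and the price used in the usage note are hypothetical, so check Model Pricing for real figures.

```python
from decimal import Decimal
from typing import Any


def ocr_markdown(response: dict[str, Any]) -> str:
    """Join the markdown of every page, separated by blank lines."""
    pages = response.get("pages", [])
    return "\n\n".join(page.get("markdown", "") for page in pages)


def ocr_cost(response: dict[str, Any], price_per_page: Decimal) -> Decimal:
    """Estimate the cost from usage_info.pages_processed.

    price_per_page is supplied by the caller; it is not an Odock price.
    """
    pages = response.get("usage_info", {}).get("pages_processed", 0)
    return price_per_page * pages
```

For a response reporting pages_processed of 2 and a hypothetical price of Decimal("0.001") per page, ocr_cost returns Decimal("0.002"), regardless of how much text each page contains.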

Native Error Behavior

Native endpoints can return gateway-controlled JSON errors, plain-text transport errors, or provider-native upstream errors after Odock accepts the request. The most common native-specific case is 400 provider_not_allowed, which means the model is configured for a different provider family than the endpoint.

For the full format and status-code tables, see Gateway Errors.
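
Because an error body may be JSON or plain text, client code should not assume it can always be parsed as JSON. The sketch below is one defensive way to handle that. The function name and result keys are hypothetical, and since the exact error schema is documented in Gateway Errors rather than here, it detects provider_not_allowed with a simple substring check instead of reading a specific field.

```python
import json
from typing import Any


def classify_error_body(status: int, body: str) -> dict[str, Any]:
    """Describe an error response without assuming a particular schema."""
    result: dict[str, Any] = {
        "status": status,
        # Substring check, because the error field layout is not shown here.
        "provider_mismatch": status == 400 and "provider_not_allowed" in body,
    }
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None
    if isinstance(parsed, dict):
        # Gateway-controlled or provider-native JSON; telling them apart
        # needs the schemas from Gateway Errors.
        result["kind"] = "json"
        result["detail"] = parsed
    else:
        # Plain-text transport error, or JSON that is not an object.
        result["kind"] = "text"
        result["detail"] = body.strip()
    return result
```

A caller can branch on result["provider_mismatch"] to suggest switching to the endpoint that matches the model's provider family, and log result["detail"] otherwise.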
