
Gateway Request Lifecycle

Every step a request goes through in odock-server.


All LLM-style endpoints share a common lifecycle implemented by the gateway endpoint handler.

Middleware Order

The HTTP server wraps the router with:

  1. Request ID middleware.
  2. Telemetry middleware.
  3. Recovery middleware.
  4. Logging middleware.

The effective runtime behavior is:

  • request ID is read or generated,
  • route and stream metadata are attached to context,
  • root HTTP span is started,
  • request logs are emitted,
  • panics are recovered into 500 responses,
  • request duration and status are recorded.
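The wrapping order above can be sketched with the standard net/http middleware pattern. This is a minimal illustration, not the actual odock-server code: the `chain` helper and the layer names are assumptions, but it shows why the first middleware in the list is the first to run at request time.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// middleware is the conventional func(http.Handler) http.Handler shape.
type middleware func(http.Handler) http.Handler

// chain wraps h so that mws[0] becomes the outermost layer,
// i.e. the first middleware to execute when a request arrives.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// trace records the request-time execution order of each named layer.
func trace(name string, order *[]string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*order = append(*order, name)
			next.ServeHTTP(w, r)
		})
	}
}

// executionOrder wires the four layers in the documented order and
// returns the order in which they actually ran.
func executionOrder() []string {
	var order []string
	h := chain(
		http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {}),
		trace("requestID", &order),
		trace("telemetry", &order),
		trace("recovery", &order),
		trace("logging", &order),
	)
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest("GET", "/", nil))
	return order
}

func main() {
	fmt.Println(executionOrder()) // requestID first, logging last before the handler
}
```

Because recovery sits inside telemetry, a recovered panic still produces a recorded span and duration for the request.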

LLM Endpoint Flow

sequenceDiagram
  participant Client
  participant Gateway
  participant Redis
  participant Postgres
  participant Provider

  Client->>Gateway: HTTP request
  Gateway->>Gateway: Request ID, logging, telemetry
  Gateway->>Redis: Pre-auth rate-limit/IP check
  Gateway->>Gateway: SafetySec pre_auth
  Gateway->>Postgres: API key lookup on cache miss
  Gateway->>Redis: Auth/model/policy cache warm
  Gateway->>Gateway: SafetySec post_auth
  Gateway->>Redis: Resolve policy and early gate
  Gateway->>Gateway: Decode request
  Gateway->>Gateway: Plugins pre_route
  Gateway->>Redis: Model access cache
  Gateway->>Postgres: Model/provider lookup on cache miss
  Gateway->>Gateway: Apply provider key and upstream config
  Gateway->>Redis: Final token gate
  Gateway->>Postgres: Budget/quota reserve
  Gateway->>Gateway: SafetySec pre_upstream
  Gateway->>Gateway: Plugins pre_upstream
  Gateway->>Provider: Upstream call or stream
  Provider-->>Gateway: Response
  Gateway->>Gateway: SafetySec post_upstream
  Gateway->>Gateway: Plugins post_upstream
  Gateway-->>Client: Response or stream
  Gateway->>Redis: Rate-limit post-flight
  Gateway->>Redis: Usage collector enqueue
  Gateway->>Postgres: Usage flush and budget settle
  Gateway->>Gateway: SafetySec post_response / stream
  Gateway->>Gateway: Plugins post_response

Important Ordering Details

  • Pre-auth rate limiting runs before API key lookup.
  • Model access is checked before provider credentials are applied.
  • Endpoints bound to a required provider reject requests that attempt to force a different provider.
  • Budget reservation happens before SafetySec pre_upstream and plugin pre_upstream; if those later checks block, reservations are released.
  • Unary usage is recorded after response write.
  • Streaming usage is recorded after stream completion.
  • Rate-limit post-flight runs even when later stages fail, as long as a receipt exists.

Provider Config Application

For model-based requests, the gateway:

  1. Loads the model by organisation and user-facing name.
  2. Rewrites upstream model from name to slug when needed.
  3. Validates provider presence and enabled status.
  4. Maps provider type to runtime provider name.
  5. Applies base URL and timeout.
  6. Resolves provider key plaintext from encrypted envelope or legacy value.

Common model config errors return specific gateway error codes:

  • model_not_found
  • model_provider_missing
  • model_provider_disabled
  • model_provider_key_missing
  • model_provider_key_revoked
  • model_provider_type_unsupported
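The validation steps and the error codes they produce can be sketched together. The struct shapes and the `runtimeProviders` map are illustrative assumptions; only the error-code strings come from the list above.

```go
package main

import "fmt"

// Hypothetical shapes standing in for the gateway's model/provider records.
type ProviderKey struct {
	Revoked bool
}

type Provider struct {
	Enabled bool
	Type    string
	Key     *ProviderKey
}

type Model struct {
	Slug     string
	Provider *Provider
}

// Illustrative mapping of provider type to runtime provider name (step 4).
var runtimeProviders = map[string]string{
	"openai":    "openai",
	"anthropic": "anthropic",
}

// validateModelConfig mirrors steps 3-6 above, returning the gateway
// error code a misconfigured model would produce, or "" on success.
func validateModelConfig(m *Model) string {
	switch {
	case m == nil:
		return "model_not_found"
	case m.Provider == nil:
		return "model_provider_missing"
	case !m.Provider.Enabled:
		return "model_provider_disabled"
	case m.Provider.Key == nil:
		return "model_provider_key_missing"
	case m.Provider.Key.Revoked:
		return "model_provider_key_revoked"
	}
	if _, ok := runtimeProviders[m.Provider.Type]; !ok {
		return "model_provider_type_unsupported"
	}
	return ""
}

func main() {
	m := &Model{Slug: "example-model", Provider: &Provider{Enabled: false}}
	fmt.Println(validateModelConfig(m)) // model_provider_disabled
}
```

Checking in this order means the most specific failure wins: a disabled provider is reported as disabled even if its key is also missing.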

Streaming

Streaming endpoints call router.Stream, wrap the response body for stream metrics, and then write provider-compatible SSE output. Stream-specific usage is read from StreamResponse.Usage when available.

If a client disconnects mid-stream, the gateway records the stream close but does not write a second response, since the stream headers (and possibly partial events) have already been sent.

Request Identity

The request ID can come from inbound headers or be generated. It is propagated to:

  • response headers,
  • context,
  • logs,
  • traces,
  • usage records,
  • rate-limit receipts,
  • budget reservations.
