
Policies And Rate Limits

Configure IP policy, request limits, concurrency, payload caps, and token reservations.

Rate limiting is enforced by odock-server in staged gates. Policies can be attached at several scopes and are resolved per request.

Policy Scopes

The resolver builds a policy snapshot in this order:

  1. Global policy from RATELIMIT_GLOBAL_POLICY.
  2. Organisation policy.
  3. Team policy.
  4. API key policy.
  5. Model policy.
  6. MCP server policy for MCP requests.

Each scope can contribute rate-limit rules and IP allow/block lists.
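The ordering above can be sketched as a small helper that collects whichever scopes have a policy attached, in resolution order. This is a minimal illustration, not odock-server's resolver: the scope keys and the `collect_layers` function are hypothetical names, and the sketch deliberately stops short of combining the layers, because this page does not describe how overlapping rules from different scopes are merged.

```python
from typing import Any, Dict, List, Tuple

# Scope order as listed on this page. The key names are hypothetical.
SCOPE_ORDER = ["global", "organisation", "team", "api_key", "model", "mcp_server"]


def collect_layers(
    policies_by_scope: Dict[str, Dict[str, Any]],
    is_mcp_request: bool,
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (scope, policy) pairs in resolution order.

    Scopes without an attached policy are skipped, and the MCP server
    scope is only considered for MCP requests.
    """
    layers = []
    for scope in SCOPE_ORDER:
        if scope == "mcp_server" and not is_mcp_request:
            continue
        policy = policies_by_scope.get(scope)
        if policy is not None:
            layers.append((scope, policy))
    return layers
```

For a non-MCP request, a policy attached to the MCP server scope is never added to the snapshot, even if one exists.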

Policy Shape

{
  "policies": {
    "ip": {
      "allowlist": ["10.0.0.0/8"],
      "blocklist": ["203.0.113.0/24"]
    },
    "ratelimit": {
      "requests": {
        "per_second": 20,
        "per_minute": 1200,
        "burst": 50
      },
      "tokens": {
        "per_minute": 120000
      },
      "concurrency": {
        "max": 40,
        "lease_ttl_seconds": 30
      },
      "payload": {
        "max_request_bytes": 2097152,
        "max_tokens": 8192
      }
    }
  }
}

All numeric values must be non-negative. Omitted values mean no limit for that field.
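Both rules can be checked before a policy is saved. The sketch below is an illustrative client-side check, not part of odock-server: `find_negative_limits` and `get_limit` are hypothetical helpers that walk the JSON shape shown above, report any negative number by its path, and return `None` for an omitted field to stand for "no limit".

```python
from typing import Any, List, Optional


def find_negative_limits(node: Any, path: str = "") -> List[str]:
    """Return the dotted paths of all numeric fields with a negative value."""
    errors: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            errors.extend(find_negative_limits(value, child))
    elif isinstance(node, bool):
        pass  # bool is a subclass of int in Python; it is not a limit
    elif isinstance(node, (int, float)) and node < 0:
        errors.append(path)
    return errors


def get_limit(policy: dict, *keys: str) -> Optional[float]:
    """Look up a limit by path; None means the field is omitted (no limit)."""
    node: Any = policy
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node
```

Strings such as CIDR entries are ignored by the walk, so IP lists need their own validation.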

Enforcement Stages

Pre-auth gate

Runs before API key authentication.

Purpose:

  • apply global/IP policy early;
  • block known bad clients before database/auth work.

Early gate

Runs after auth and initial policy resolution, before body decode and final model normalization.

Enforces:

  • payload size guardrail,
  • max token guardrail when known,
  • concurrent request leases,
  • request burst,
  • requests per second,
  • requests per minute.
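As an illustration of the two guardrails at the top of this list, the sketch below maps a payload policy to the deny codes listed under Deny Responses. It rests on assumptions that this page does not confirm: that `payload_size_unknown` is returned when a byte cap is configured but the request size cannot be determined, and that the token guardrail is skipped when the requested `max_tokens` is not yet known. `check_payload` is a hypothetical name.

```python
from typing import Optional


def check_payload(
    payload_policy: dict,
    request_bytes: Optional[int],
    requested_max_tokens: Optional[int],
) -> Optional[str]:
    """Return a deny code, or None if the request passes both guardrails."""
    max_bytes = payload_policy.get("max_request_bytes")
    if max_bytes is not None:
        if request_bytes is None:
            # Assumption: an unknown size is denied only when a cap is set.
            return "payload_size_unknown"
        if request_bytes > max_bytes:
            return "payload_too_large"

    max_tokens = payload_policy.get("max_tokens")
    if max_tokens is not None and requested_max_tokens is not None:
        if requested_max_tokens > max_tokens:
            return "max_tokens_exceeded"
    return None
```

An empty payload policy never denies, which matches the rule that omitted values mean no limit.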

Final gate

Runs after the request is decoded, model config is applied, and model class/provider/stream metadata is known.

Enforces:

  • tokens per minute reservation based on estimated input tokens plus requested max_tokens;
  • model or MCP scoped token policy.

Post-flight

Runs after the request completes, or after it is rejected once a reservation has already been made.

Reconciles:

  • concurrency lease release;
  • token reservation refund based on actual usage.
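The reserve-then-refund arithmetic of the final gate and post-flight stages can be sketched as follows. This is a simplified in-memory model under stated assumptions: `TokenBudget` is a hypothetical class, the per-minute window reset is left out, and the real gateway's storage and windowing are not described on this page.

```python
from typing import Optional


class TokenBudget:
    """Tracks tokens counted against a single per-minute window."""

    def __init__(self, per_minute: int) -> None:
        self.per_minute = per_minute
        self.used = 0

    def reserve(self, estimated_input_tokens: int, max_tokens: int) -> Optional[int]:
        """Final gate: reserve estimated input plus requested max_tokens.

        Returns the reserved amount, or None when the reservation would
        exceed the budget (the tpm_exceeded case).
        """
        amount = estimated_input_tokens + max_tokens
        if self.used + amount > self.per_minute:
            return None
        self.used += amount
        return amount

    def reconcile(self, reserved: int, actual_usage: int) -> int:
        """Post-flight: refund the part of the reservation that was not used."""
        refund = max(reserved - actual_usage, 0)
        self.used -= refund
        return refund
```

With a budget of 20000 tokens per minute, a request estimated at 1500 input tokens with `max_tokens` of 4096 reserves 5596 tokens. If it actually consumes 2100 tokens, post-flight refunds 3496 and the window is left with 2100 tokens counted.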

Deny Responses

Rate-limit blocks return HTTP 429 with reason metadata and retry information when available. Common deny codes include:

  • payload_size_unknown
  • payload_too_large
  • max_tokens_exceeded
  • concurrency_exceeded
  • burst_exceeded
  • rps_exceeded
  • rpm_exceeded
  • tpm_exceeded
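On the client side, a 429 is usually handled by waiting and retrying. The helper below is a hedged sketch: it assumes the retry information arrives in the standard HTTP `Retry-After` header in its delay-seconds form, which this page does not confirm, and falls back to capped exponential backoff otherwise. `retry_delay` and its `base` and `cap` parameters are illustrative names.

```python
from typing import Mapping, Optional


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 0.5,
    cap: float = 30.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if no retry is needed."""
    if status != 429:
        return None
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return max(float(value), 0.0)
        except ValueError:
            pass  # e.g. the HTTP-date form; fall back to backoff below
    # Capped exponential backoff: base, 2*base, 4*base, ... up to cap.
    return min(cap, base * (2 ** attempt))
```

Codes such as `payload_too_large` and `max_tokens_exceeded` describe the request itself, so retrying the same request unchanged is unlikely to succeed for them.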

Client IP Handling

The gateway computes the client IP from the request. If HTTP_TRUST_PROXY_HEADERS=true, forwarded headers are trusted only when the request arrives from one of the configured proxy CIDRs; otherwise the direct peer address is used.

Configure:

HTTP_TRUST_PROXY_HEADERS=true
HTTP_TRUSTED_PROXY_CIDRS=172.16.0.0/12,192.168.0.0/16,10.0.0.0/8
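A minimal sketch of this kind of trusted-proxy logic, using Python's standard `ipaddress` module. The details are assumptions rather than odock-server's behaviour: the sketch reads an `X-Forwarded-For` style comma-separated chain, walks it from right to left, and returns the first hop that is not itself a trusted proxy. `client_ip` and its parameters are hypothetical names.

```python
import ipaddress


def client_ip(
    remote_addr: str,
    forwarded_for: str,
    trust_proxy_headers: bool,
    trusted_cidrs: str,
) -> str:
    """Pick the client IP, honouring forwarded hops only from trusted proxies."""
    if not trust_proxy_headers or not forwarded_for:
        return remote_addr

    networks = [
        ipaddress.ip_network(cidr.strip())
        for cidr in trusted_cidrs.split(",")
        if cidr.strip()
    ]

    def is_trusted(addr) -> bool:
        return any(addr in net for net in networks)

    # Ignore the header entirely unless the direct peer is a trusted proxy.
    if not is_trusted(ipaddress.ip_address(remote_addr)):
        return remote_addr

    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    for hop in reversed(hops):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            return remote_addr  # malformed chain: fall back to the peer
        if not is_trusted(addr):
            return str(addr)
    # Every hop was a trusted proxy; assume the leftmost one is the origin.
    return hops[0] if hops else remote_addr
```

Walking from the right means entries that a client prepends to the header itself are never reached, as long as every real proxy in front of the gateway is covered by the trusted CIDRs.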

Caching

Policy cache keys live in Redis. The UI can invalidate:

  • API key policy cache,
  • organisation policy cache,
  • team policy cache,
  • model policy cache,
  • MCP policy cache.

If cache invalidation is not configured, policy changes take effect only after the cached entry's TTL expires.

Practical Policy Examples

Strict API key:

{
  "policies": {
    "ratelimit": {
      "requests": { "per_minute": 60, "burst": 10 },
      "tokens": { "per_minute": 20000 },
      "concurrency": { "max": 5 },
      "payload": { "max_tokens": 4096, "max_request_bytes": 1048576 }
    }
  }
}

Internal network only:

{
  "policies": {
    "ip": {
      "allowlist": ["10.0.0.0/8", "192.168.0.0/16"],
      "blocklist": []
    }
  }
}
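To show how an IP policy like this could be evaluated, here is a small sketch using Python's standard `ipaddress` module. The precedence rules are assumptions, not documented behaviour: a blocklist match always denies, a non-empty allowlist admits only matching addresses, and an empty or omitted allowlist admits everyone. `ip_allowed` is a hypothetical helper.

```python
import ipaddress
from typing import Iterable


def ip_allowed(ip: str, ip_policy: dict) -> bool:
    """Evaluate an address against the policy's blocklist and allowlist."""
    addr = ipaddress.ip_address(ip)

    def matches(cidrs: Iterable[str]) -> bool:
        return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)

    # Assumption: the blocklist takes precedence over the allowlist.
    if matches(ip_policy.get("blocklist", [])):
        return False
    allowlist = ip_policy.get("allowlist", [])
    # Assumption: an empty allowlist places no restriction.
    return matches(allowlist) if allowlist else True
```

Under these assumptions, the internal-network policy above admits 10.1.2.3 and 192.168.1.1 and rejects 8.8.8.8.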
