ODOCK.AI

Ratelimit Modules

Understand how request-aware, network-aware, and token-aware ratelimit modules fit into guardrail gates.

Ratelimit modules are the low-level guardrail primitives the gateway uses to answer traffic-shape questions: where a request comes from, how often a scope is calling, how much work is already in flight, and how many tokens a request will consume. They are neither UI widgets nor generic plugins. They live in the ratelimit engine and are injected into the lifecycle gates where the request carries enough context for them to decide.

| Module family | What it answers | Typical gate |
| --- | --- | --- |
| IP/network | Is this request coming from an allowed or blocked network? | Pre-auth or early gate |
| RPS/RPM/burst | Is this scope receiving too many requests? | Early gate |
| Concurrency | Does this scope already have too much in-flight work? | Early gate |
| TPM | Would this request exceed the token-per-minute envelope? | Final gate |
| Overlays | Should a scoped policy be temporarily scaled? | Early and final gates |

The important design point is that modules are intentionally small. A module should do one kind of accounting or decision well. The stage decides when to call it, how to interpret its result, how to handle shadow mode, and whether the request should continue.
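To make the module/stage split concrete, here is a minimal Python sketch. It assumes a hypothetical `Decision` type, a module `check(ctx)` method, and a `run_stage` helper; none of these names come from Odock. The module answers only its one question, while the stage owns ordering and shadow mode (here, shadow mode records the would-be denial and lets the request continue).

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Decision:
    """Result of one module check (hypothetical shape)."""
    allowed: bool
    limit_name: str = ""
    reason: str = ""
    retry_after_s: Optional[float] = None


class Module(Protocol):
    """A module answers exactly one question about the request."""
    def check(self, ctx: dict) -> Decision: ...


class PayloadSizeModule:
    """Hypothetical example module: rejects bodies larger than max_bytes."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes

    def check(self, ctx: dict) -> Decision:
        if ctx.get("content_length", 0) > self.max_bytes:
            return Decision(False, "payload_bytes", "payload too large")
        return Decision(True)


def run_stage(modules: list, ctx: dict, shadow: bool) -> Decision:
    """The stage, not the module, decides ordering and shadow-mode handling."""
    for module in modules:
        decision = module.check(ctx)
        if not decision.allowed:
            if shadow:
                # Shadow mode: record the would-be denial, keep the request going.
                ctx.setdefault("shadow_denials", []).append(decision)
                continue
            return decision
    return Decision(True)
```

Calling `run_stage([PayloadSizeModule(1024)], {"content_length": 2048}, shadow=False)` returns a denial with `limit_name == "payload_bytes"`; the same call with `shadow=True` returns an allow and leaves the denial in `ctx["shadow_denials"]`.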

Why Modules Are Injected Into Gates

Different guardrails need different request knowledge.

A network module can run before the gateway knows the model because it only needs origin information. A request-rate module needs the caller scope so it can apply organisation, team, API key, model, or MCP limits. A token module should not run until the gateway can estimate the token envelope. A reconciliation module only makes sense after the request has completed.

This separation keeps the system both efficient and explainable:

  • cheap guardrails run before expensive work
  • request-aware modules run before upstream calls
  • token-aware modules run after token context exists
  • post-flight modules clean up runtime accounting
  • each denial can be tied back to a scope, limit name, retry window, and reason
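The last point, tying each denial to a scope, limit name, retry window, and reason, can be sketched as a small record. This assumes denials surface as HTTP 429 responses with a standard `Retry-After` header expressed in whole seconds; the document does not say how Odock renders them, and `Denial`, `to_http`, and the body fields are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Denial:
    """Hypothetical denial record carrying everything needed to explain a block."""
    scope: str            # e.g. "org:acme/key:k_123"
    limit_name: str       # e.g. "rpm"
    retry_after_s: float  # seconds until the window should admit the caller again
    reason: str

    def to_http(self) -> tuple:
        """Map to an assumed 429 response; Retry-After is rounded up to whole seconds."""
        headers = {"Retry-After": str(max(1, math.ceil(self.retry_after_s)))}
        body = {
            "error": "rate_limited",
            "scope": self.scope,
            "limit": self.limit_name,
            "reason": self.reason,
        }
        return 429, headers, body
```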

Gate Modes

When you add a ratelimit module, decide which gate mode it belongs to.

| Mode | Use for | Examples |
| --- | --- | --- |
| Pre-auth | Network or platform-wide checks that do not need a valid API key. | Global IP allow/block |
| Early | Request-aware checks after authentication, before heavy upstream work. | Payload bytes, RPS, RPM, burst, concurrency |
| Final | Token-aware checks after decoding and resource resolution. | Max tokens, tokens per minute |
| Post-flight | Cleanup or reconciliation after the request finishes. | Release concurrency, reconcile token usage |

Do not put a module in an earlier gate just to block faster. If the module needs model, MCP, or token context, it belongs later. A fast wrong decision is worse than a slightly later correct decision.
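One way to enforce this rule is to check, at registration time, that every context field a module needs is available at its gate. The field-to-gate mapping below is a hypothetical illustration loosely based on the tables above, not Odock's actual context model; `Gate`, `ADDED_AT`, `earliest_gate`, and `register` are illustrative names.

```python
from enum import IntEnum


class Gate(IntEnum):
    """Gate modes in lifecycle order."""
    PRE_AUTH = 0
    EARLY = 1
    FINAL = 2
    POST_FLIGHT = 3


# Hypothetical mapping: context fields each gate adds on top of earlier gates.
ADDED_AT = {
    Gate.PRE_AUTH: {"client_ip", "path", "method", "content_length"},
    Gate.EARLY: {"org_id", "team_id", "api_key_id", "provider", "model", "mcp_server"},
    Gate.FINAL: {"token_estimate"},
    Gate.POST_FLIGHT: {"token_usage"},
}


def fields_at(gate: Gate) -> set:
    """Everything added at this gate or at any earlier gate."""
    out: set = set()
    for g in Gate:
        if g <= gate:
            out |= ADDED_AT[g]
    return out


def earliest_gate(required: set) -> Gate:
    """First gate whose available context covers every required field."""
    for g in Gate:
        if required <= fields_at(g):
            return g
    raise ValueError(f"no gate provides {sorted(required - fields_at(Gate.POST_FLIGHT))}")


def register(gate: Gate, required: set) -> None:
    """Reject placing a module earlier than the context it depends on."""
    missing = required - fields_at(gate)
    if missing:
        raise ValueError(
            f"{gate.name} lacks {sorted(missing)}; "
            f"earliest valid gate is {earliest_gate(required).name}"
        )
```

With this mapping, `register(Gate.EARLY, {"token_estimate"})` raises a `ValueError` naming `FINAL` as the earliest valid gate, which is exactly the fast-but-wrong placement the rule guards against.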

Request-Aware Modules

Request-aware modules use fields from the normalized RequestContext: request id, path, method, content length, organisation id, team id, API key id, provider, model, and stream flag.

These modules are useful for limits that can be evaluated before the upstream call:

  • payload too large
  • requests per second
  • requests per minute
  • burst
  • concurrent requests
  • model or MCP scoped request pressure

They are normally injected into the early gate because authentication and caller scope are already known, but token usage may not be fully known yet.
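The normalized request fields and an RPS/burst check can be sketched together. This uses a token bucket, a common technique for expressing a sustained rate plus a burst allowance; the document does not say which algorithm Odock uses, and the field names, `TokenBucket`, and `scope_key` are hypothetical. The clock value is passed in so the module stays deterministic and easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestContext:
    """The normalized request fields listed above (field names assumed)."""
    request_id: str
    path: str
    method: str
    content_length: int
    org_id: str
    team_id: Optional[str]
    api_key_id: str
    provider: str
    model: str
    stream: bool


class TokenBucket:
    """Rate + burst limiter per scope: refills `rate` requests/s, capped at `burst`."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate, self.burst = rate, burst
        self.state: dict = {}  # scope -> (tokens, last_ts)

    def allow(self, scope: str, now: float) -> tuple:
        """Return (allowed, retry_after_seconds)."""
        tokens, last = self.state.get(scope, (float(self.burst), now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.state[scope] = (tokens - 1.0, now)
            return True, 0.0
        self.state[scope] = (tokens, now)
        return False, (1.0 - tokens) / self.rate  # time until one token refills


def scope_key(ctx: RequestContext) -> str:
    """Hypothetical scope key: limit per API key within an organisation."""
    return f"org:{ctx.org_id}/key:{ctx.api_key_id}"
```

An RPM limit fits the same shape, for example `TokenBucket(rate=rpm / 60, burst=...)`; a fixed or sliding window counter is an equally valid choice with different burst behaviour.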

Network-Aware Modules

Network-aware modules focus on the client origin. They should be deterministic and cheap because they are often the first gate that can reject bad traffic.

Use them for:

  • IP allowlists
  • IP blocklists
  • CIDR-style network boundaries
  • organisation- or API-key-scoped network policy checks, once the caller scope is known

Network checks can exist in more than one gate. A broad platform or deployment-level network rule can run before auth. A scoped rule that depends on the organisation, team, API key, model, or MCP server runs after those scopes are known.
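A minimal sketch of both network rule placements, using Python's standard `ipaddress` module for CIDR matching. `NetworkPolicy`, `early_network_check`, the block-before-allow precedence, and the reason strings are assumptions for illustration.

```python
import ipaddress
from typing import Iterable


class NetworkPolicy:
    """Hypothetical allow/block module: block rules win, then an optional allowlist applies."""

    def __init__(self, allow: Iterable[str] = (), block: Iterable[str] = ()) -> None:
        self.allow = [ipaddress.ip_network(c) for c in allow]
        self.block = [ipaddress.ip_network(c) for c in block]

    def check(self, client_ip: str) -> tuple:
        ip = ipaddress.ip_address(client_ip)
        if any(ip in net for net in self.block):
            return False, "ip_blocked"
        if self.allow and not any(ip in net for net in self.allow):
            return False, "ip_not_allowed"
        return True, ""


def early_network_check(org_policies: dict, org_id: str, client_ip: str) -> tuple:
    """Scoped rule: can only run once the organisation is known (after auth)."""
    policy = org_policies.get(org_id)
    return policy.check(client_ip) if policy else (True, "")
```

The platform-level policy needs only the client IP, so it can run pre-auth; the per-organisation lookup needs `org_id` and therefore runs once authentication has established the caller scope. Mixed address families never match, so an IPv6 client fails an IPv4-only allowlist.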

Token-Aware Modules

Token-aware modules protect LLM capacity and cost-sensitive workloads. They should run only after the gateway can reason about the requested model and token envelope.

Use them for:

  • tokens per minute
  • max tokens per request
  • token-aware model policies
  • token accounting that must be reconciled after the upstream response

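Token-aware modules are why Odock treats guardrails as more than simple request counting: two requests can have the same HTTP shape and very different token impact. A short prompt capped at a few dozen output tokens and a long document asking for thousands of output tokens may both be a single request to the same endpoint, yet they consume very different amounts of capacity.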
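A minimal sketch of a tokens-per-minute module that reserves an estimated token envelope in the final gate and reconciles it with real usage post-flight. It assumes the estimate is something like prompt tokens plus the requested maximum output tokens, uses a 60-second sliding window, and keeps state in process memory; `TokensPerMinute`, `reserve`, and `reconcile` are hypothetical names.

```python
from collections import deque
from itertools import count
from typing import Optional


class TokensPerMinute:
    """Hypothetical TPM module: reserve an estimate up front, reconcile afterwards."""

    WINDOW_S = 60.0

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.entries: dict = {}  # scope -> deque of [ts, tokens, receipt_id]
        self._ids = count(1)

    def _used(self, scope: str, now: float) -> int:
        q = self.entries.setdefault(scope, deque())
        while q and q[0][0] <= now - self.WINDOW_S:
            q.popleft()  # drop reservations that have left the window
        return sum(e[1] for e in q)

    def reserve(self, scope: str, estimate: int, now: float) -> Optional[int]:
        """Final gate: admit only if the estimate fits; return a receipt id or None."""
        if self._used(scope, now) + estimate > self.limit:
            return None
        rid = next(self._ids)
        self.entries[scope].append([now, estimate, rid])
        return rid

    def reconcile(self, scope: str, receipt: int, actual: int) -> None:
        """Post-flight: replace the reserved estimate with real token usage."""
        for e in self.entries.get(scope, ()):
            if e[2] == receipt:
                e[1] = actual
                return
```

Reconciling matters because the estimate is an upper bound: once a 600-token reservation is corrected to 250 actual tokens, the freed headroom becomes available to later requests in the same window.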

Module And Stage Responsibilities

Keep this boundary clear:

| Layer | Responsibility |
| --- | --- |
| Module | Maintain one accounting primitive or decision primitive. |
| Stage/gate | Choose when to call modules, apply scoped policy, produce decisions, respect shadow mode, and manage receipts. |
| Policy resolver | Provide the effective scoped policy snapshot. |
| Post-flight | Release or reconcile runtime accounting after the request. |

This is why ratelimit modules do not look like generic plugins: they are purpose-built primitives that the staged ratelimit engine composes.
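The receipt handoff between the stage and post-flight can be sketched with a concurrency module. `ConcurrencyLimit`, its receipt strings, and `handle` are hypothetical; the point is that the early gate acquires a slot and hands back a receipt, and post-flight releases it in a `finally` block so the slot is returned even when the upstream call fails.

```python
import threading
import uuid
from typing import Callable, Optional


class ConcurrencyLimit:
    """Hypothetical in-flight counter per scope; acquire returns a receipt for post-flight."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.in_flight: dict = {}
        self.receipts: dict = {}  # receipt -> scope
        self._lock = threading.Lock()

    def acquire(self, scope: str) -> Optional[str]:
        with self._lock:
            if self.in_flight.get(scope, 0) >= self.max_in_flight:
                return None
            self.in_flight[scope] = self.in_flight.get(scope, 0) + 1
            receipt = uuid.uuid4().hex
            self.receipts[receipt] = scope
            return receipt

    def release(self, receipt: str) -> None:
        """Idempotent: releasing an unknown or already-released receipt is a no-op."""
        with self._lock:
            scope = self.receipts.pop(receipt, None)
            if scope is not None:
                self.in_flight[scope] -= 1


def handle(limit: ConcurrencyLimit, scope: str, call_upstream: Callable[[], str]) -> str:
    receipt = limit.acquire(scope)  # early gate
    if receipt is None:
        return "denied: concurrency"
    try:
        return call_upstream()
    finally:
        limit.release(receipt)  # post-flight runs even if upstream raises
```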

Continue with Custom guardrails for the implementation workflow.
