Understand how request-aware, network-aware, and token-aware ratelimit modules fit into guardrail gates.

Ratelimit Modules

Ratelimit modules are the low-level guardrail primitives the gateway uses to answer traffic-shape questions. They are not UI widgets and they are not generic plugins. They live in the ratelimit engine and are injected into the lifecycle gate where the context they need is available.

Module family	What it answers	Typical gate
IP/network	Is this request coming from an allowed or blocked network?	Pre-auth or early gate
RPS/RPM/burst	Is this scope receiving too many requests?	Early gate
Concurrency	Does this scope already have too much in-flight work?	Early gate
TPM	Would this request exceed the token-per-minute envelope?	Final gate
Overlays	Should a scoped policy be temporarily scaled?	Early and final gates
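One way to read the overlay row is as a temporary multiplier on a scoped limit. The Python sketch below assumes that interpretation; Overlay, its scale and expires_at fields, and effective_limit are hypothetical names, not documented Odock types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Overlay:
    # Hypothetical overlay: scales the limit of one scope until expires_at (seconds).
    scope: str
    scale: float
    expires_at: float


def effective_limit(base_limit: int, scope: str, overlay: Optional[Overlay], now: float) -> int:
    """Return the limit to enforce for `scope` at time `now`."""
    if overlay is None or overlay.scope != scope or now >= overlay.expires_at:
        return base_limit  # no active overlay applies to this scope
    # Round down so a scaled limit never exceeds the scaled envelope.
    return max(0, int(base_limit * overlay.scale))
```

Keeping the overlay separate from the base policy means it can expire on its own without anyone having to restore the original limit.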

The important design point is that modules are intentionally small. A module should do one kind of accounting or decision well. The stage decides when to call it, how to interpret its result, how to handle shadow mode, and whether the request should continue.
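Here is a minimal Python sketch of that split. The names Verdict, Module, run_stage and the example BlockedScopes module are hypothetical. The module only reports whether a scope passes. The stage decides whether a failure blocks the request or, in shadow mode, is only recorded.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str = ""


class Module(Protocol):
    # A module does one kind of decision and knows nothing about shadow mode.
    name: str

    def check(self, scope: str) -> Verdict: ...


@dataclass
class BlockedScopes:
    # Hypothetical example module: denies a fixed set of scopes.
    blocked: frozenset
    name: str = "blocked_scopes"

    def check(self, scope: str) -> Verdict:
        if scope in self.blocked:
            return Verdict(False, f"scope {scope} is blocked")
        return Verdict(True)


@dataclass
class StageResult:
    proceed: bool
    denied_by: Optional[Tuple[str, str]] = None  # (module name, reason) when enforced
    shadow_denials: List[Tuple[str, str]] = field(default_factory=list)


def run_stage(modules: Sequence[Module], scope: str, shadow: bool) -> StageResult:
    """Call modules in order; the stage, not the module, decides what a denial means."""
    result = StageResult(proceed=True)
    for module in modules:
        verdict = module.check(scope)
        if verdict.allowed:
            continue
        if shadow:
            # Shadow mode: record what would have been denied, but let the request continue.
            result.shadow_denials.append((module.name, verdict.reason))
            continue
        result.proceed = False
        result.denied_by = (module.name, verdict.reason)
        return result
    return result
```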

Why Modules Are Injected Into Gates

Different guardrails need different request knowledge.

A network module can run before the gateway knows the model because it only needs origin information. A request-rate module needs the caller scope so it can apply organisation, team, API key, model, or MCP limits. A token module should not run until the gateway can estimate the token envelope. A reconciliation module only makes sense after the request has completed.

This separation keeps the system both efficient and explainable:

cheap guardrails run before expensive work
request-aware modules run before upstream calls
token-aware modules run after token context exists
post-flight modules clean up runtime accounting
each denial can be tied back to a scope, limit name, retry window, and reason
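A small denial record makes the last point concrete. In this Python sketch, the Denial type and the X-RateLimit-* header names are hypothetical. Only Retry-After is a standard HTTP header. It carries whole seconds, so the sketch rounds the retry window up.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Denial:
    scope: str                      # e.g. "org:acme" or "team:search"
    limit_name: str                 # e.g. "rpm"
    retry_after_s: Optional[float]  # None when retrying cannot help (e.g. a blocked network)
    reason: str

    def headers(self) -> Dict[str, str]:
        """Render illustrative response headers for the caller."""
        h = {"X-RateLimit-Scope": self.scope, "X-RateLimit-Limit-Name": self.limit_name}
        if self.retry_after_s is not None:
            # Round up so clients do not retry before the window has actually reopened.
            h["Retry-After"] = str(math.ceil(self.retry_after_s))
        return h
```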

Gate Modes

When you add a ratelimit module, decide which gate mode it belongs to.

Mode	Use for	Examples
Pre-auth	Network or platform-wide checks that do not need a valid API key.	global IP allow/block
Early	Request-aware checks after authentication, before heavy upstream work.	payload bytes, RPS, RPM, burst, concurrency
Final	Token-aware checks after decoding and resource resolution.	max tokens, tokens per minute
Post-flight	Cleanup or reconciliation after the request finishes.	release concurrency, reconcile token usage

Do not put a module in an earlier gate just to block faster. If the module needs model, MCP, or token context, it belongs later. A fast wrong decision is worse than a slightly later correct decision.
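This rule can be enforced when a module is registered. In this hypothetical Python sketch, gates are ordered by when they run. Each kind of context maps to the first gate where it exists: origin at pre-auth, caller scope at early, a token estimate at final, and actual usage at post-flight. The context names and the check_placement helper are assumptions for illustration.

```python
from enum import IntEnum
from typing import Optional, Set


class Gate(IntEnum):
    # Ordered by when each gate runs in the request lifecycle.
    PRE_AUTH = 0
    EARLY = 1
    FINAL = 2
    POST_FLIGHT = 3


# Hypothetical map from a piece of context to the first gate where it exists.
CONTEXT_AVAILABLE_FROM = {
    "origin_ip": Gate.PRE_AUTH,
    "caller_scope": Gate.EARLY,
    "token_estimate": Gate.FINAL,
    "actual_usage": Gate.POST_FLIGHT,
}


def earliest_gate(required: Set[str]) -> Gate:
    """Earliest gate at which every piece of required context exists."""
    unknown = set(required) - set(CONTEXT_AVAILABLE_FROM)
    if unknown:
        raise ValueError(f"unknown context: {sorted(unknown)}")
    return max((CONTEXT_AVAILABLE_FROM[c] for c in required), default=Gate.PRE_AUTH)


def check_placement(module_name: str, required: Set[str], gate: Gate) -> Optional[str]:
    """Return an error message if `gate` runs before the module's context exists, else None."""
    needed = earliest_gate(required)
    if gate < needed:
        return f"{module_name} needs the {needed.name} gate or later, not {gate.name}"
    return None
```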

Request-Aware Modules

Request-aware modules use fields from the normalized RequestContext: request id, path, method, content length, organisation id, team id, API key id, provider, model, and stream flag.
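A frozen Python dataclass is one way to picture the normalized context. The field names mirror the list above but are assumptions. The scope_keys helper is also hypothetical: it shows how a module might derive one scope key per known caller scope and skip scopes that are not set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RequestContext:
    request_id: str
    path: str
    method: str
    content_length: int
    organisation_id: Optional[str] = None
    team_id: Optional[str] = None
    api_key_id: Optional[str] = None
    provider: Optional[str] = None
    model: Optional[str] = None
    stream: bool = False

    def scope_keys(self) -> List[str]:
        """One key per known scope, e.g. "org:acme"; unknown scopes are skipped."""
        candidates = (
            ("org", self.organisation_id),
            ("team", self.team_id),
            ("key", self.api_key_id),
            ("model", self.model),
        )
        return [f"{kind}:{value}" for kind, value in candidates if value is not None]
```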

These modules are useful for limits that can be evaluated before the upstream call:

payload too large
requests per second
requests per minute
burst
concurrent requests
model or MCP scoped request pressure

They are normally injected into the early gate because authentication and caller scope are already known, but token usage may not be fully known yet.
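The following Python sketch shows a request-aware rate module using a sliding-window log limiter. It is an illustration, not Odock's implementation. One instance enforces one limit, for example window_s=1 for RPS or window_s=60 for RPM. The scope string is assumed to come from the caller scope, and state is kept in process memory rather than a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each scope."""

    def __init__(self, limit: int, window_s: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, scope: str) -> Optional[float]:
        """Record a hit and return None if allowed, else seconds until a slot frees up."""
        now = self.clock()
        hits = self._hits[scope]
        # Drop timestamps that have left the window.
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            # Denied: the oldest hit leaves the window after this many seconds.
            return hits[0] + self.window_s - now
        hits.append(now)
        return None
```

The returned wait time is what a stage would turn into the retry window of a denial.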

Network-Aware Modules

Network-aware modules focus on the client origin. They should be deterministic and cheap because they are often the first gate that can reject bad traffic.

Use them for:

IP allowlists
IP blocklists
CIDR-style network boundaries
organisation or key network policy checks after caller scope is known

Network checks can exist in more than one gate. A broad platform or deployment-level network rule can run before auth. A scoped rule that depends on the organisation, team, API key, model, or MCP server runs after those scopes are known.
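An IP allow/block module can stay deterministic and cheap by using Python's standard ipaddress module. This sketch assumes that a blocklist match always wins and that an empty allowlist permits any address that is not blocked. NetworkPolicy is a hypothetical name. A scoped variant could hold one such policy per organisation or API key and run after those scopes are known.

```python
import ipaddress
from typing import Iterable


class NetworkPolicy:
    """Deterministic IP check: the blocklist wins, then an optional allowlist applies."""

    def __init__(self, allow: Iterable[str] = (), block: Iterable[str] = ()):
        # ip_network accepts single addresses ("203.0.113.7") and CIDR ranges ("10.0.0.0/8").
        self.allow = [ipaddress.ip_network(n) for n in allow]
        self.block = [ipaddress.ip_network(n) for n in block]

    def permits(self, client_ip: str) -> bool:
        addr = ipaddress.ip_address(client_ip)
        # Membership is simply False when IPv4 and IPv6 are mixed.
        if any(addr in net for net in self.block):
            return False
        # An empty allowlist means "allow everything that is not blocked".
        return not self.allow or any(addr in net for net in self.allow)
```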

Token-Aware Modules

Token-aware modules protect LLM capacity and cost-sensitive workloads. They should run only after the gateway can reason about the requested model and token envelope.

Use them for:

tokens per minute
max tokens per request
token-aware model policies
token accounting that must be reconciled after the upstream response

Token-aware modules are why Odock treats guardrails as more than simple request counting. Two requests can have the same HTTP shape and very different token impact.
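A tokens-per-minute module usually has only an estimate at the final gate and the real usage only after the response. The following Python sketch assumes a reserve-then-reconcile scheme over a rolling 60-second window. The TokensPerMinute class, its integer receipts, and the source of the estimate are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


class TokensPerMinute:
    """Rolling-window token budget with reserve (final gate) and reconcile (post-flight)."""

    def __init__(self, limit: int, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self._entries: Dict[int, Tuple[float, int]] = {}  # receipt -> (timestamp, tokens)
        self._next_receipt = 0

    def _used(self, now: float) -> int:
        # Forget entries that have left the window, then sum what remains.
        cutoff = now - self.window_s
        for receipt in [r for r, (ts, _) in self._entries.items() if ts <= cutoff]:
            del self._entries[receipt]
        return sum(tokens for _, tokens in self._entries.values())

    def reserve(self, estimate: int, now: float) -> Optional[int]:
        """Return a receipt if the estimate fits the envelope, else None."""
        if self._used(now) + estimate > self.limit:
            return None
        receipt = self._next_receipt
        self._next_receipt += 1
        self._entries[receipt] = (now, estimate)
        return receipt

    def reconcile(self, receipt: int, actual: int) -> None:
        """Replace the estimate with the real token count, keeping the original timestamp."""
        if receipt in self._entries:
            ts, _ = self._entries[receipt]
            self._entries[receipt] = (ts, actual)
```

Two requests with the same HTTP shape reserve very different amounts here, which is exactly what plain request counting cannot express.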

Module And Stage Responsibilities

Keep this boundary clear:

Layer	Responsibility
Module	Maintain one accounting primitive or decision primitive.
Stage/gate	Choose when to call modules, apply scoped policy, produce decisions, respect shadow mode, and manage receipts.
Policy resolver	Provide the effective scoped policy snapshot.
Post-flight	Release or reconcile runtime accounting after the request.

This is why ratelimit modules do not look like generic plugins: they are purpose-built primitives that the staged ratelimit engine composes.
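A concurrency cap illustrates how the early gate and post-flight share receipts. In this hypothetical Python sketch, the early gate acquires a slot and gets a receipt, and post-flight releases the slot by that receipt. ConcurrencyLimiter, acquire, and release are illustrative names. The in-process lock and counters stand in for whatever shared store a real deployment would use.

```python
import itertools
import threading
from collections import Counter
from typing import Dict, Optional


class ConcurrencyLimiter:
    """In-flight request cap per scope, released by receipt in post-flight."""

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self._in_flight: Counter = Counter()
        self._receipts: Dict[int, str] = {}  # receipt -> scope
        self._ids = itertools.count()
        self._lock = threading.Lock()

    def acquire(self, scope: str) -> Optional[int]:
        """Early gate: take a slot and return a receipt, or None if the scope is full."""
        with self._lock:
            if self._in_flight[scope] >= self.max_in_flight:
                return None
            receipt = next(self._ids)
            self._in_flight[scope] += 1
            self._receipts[receipt] = scope
            return receipt

    def release(self, receipt: int) -> None:
        """Post-flight: free the slot; unknown or repeated receipts are ignored."""
        with self._lock:
            scope = self._receipts.pop(receipt, None)
            if scope is not None:
                self._in_flight[scope] -= 1

    def in_flight(self, scope: str) -> int:
        with self._lock:
            return self._in_flight[scope]
```

Because release ignores receipts it has already processed, a post-flight retry cannot free the same slot twice.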

Continue with Custom guardrails for the implementation workflow.
