ODOCK.AI

Quota metrics

Understand the metrics that quotas can limit.

Each quota limits one metric.

| Metric | Meaning | Common use |
| --- | --- | --- |
| REQUESTS | Number of gateway requests. | Limit request volume. |
| TOKENS | Total tokens. | Limit total model usage. |
| TOKENS_IN | Input tokens. | Limit prompt or input volume. |
| TOKENS_OUT | Output tokens. | Limit generated output. |
| ERRORS | Error count. | Stop noisy or failing workloads. |
| LATENCY_MS | Latency value. | Track or constrain latency-related consumption depending on deployment behavior. |
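
To illustrate the one-metric-per-quota rule, here is a minimal Python sketch. Only the metric names come from the table above. The `QuotaMetric` and `Quota` types, the `limit` field, and the `is_exceeded` check are hypothetical and are not part of any documented ODOCK.AI API.

```python
from dataclasses import dataclass
from enum import Enum


class QuotaMetric(Enum):
    """Metric names as listed in the quota metrics table."""
    REQUESTS = "REQUESTS"
    TOKENS = "TOKENS"
    TOKENS_IN = "TOKENS_IN"
    TOKENS_OUT = "TOKENS_OUT"
    ERRORS = "ERRORS"
    LATENCY_MS = "LATENCY_MS"


@dataclass(frozen=True)
class Quota:
    """Hypothetical quota shape: exactly one metric and one limit."""
    metric: QuotaMetric
    limit: int

    def is_exceeded(self, used: int) -> bool:
        # Assumption: a quota counts as exceeded once usage passes the limit.
        return used > self.limit


# Two separate quotas are needed to cap two different metrics.
requests_quota = Quota(QuotaMetric.REQUESTS, limit=1000)
output_quota = Quota(QuotaMetric.TOKENS_OUT, limit=50_000)
```

Because each quota holds a single metric, capping both request volume and generated output takes two quotas rather than one combined rule.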

Metric Selection

Use REQUESTS for traffic envelopes.

Use TOKENS, TOKENS_IN, or TOKENS_OUT for model usage envelopes.

Use ERRORS for broken workflows that should stop after repeated failures.

Usage Records

Quota settlement is based on request-level usage evidence: status, token counts, latency, and cost. See Usage Monitoring.
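
As a rough sketch of how one usage record could translate into a metric amount, the Python below uses a hypothetical `UsageRecord` with the four kinds of evidence named above. The field names, the rule that `TOKENS` equals input plus output tokens, the rule that an HTTP-style status of 400 or higher counts as an error, and the choice to add raw latency to `LATENCY_MS` are all assumptions made for illustration. The actual settlement logic may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical request-level usage record."""
    status: int       # assumed to be an HTTP-style status code
    tokens_in: int
    tokens_out: int
    latency_ms: int
    cost: float       # carried as evidence; no listed metric consumes it here


def metric_amount(record: UsageRecord, metric: str) -> int:
    """Return how much one record adds to the given metric (illustrative only)."""
    if metric == "REQUESTS":
        return 1
    if metric == "TOKENS":
        # Assumption: total tokens is the sum of input and output tokens.
        return record.tokens_in + record.tokens_out
    if metric == "TOKENS_IN":
        return record.tokens_in
    if metric == "TOKENS_OUT":
        return record.tokens_out
    if metric == "ERRORS":
        # Assumption: any status of 400 or above counts as one error.
        return 1 if record.status >= 400 else 0
    if metric == "LATENCY_MS":
        # Assumption: the raw latency of the request is added.
        return record.latency_ms
    raise ValueError(f"unknown metric: {metric}")


ok = UsageRecord(status=200, tokens_in=120, tokens_out=80, latency_ms=350, cost=0.002)
failed = UsageRecord(status=500, tokens_in=120, tokens_out=0, latency_ms=900, cost=0.0)

print(metric_amount(ok, "TOKENS"))      # 200
print(metric_amount(failed, "ERRORS"))  # 1
```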
