Quota metrics

Each quota limits one metric.

| Metric | Meaning | Common use |
| --- | --- | --- |
| `REQUESTS` | Number of gateway requests. | Limit request volume. |
| `TOKENS` | Total tokens. | Limit total model usage. |
| `TOKENS_IN` | Input tokens. | Limit prompt or input volume. |
| `TOKENS_OUT` | Output tokens. | Limit generated output. |
| `ERRORS` | Error count. | Stop noisy or failing workloads. |
| `LATENCY_MS` | Latency in milliseconds. | Track or constrain latency-related consumption. The exact behavior depends on the deployment. |

Metric selection

Use `REQUESTS` for traffic envelopes.

Use `TOKENS`, `TOKENS_IN`, or `TOKENS_OUT` for model usage envelopes.

Use `ERRORS` for broken workflows that should stop after repeated failures.
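Because each quota limits one metric, a workload that needs more than one kind of envelope would carry more than one quota. The following is a minimal sketch of that idea, not the gateway's actual configuration format: the `Quota` class, its field names, the quota names, and the limit values are all hypothetical. Only the metric names come from the table above.

```python
from dataclasses import dataclass

# Metric names as listed in the metrics table.
METRICS = {"REQUESTS", "TOKENS", "TOKENS_IN", "TOKENS_OUT", "ERRORS", "LATENCY_MS"}


@dataclass(frozen=True)
class Quota:
    """Hypothetical quota record: exactly one metric and one limit."""
    name: str
    metric: str
    limit: int

    def __post_init__(self):
        # Reject metric names that are not in the documented set.
        if self.metric not in METRICS:
            raise ValueError(f"unknown metric: {self.metric}")
        if self.limit < 0:
            raise ValueError("limit must be non-negative")


# One quota per concern (names and limits are illustrative only).
quotas = [
    Quota("traffic-envelope", "REQUESTS", 10_000),    # cap request volume
    Quota("output-envelope", "TOKENS_OUT", 2_000_000),  # cap generated output
    Quota("failure-stop", "ERRORS", 25),              # stop after repeated failures
]
```

Keeping one metric per quota means each limit can be tuned or removed without touching the others.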

Quota settlement is based on request-level usage evidence: status, token counts, latency, and cost. See Usage Monitoring.
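To make the settlement idea concrete, the sketch below shows one way a request-level usage record could translate into an increment for each metric. It is an illustration under stated assumptions, not the gateway's implementation: `UsageRecord`, `metric_increment`, and `settle` are hypothetical names, `TOKENS` is assumed to be input plus output tokens, and a request is assumed to count as an error when its status is an HTTP-style code of 400 or higher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical request-level usage evidence."""
    status: int       # assumed to be an HTTP-style status code
    tokens_in: int
    tokens_out: int
    latency_ms: int
    cost: float       # carried as evidence; no metric in the table consumes it


def metric_increment(metric: str, record: UsageRecord) -> int:
    """Return how much one request adds to the given metric."""
    if metric == "REQUESTS":
        return 1
    if metric == "TOKENS":
        # Assumption: total tokens means input plus output.
        return record.tokens_in + record.tokens_out
    if metric == "TOKENS_IN":
        return record.tokens_in
    if metric == "TOKENS_OUT":
        return record.tokens_out
    if metric == "ERRORS":
        # Assumption: status 400 and above counts as an error.
        return 1 if record.status >= 400 else 0
    if metric == "LATENCY_MS":
        return record.latency_ms
    raise ValueError(f"unknown metric: {metric}")


def settle(metric: str, used: int, limit: int, record: UsageRecord) -> tuple[int, bool]:
    """Add one request's usage and report whether the limit is now exceeded."""
    new_used = used + metric_increment(metric, record)
    return new_used, new_used > limit


# Example: a failed request with 120 input and 80 output tokens.
record = UsageRecord(status=500, tokens_in=120, tokens_out=80, latency_ms=350, cost=0.002)
used, exceeded = settle("ERRORS", used=3, limit=3, record=record)
```

In this example the failed request moves an `ERRORS` quota from 3 to 4 against a limit of 3, so `exceeded` is `True`. The same record would add 200 to a `TOKENS` quota under the input-plus-output assumption.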