Review alerts and pipeline health

Use this tutorial when the question is no longer "what happened to one request" but "what is unhealthy in the deployment right now".

Open the alerting view your deployment uses.

This may be Grafana alerting, Alertmanager, or both, depending on how your company exposes the stack.
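If you want to list active alerts from a script as well as the UI, Alertmanager's v2 HTTP API serves them at `/api/v2/alerts`. The sketch below is a minimal example. It assumes a directly reachable Alertmanager at a hypothetical `http://localhost:9093` with no authentication. Grafana-managed alerts and authenticated proxies need a different URL and credentials. The `summarise` helper is illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address: replace with your deployment's Alertmanager URL.
ALERTMANAGER_URL = "http://localhost:9093"


def fetch_active_alerts(base_url: str = ALERTMANAGER_URL) -> list[dict]:
    """Return active, unsilenced, uninhibited alerts from Alertmanager's v2 API."""
    query = urllib.parse.urlencode(
        {"active": "true", "silenced": "false", "inhibited": "false"}
    )
    url = f"{base_url}/api/v2/alerts?{query}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def summarise(alerts: list[dict]) -> list[str]:
    """One line per alert: alertname, severity label (if any), start time."""
    lines = []
    for alert in alerts:
        labels = alert.get("labels", {})
        lines.append(
            f'{labels.get("alertname", "?")} '
            f'severity={labels.get("severity", "unknown")} '
            f'since={alert.get("startsAt", "?")}'
        )
    return lines
```

Calling `print("\n".join(summarise(fetch_active_alerts())))` prints one line per firing alert. That output is a quick text snapshot you can compare against what the UI shows.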

Group the active alerts by family before you investigate.

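The common families are infrastructure, OTel pipeline, gateway request health, provider health, logger pipeline, usage collector, and token anomalies.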
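Grouping can also be scripted once the alerts are in hand. This minimal sketch assumes your alert rules attach a label that names the family. The label key `family` is hypothetical, so adjust it to whatever your rules actually use. Alerts without the label land in an `unclassified` bucket so they are not silently dropped.

```python
from collections import defaultdict

# Hypothetical label key: assumes your alert rules attach a label that names
# the alert family. Adjust to whatever your rules actually use.
FAMILY_LABEL = "family"


def group_by_family(alerts: list[dict]) -> dict[str, list[str]]:
    """Map each family name to the alertnames currently firing in it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for alert in alerts:
        labels = alert.get("labels", {})
        family = labels.get(FAMILY_LABEL, "unclassified")
        groups[family].append(labels.get("alertname", "?"))
    # Sort so the families with the most firing alerts are reviewed first.
    return dict(sorted(groups.items(), key=lambda item: len(item[1]), reverse=True))
```

Sorting by group size is one reasonable ordering. Ordering by severity would be another.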

Open the first matching dashboard for the alert family.

Use this mapping:

Alert family	Dashboard
Infrastructure	Infrastructure or Containers dashboards
OTel pipeline	Traces -> OTel Pipeline Health
Gateway request health	Gateway Request Dashboard
Provider health	Provider Dashboard
Logger pipeline	Logger Health Dashboard
Usage collector	Usage / Budget Dashboard plus Redis and Postgres health
Token anomalies	Token Usage Dashboard
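The mapping can be kept next to any triage script as a plain lookup table. The entries below copy the table as written. The case-insensitive matching in `first_dashboard` is an illustrative convenience, and your Grafana folder layout may name the dashboards differently.

```python
# Family-to-dashboard mapping copied from the table in this guide.
DASHBOARDS = {
    "Infrastructure": "Infrastructure or Containers dashboards",
    "OTel pipeline": "Traces -> OTel Pipeline Health",
    "Gateway request health": "Gateway Request Dashboard",
    "Provider health": "Provider Dashboard",
    "Logger pipeline": "Logger Health Dashboard",
    "Usage collector": "Usage / Budget Dashboard plus Redis and Postgres health",
    "Token anomalies": "Token Usage Dashboard",
}


def first_dashboard(family: str) -> str:
    """Return the dashboard to open first, matching family names case-insensitively."""
    wanted = family.strip().lower()
    for name, dashboard in DASHBOARDS.items():
        if name.lower() == wanted:
            return dashboard
    raise KeyError(f"No dashboard mapping for alert family {family!r}")
```

Raising on an unknown family is deliberate. A new alert family should prompt someone to extend the table rather than fall through to a guess.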

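Check whether the issue is signal loss (telemetry is not arriving, so dashboards go quiet or stale) or real request degradation (requests are actually failing or slowing down).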

For example:
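The distinction can be expressed as a rough triage heuristic. This is a sketch under stated assumptions, not a rule from this guide. The `Snapshot` fields and both thresholds are hypothetical. The heuristic also assumes the gateway's request and error figures come from a source that does not itself travel through the OTel pipeline. Otherwise a pipeline outage would hide the gateway numbers too.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical readings taken from the dashboards for one time window."""
    gateway_requests_per_s: float  # from the gateway's own request metrics
    gateway_error_ratio: float     # failed / total requests, 0.0-1.0
    telemetry_spans_per_s: float   # spans arriving through the OTel pipeline
    baseline_spans_per_s: float    # typical span rate for this time of day


def classify(s: Snapshot, error_threshold: float = 0.05, drop_ratio: float = 0.5) -> str:
    """Rough triage heuristic; the thresholds are illustrative, not recommendations."""
    telemetry_dropped = s.telemetry_spans_per_s < drop_ratio * s.baseline_spans_per_s
    requests_failing = s.gateway_error_ratio > error_threshold
    if requests_failing and telemetry_dropped:
        return "both"
    if requests_failing:
        return "request degradation"
    if telemetry_dropped and s.gateway_requests_per_s > 0:
        # Traffic is still flowing, but its telemetry is not arriving.
        return "signal loss"
    return "inconclusive"
```

A result of "inconclusive" is a prompt to look at the dashboards yourself, not a sign that everything is healthy.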

Capture the evidence before escalating or acting.

Record:
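One way to keep evidence consistent between responders is a small structured record that serialises cleanly into a ticket or handoff message. The field names below are hypothetical choices for illustration, not a required schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Evidence:
    """Hypothetical evidence record; field names are illustrative only."""
    alertname: str
    family: str
    dashboard: str
    classification: str  # e.g. "signal loss" or "request degradation"
    observations: list[str] = field(default_factory=list)
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialise for pasting into an incident ticket or chat handoff."""
        return json.dumps(asdict(self), indent=2)
```

Timestamps are recorded in UTC so evidence from responders in different time zones lines up.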

Ownership Guide

Alert family	Usually owned by
Infrastructure	Platform or SRE team
OTel pipeline	Platform or observability owner
Gateway request and provider health	Platform team, often with provider escalation
Token anomalies	Platform owner and organisation owner together

If you only have read-only access, stop at evidence collection and hand the incident to the deployment owner.
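The ownership table and the read-only rule combine into a small routing decision. The sketch copies the owners from the table, splitting its shared gateway and provider row into two keys. For families the table does not list, such as the logger pipeline and usage collector, it assumes the incident goes to the deployment owner for confirmation. That fallback is an assumption, not something this guide specifies.

```python
# Ownership rows copied from the table in this guide. "Gateway request and
# provider health" is one row there, so both families share the same owner.
OWNERS = {
    "Infrastructure": "Platform or SRE team",
    "OTel pipeline": "Platform or observability owner",
    "Gateway request health": "Platform team, often with provider escalation",
    "Provider health": "Platform team, often with provider escalation",
    "Token anomalies": "Platform owner and organisation owner together",
}


def next_step(family: str, read_only: bool) -> str:
    """Decide who receives the incident after evidence has been captured."""
    if read_only:
        # Read-only access: stop at evidence collection and hand off.
        return "Hand the incident to the deployment owner"
    owner = OWNERS.get(family)
    if owner is None:
        # Assumption: families missing from the table are routed to the
        # deployment owner to confirm ownership.
        return "Owner not listed; confirm with the deployment owner"
    return f"Escalate to: {owner}"
```

The read-only check comes first on purpose. Access level decides whether you act at all, before the question of who owns the alert.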