Service Reliability
Logs
We currently log via segment.io.
- Loki = logs. LogQL
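
As a rough sketch of what Loki-friendly logging could look like, the snippet below emits one JSON object per log line using only the Python standard library. The logger name `checkout`, the `fields` key passed through `extra`, and the `status` / `duration_ms` fields are hypothetical names chosen for illustration. How the lines reach Loki (agent, labels) is not covered here.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Write to an in-memory buffer here; a real service would use stdout.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("request finished", extra={"fields": {"status": 200, "duration_ms": 12.5}})
line = buffer.getvalue().strip()
print(line)
```

With lines in this shape, a LogQL query along the lines of `{app="checkout"} | json | status >= 500` could filter on the parsed fields, assuming the stream carries an `app` label.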
Observability dashboards
We currently use Datadog for service usage metrics.
- Grafana = visualization, alerts
Metrics
We have no observability into custom service metrics.
- HTTP transaction
- Latency
- Throughput
- Prometheus → Mimir = metrics. Cardinality. PromQL
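
To make the latency and throughput items above concrete, here is a minimal standard-library sketch of a cumulative latency histogram rendered in the Prometheus text exposition format. It is not the official client library. The bucket bounds and the metric name `http_request_duration_seconds` are assumptions for illustration.

```python
import bisect

# Hypothetical bucket upper bounds in seconds; tune to the service's latency profile.
BUCKETS = [0.05, 0.1, 0.25, 0.5, 1.0]


class LatencyHistogram:
    """Cumulative histogram of request durations (Prometheus-style `le` buckets)."""

    def __init__(self, bounds):
        self.bounds = list(bounds)
        # One counter per bound, plus a final slot for the +Inf bucket.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0

    def observe(self, seconds: float) -> None:
        # bisect_left finds the first bound >= seconds, matching "less than or equal".
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.total += seconds

    def render(self, name: str) -> str:
        lines = [f"# TYPE {name} histogram"]
        running = 0
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count  # buckets are cumulative in the exposition format
            le = "+Inf" if bound == float("inf") else repr(bound)
            lines.append(f'{name}_bucket{{le="{le}"}} {running}')
        lines.append(f"{name}_sum {self.total}")
        lines.append(f"{name}_count {running}")
        return "\n".join(lines)


hist = LatencyHistogram(BUCKETS)
for sample in (0.03, 0.1, 0.3, 2.0):  # made-up request durations in seconds
    hist.observe(sample)
text = hist.render("http_request_duration_seconds")
print(text)
```

Throughput falls out of the same data: in PromQL, `rate(http_request_duration_seconds_count[5m])` gives requests per second. On cardinality, each distinct label combination adds its own set of series (here six buckets plus `_sum` and `_count`), so labels with unbounded values such as user IDs are best avoided.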
Tracing
- OpenTelemetry
- Tempo = traces.
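
OpenTelemetry propagates trace context between services using the W3C `traceparent` header by default. The sketch below only illustrates that header's shape (version, trace ID, parent span ID, flags) with the standard library. It does not use the OpenTelemetry SDK, and the helper names are hypothetical.

```python
import re
import secrets

# Version 00 layout: 00-<32 hex trace id>-<16 hex span id>-<2 hex flags>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: fresh trace ID and span ID."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(parent: str) -> str:
    """Continue an incoming trace: keep the trace ID and flags, mint a new span ID."""
    match = TRACEPARENT_RE.match(parent)
    if match is None:
        # Malformed or missing header: start a new trace instead.
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


root = new_traceparent()
child = child_traceparent(root)
print(root)
print(child)
```

Because every hop keeps the same trace ID, a backend such as Tempo can stitch the spans from different services into one trace.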