Observability
LowLatencyPubSub is a real-time system. In production you need to be able to answer questions such as:
- “Are connections healthy?”
- “Are we dropping messages?”
- “Which tenant/channel is hot?”
- “Why does a client disconnect?”
This page lists a practical baseline.
Metrics (recommended)
Track at least:
- connections: current connections, connect rate, disconnect rate
- subscriptions: subscribe/unsubscribe rate, active subscriptions
- publish rate: messages/s and bytes/s in
- delivery rate: messages/s and bytes/s out
- drops: dropped messages (by reason if possible)
- backpressure: slow-consumer events, queue/buffer saturation
- auth: invalid token rate, token expiry failures
- latency: publish-to-deliver latency (p50/p95/p99) if you can measure it
Do not depend on exact metric names unless your deployment documents them. What matters is that each of these signals is available in some form.
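As a rough illustration of the signals listed above, here is a minimal in-memory sketch. The class, method, and metric names (Metrics, inc, messages_dropped, and so on) are hypothetical and are not LowLatencyPubSub names. A real deployment would normally export these through its metrics system rather than hold them in process.

```python
import math
from collections import Counter


class Metrics:
    """In-memory holder for the baseline signals (illustrative names only)."""

    def __init__(self):
        self.counters = Counter()   # monotonically increasing counts
        self.gauges = {}            # point-in-time values, e.g. current connections
        self.latencies_ms = []      # raw publish-to-deliver samples

    def inc(self, name, amount=1, **labels):
        # Labels such as reason="slow_consumer" become part of the counter key,
        # so drops can be broken down by reason.
        key = (name, tuple(sorted(labels.items())))
        self.counters[key] += amount

    def set_gauge(self, name, value):
        self.gauges[name] = value

    def observe_latency(self, ms):
        self.latencies_ms.append(ms)

    def percentile(self, p):
        # Nearest-rank percentile over all recorded samples.
        if not self.latencies_ms:
            return None
        ordered = sorted(self.latencies_ms)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]
```

A caller would record a drop with something like m.inc("messages_dropped", reason="slow_consumer") and read m.percentile(99) for the p99 latency. Keeping raw samples is only reasonable for a sketch; production systems usually use histograms with fixed buckets.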
Logs
Logs should allow you to correlate:
- connection lifecycle (connect/auth/subscribe/disconnect)
- publish requests and their outcomes
- drop / backpressure decisions
Security note: never log full AT_... tokens.
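One way to enforce this is to redact tokens before a log record is written. The sketch below assumes that tokens start with AT_ followed by URL-safe characters; that pattern is a guess and should be adjusted to your real token format. The names redact_tokens and TokenRedactingFilter are hypothetical. Each token is replaced with a short SHA-256 fingerprint so that log lines about the same token can still be correlated.

```python
import hashlib
import logging
import re

# Assumption: a token is "AT_" followed by URL-safe characters.
TOKEN_RE = re.compile(r"AT_[A-Za-z0-9._~-]+")


def redact_tokens(text):
    """Replace each token in text with a short, non-reversible fingerprint."""

    def _mask(match):
        digest = hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()
        return "AT_<redacted:" + digest[:8] + ">"

    return TOKEN_RE.sub(_mask, text)


class TokenRedactingFilter(logging.Filter):
    """Logging filter that redacts tokens in the fully formatted message."""

    def filter(self, record):
        # Format first so tokens passed as %-style arguments are covered too.
        record.msg = redact_tokens(record.getMessage())
        record.args = ()
        return True
```

Attach the filter to each handler, for example handler.addFilter(TokenRedactingFilter()), so that every record passing through that handler is covered.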
Tracing / correlation ids
For end-to-end debugging, pick one correlation id and carry it through every hop:
- put the id in the payload (or in headers/metadata if your protocol has them)
- have the server log that id on publish and on deliver
- have clients log the same id on send and on receive
If you control the payload schema, a simple field like trace_id is enough.
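The client side of this can be sketched as follows, assuming JSON object payloads and a trace_id field. The helper names (with_trace_id, encode_for_publish, decode_on_receive) and the logger name are hypothetical. The actual publish and subscribe calls are left out because they depend on your client library.

```python
import json
import logging
import uuid

log = logging.getLogger("pubsub.client")


def with_trace_id(payload):
    """Return a copy of payload carrying a trace_id, keeping any existing one."""
    tagged = dict(payload)
    tagged.setdefault("trace_id", uuid.uuid4().hex)
    return tagged


def encode_for_publish(channel, payload):
    """Tag the payload, log the id on send, and return the JSON text to publish."""
    tagged = with_trace_id(payload)
    log.info("send channel=%s trace_id=%s", channel, tagged["trace_id"])
    return json.dumps(tagged)


def decode_on_receive(channel, raw):
    """Parse a received message and log the same id on receive."""
    payload = json.loads(raw)
    log.info("recv channel=%s trace_id=%s", channel, payload.get("trace_id"))
    return payload
```

Searching the logs for a single trace_id value then shows the send and receive lines for that message side by side, along with any server lines that include the same id.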
Troubleshooting checklist
When a client is subscribed but not receiving messages, check in this order:
- Token permissions: tenant + channel root allowed? (see Permissions model)
- Tenant exact match: no wildcards for tenants (see Using tenants)
- Join window: did you publish too soon after subscribe? (see Delivery semantics)
- Channel name mismatch: exact spelling and separators
- Slow consumer: is the client reading fast enough?
- Disconnects: does the connection drop and silently reconnect?
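The last item is easy to miss because a client that reconnects automatically can look healthy. A minimal sketch for making reconnects visible on the client side follows. ReconnectTracker is a hypothetical helper; you would call record_reconnect() from whatever reconnect hook your client library exposes.

```python
import time
from collections import deque


class ReconnectTracker:
    """Counts reconnects inside a sliding time window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the tracker can be tested
        self._events = deque()      # timestamps of recent reconnects, oldest first

    def record_reconnect(self):
        self._events.append(self.clock())

    def recent_count(self):
        # Drop events that have aged out of the window, then count the rest.
        cutoff = self.clock() - self.window_seconds
        while self._events and self._events[0] < cutoff:
            self._events.popleft()
        return len(self._events)
```

Logging a warning or emitting a metric when recent_count() exceeds a small threshold turns silent reconnects into something you can alert on.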
For the meaning of common errors, see Error codes.