Observability

LowLatencyPubSub is a real-time system. In production you need a way to answer:

  • “Are connections healthy?”
  • “Are we dropping messages?”
  • “Which tenant/channel is hot?”
  • “Why does a client disconnect?”

This page lists a practical baseline.

Track at least:

  • connections: current connections, connect rate, disconnect rate
  • subscriptions: subscribe/unsubscribe rate, active subscriptions
  • publish rate: messages/s and bytes/s in
  • delivery rate: messages/s and bytes/s out
  • drops: dropped messages (by reason if possible)
  • backpressure: slow-consumer events, queue/buffer saturation
  • auth: invalid token rate, token expiry failures
  • latency: publish-to-deliver latency (p50/p95/p99) if you can measure it

Do not depend on exact metric names unless your deployment documents them. What matters is that each of these signals is available, whatever it is called.
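As an illustration of the signals above, here is a minimal in-memory sketch in Python. It is not part of LowLatencyPubSub: the class name PubSubMetrics, the counter names, and the nearest-rank percentile are all assumptions made for the example. A real deployment would normally export these through its existing metrics library.

```python
import math
from collections import Counter


class PubSubMetrics:
    """Hypothetical in-memory holder for a few baseline signals."""

    def __init__(self):
        # Counter names such as "publish.messages" are illustrative only.
        self.counters = Counter()
        # Publish-to-deliver latency samples, in milliseconds.
        self.latencies_ms = []

    def incr(self, name, amount=1):
        self.counters[name] += amount

    def record_drop(self, reason):
        # Count drops in total and per reason, so "why" can be answered later.
        self.incr("drops.total")
        self.incr(f"drops.{reason}")

    def record_latency(self, published_at, delivered_at):
        # Both timestamps are in seconds and must come from comparable clocks.
        self.latencies_ms.append((delivered_at - published_at) * 1000.0)

    def percentile(self, p):
        """Nearest-rank percentile of the samples; None if there are none."""
        if not self.latencies_ms:
            return None
        ordered = sorted(self.latencies_ms)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]
```

Calling percentile(50), percentile(95), and percentile(99) gives the p50/p95/p99 values. Measuring publish-to-deliver latency across machines only works if publisher and receiver clocks are synchronized closely enough, which is why the list says "if you can measure it".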

Logs

Logs should allow you to correlate:

  • connection lifecycle (connect/auth/subscribe/disconnect)
  • publish requests and their outcomes
  • drop / backpressure decisions

Security note: never log full AT_... tokens.
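One way to enforce this in a Python service is a logging filter that redacts anything that looks like a token before it is written. This is a sketch under assumptions: the regular expression guesses that a token is the prefix AT_ followed by URL-safe characters, and the names redact_tokens and RedactTokensFilter are made up for the example. Adjust the pattern to the real token format of your deployment.

```python
import logging
import re

# Assumption: a token is "AT_" followed by URL-safe characters.
TOKEN_RE = re.compile(r"AT_[A-Za-z0-9._~+/=-]+")


def redact_tokens(text):
    """Replace the body of anything that looks like an AT_ token."""
    return TOKEN_RE.sub("AT_[REDACTED]", text)


class RedactTokensFilter(logging.Filter):
    """Redacts token-like strings in the fully rendered log message."""

    def filter(self, record):
        # Render first so tokens passed as %-style arguments are covered too.
        record.msg = redact_tokens(record.getMessage())
        record.args = None
        return True


if __name__ == "__main__":
    handler = logging.StreamHandler()
    # Attaching the filter to the handler covers every logger that uses it.
    handler.addFilter(RedactTokensFilter())
    logging.basicConfig(level=logging.INFO, handlers=[handler])
    logging.getLogger("pubsub.auth").info("rejected token %s", "AT_example123")
```

The filter is a safety net, not a substitute for keeping tokens out of log calls in the first place. It errs on the side of redacting too much, for example a trailing period directly after a token.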

Tracing / correlation ids

For end-to-end debugging, pick one id and carry it everywhere:

  • put a correlation id in the payload (or in headers/metadata if your protocol has them)
  • have the server log that id on publish and on deliver
  • have clients log the same id on send and on receive

If you control the payload schema, a simple field like trace_id is enough.
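A client-side sketch of that idea, assuming JSON payloads. The envelope shape ({"trace_id": ..., "data": ...}) and the function names wrap_payload, on_send, and on_receive are hypothetical and not part of any LowLatencyPubSub client API. Call them from wherever your client publishes and receives.

```python
import json
import logging
import uuid

log = logging.getLogger("pubsub.client")


def wrap_payload(data, trace_id=None):
    """Return a JSON payload carrying a trace_id next to the application data."""
    trace_id = trace_id or uuid.uuid4().hex
    return json.dumps({"trace_id": trace_id, "data": data})


def on_send(channel, payload):
    """Log the trace_id just before publishing; returns the id for reuse."""
    trace_id = json.loads(payload)["trace_id"]
    log.info("send channel=%s trace_id=%s", channel, trace_id)
    return trace_id


def on_receive(channel, payload):
    """Log the trace_id of a received message and return its data."""
    message = json.loads(payload)
    # Tolerate publishers that do not set the field yet.
    trace_id = message.get("trace_id", "-")
    log.info("recv channel=%s trace_id=%s", channel, trace_id)
    return message.get("data")
```

Searching the sender, server, and receiver logs for one trace_id value then shows where a message stopped.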

Troubleshooting checklist

When a client is subscribed but not receiving messages, check in this order:

  1. Token permissions: tenant + channel root allowed? (see Permissions model)
  2. Tenant exact match: no wildcards for tenants (see Using tenants)
  3. Join window: did you publish too soon after subscribe? (see Delivery semantics)
  4. Channel name mismatch: exact spelling and separators
  5. Slow consumer: is the client reading fast enough?
  6. Disconnects: does the connection drop and silently reconnect?
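The last item is easy to miss because many client libraries reconnect automatically. The sketch below counts reconnects in a sliding time window so they can be logged or alerted on. ReconnectMonitor and its on_connect hook are hypothetical names. Wire on_connect to whatever "connection established" callback your client library actually provides.

```python
import time
from collections import deque


class ReconnectMonitor:
    """Counts reconnects (connects after the first) in a sliding window."""

    def __init__(self, window_s=60.0, clock=time.monotonic):
        self.window_s = window_s
        self._clock = clock
        self._seen_first = False
        self._reconnects = deque()

    def on_connect(self):
        # The very first connect is not a reconnect; every later one is.
        if self._seen_first:
            self._reconnects.append(self._clock())
        self._seen_first = True

    def reconnects_in_window(self):
        # Discard reconnects older than the window, then count the rest.
        cutoff = self._clock() - self.window_s
        while self._reconnects and self._reconnects[0] < cutoff:
            self._reconnects.popleft()
        return len(self._reconnects)
```

A non-zero count during a "not receiving" report suggests that messages were published while the client was disconnected or still re-subscribing.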

For the meaning of common errors, see Error codes.
