Agentic observability
A traditional service maps one request to one response. An agentic workflow fans out into nested LLM calls with shared state, and the answer the user sees may pass through ten model invocations. Standard observability tooling does not represent this structure. Agentic observability is what we built, and bought, to make the runtime visible.
Context
We started with conventional APM and quickly found “function call” and “HTTP request” were not the right primitives. An agent’s reasoning is a tree, not a flat sequence, and the cost and latency of an agent invocation depend on choices inside the tree. Without a tree-structured trace, debugging a misbehaving AI assistant answer took hours of log archaeology. With it, the same investigation runs in minutes.
Trace structure
A workflow forms one trace. Inside the trace are spans.
- Each LLM call is a span.
- Each tool call (database, external API) is a span.
- Each agent invocation creates a sub-trace.
Trace structure lets us navigate from “user asked a question” down to the specific LLM call misbehaving without crawling through unrelated log lines.
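The tree shape described above can be sketched as a small recursive data structure. This is a minimal illustration, not our production schema: the `Span` class, its field names, and the `failing_paths` helper are all hypothetical, and a sub-trace is modeled simply as a span of kind "agent" with children.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Span:
    """One node in the trace tree (hypothetical shape, not a real schema)."""
    name: str
    kind: str                  # "llm", "tool", or "agent" (agent = sub-trace)
    status: str = "success"    # "success", "error", or "timeout"
    children: list[Span] = field(default_factory=list)


def failing_paths(span: Span, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return the root-to-span path of every span whose status is not success."""
    path = prefix + (span.name,)
    found = [path] if span.status != "success" else []
    for child in span.children:
        found.extend(failing_paths(child, path))
    return found


# Example workflow: one user question, one nested agent, one failing tool call.
trace = Span("answer_question", "agent", children=[
    Span("planner", "llm"),
    Span("sql_agent", "agent", children=[
        Span("generate_sql", "llm"),
        Span("run_query", "tool", status="timeout"),
    ]),
    Span("aggregator", "llm"),
])

print(failing_paths(trace))
```

Walking the tree returns only the path to the failing span, which is the navigation the trace view is meant to replace log searching with.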
What we capture per span
- Input. The prompt sent (after PII redaction).
- Output. The model response.
- Metadata. Model version, temperature, top_p, and any other inference parameter.
- Latency. Start time, end time, duration.
- Token usage. Input tokens, output tokens, computed cost.
- Tags. User, tenant, workflow type, feature flag.
- Status. Success, error, timeout.
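As a rough sketch, the fields above could map onto a record like the following. The class name, field names, and per-1,000-token prices are assumptions made for illustration; real model pricing and the actual storage schema will differ.

```python
from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    """Hypothetical per-span record mirroring the list above."""
    redacted_input: str        # prompt after PII redaction
    output: str                # model response
    model_version: str
    inference_params: dict     # temperature, top_p, and so on
    start_ms: int
    end_ms: int
    input_tokens: int
    output_tokens: int
    tags: dict = field(default_factory=dict)   # user, tenant, workflow type, flag
    status: str = "success"    # "success", "error", or "timeout"

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms

    def cost_usd(self, input_price_per_1k: float, output_price_per_1k: float) -> float:
        """Cost computed from token counts; prices are supplied by the caller."""
        return (self.input_tokens * input_price_per_1k
                + self.output_tokens * output_price_per_1k) / 1000


record = SpanRecord(
    redacted_input="Summarize invoices for [EMAIL]",
    output="Three invoices are overdue.",
    model_version="example-model-v1",          # placeholder name
    inference_params={"temperature": 0.2, "top_p": 0.9},
    start_ms=1_000,
    end_ms=1_850,
    input_tokens=2_000,
    output_tokens=500,
    tags={"tenant": "t-42", "workflow": "report"},
)
print(record.duration_ms, record.cost_usd(0.5, 1.5))
```

Keeping cost as a function of token counts and a price table, rather than a stored number, lets cost be recomputed when prices change.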
Prompt registry
The observability layer also acts as the prompt registry.
- Every prompt has a version number, never edited in place.
- A/B tests serve different versions to different cohorts.
- Production deployments reference a specific version, never “latest”.
This is what makes Stage 1 design (§6) auditable and Stage 4 rollback (§9) one click instead of a redeploy.
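The versioning rules can be illustrated with an in-memory registry. This is a sketch under stated assumptions: `PromptRegistry` and its methods are hypothetical names, versions are assumed to be integers starting at 1, and a real registry would persist its data.

```python
class PromptRegistry:
    """Append-only prompt store: publishing adds a version, nothing is edited."""

    def __init__(self):
        self._versions = {}   # prompt name -> list of prompt texts
        self._live = {}       # prompt name -> version number in production

    def publish(self, name, text):
        """Store a new immutable version and return its number."""
        versions = self._versions.setdefault(name, [])
        versions.append(text)
        return len(versions)

    def get(self, name, version):
        """Fetch an explicit version; there is deliberately no 'latest'."""
        versions = self._versions.get(name, [])
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]

    def deploy(self, name, version):
        """Point production at a specific, existing version."""
        self.get(name, version)   # raises KeyError if the version is unknown
        self._live[name] = version

    def live(self, name):
        return self.get(name, self._live[name])


registry = PromptRegistry()
v1 = registry.publish("planner", "Plan the steps.")
v2 = registry.publish("planner", "Plan the steps. Prefer fewer tool calls.")
registry.deploy("planner", v2)
registry.deploy("planner", v1)    # rollback is a pointer change, not a redeploy
print(registry.live("planner"))
```

Because old versions are never overwritten, rolling back only moves the production pointer, which is what makes a one-click rollback possible.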
How we use it
Debug live issues
When a user reports “the AI assistant gave me the wrong answer,” DevSecOps will:
- Find the trace by user ID and timestamp.
- View the entire workflow. Planner, SQL, aggregator, guardrail.
- Identify which span misbehaved.
- Inspect prompts, model version, and intermediate outputs.
Time from user report to root cause: minutes, not hours.
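The first and third steps of that procedure amount to a filtered lookup followed by a scan of span statuses. The sketch below assumes traces are available as plain dictionaries; the field names, the ten-minute search window, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical trace records; real traces carry far more detail per span.
traces = [
    {
        "trace_id": "t-1",
        "user_id": "u-17",
        "started_at": datetime(2024, 5, 1, 9, 0),
        "spans": [
            {"name": "planner", "status": "success"},
            {"name": "sql", "status": "error"},
            {"name": "aggregator", "status": "success"},
            {"name": "guardrail", "status": "success"},
        ],
    },
    {
        "trace_id": "t-2",
        "user_id": "u-17",
        "started_at": datetime(2024, 5, 1, 14, 30),
        "spans": [{"name": "planner", "status": "success"}],
    },
]


def find_traces(traces, user_id, reported_at, window=timedelta(minutes=10)):
    """Return traces for this user that started within `window` of the report."""
    return [
        t for t in traces
        if t["user_id"] == user_id
        and abs(t["started_at"] - reported_at) <= window
    ]


def suspect_spans(trace):
    """Names of spans that did not finish successfully."""
    return [s["name"] for s in trace["spans"] if s["status"] != "success"]


matches = find_traces(traces, "u-17", datetime(2024, 5, 1, 9, 4))
print([(t["trace_id"], suspect_spans(t)) for t in matches])
```

A span can also return a wrong answer with a success status, so the status scan only narrows the search; inspecting prompts and intermediate outputs remains a manual step.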
Performance optimization
Token-usage tracking exposes:
- Expensive workflows due for a redesign.
- Prompts longer than they need to be.
- Steps where a smaller model would do equally well.
Cost per Transaction is one of the KPIs Bizzi publishes. Bringing it down over time is the visible output of observability-driven optimization.
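One way to derive such a figure from span data is to sum span costs per workflow type and divide by the number of distinct traces. This is an assumed definition for illustration only; the exact formula behind the published KPI is not stated here, and the field names are hypothetical.

```python
from collections import defaultdict


def cost_per_transaction(spans):
    """Average cost per trace, grouped by workflow type (assumed definition)."""
    totals = defaultdict(float)
    trace_ids = defaultdict(set)
    for span in spans:
        totals[span["workflow"]] += span["cost_usd"]
        trace_ids[span["workflow"]].add(span["trace_id"])
    return {w: totals[w] / len(trace_ids[w]) for w in totals}


# Hypothetical span costs: two "report" transactions and one "chat" transaction.
spans = [
    {"workflow": "report", "trace_id": "t-1", "cost_usd": 0.25},
    {"workflow": "report", "trace_id": "t-1", "cost_usd": 0.5},
    {"workflow": "report", "trace_id": "t-2", "cost_usd": 0.25},
    {"workflow": "chat", "trace_id": "t-3", "cost_usd": 0.125},
]
print(cost_per_transaction(spans))
```

Sorting the result by value gives a quick ranking of which workflows are candidates for a redesign or a smaller model.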
Compliance evidence
Stored reasoning traces are evidence for:
- The audit trail (Pillar III §11).
- DSAR fulfillment. When a data subject asks what an AI decided about them, we have the trace.
- Customer Explanation Requests.
- Internal red team analysis.
Quality monitoring
LLM-as-a-Judge runs on a 1% sample of production traffic in real time, scoring against Accuracy, Groundedness, and Safety. Aggregate scores track quality drift. Outlier traces are flagged for manual review. This closes the loop between Stage 2 evaluation (which scores prompt versions in CI) and Stage 5 monitoring (which scores live behavior).
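A 1% sample can be drawn deterministically by hashing the trace ID, so the same trace is always either in or out of the sample. The text above does not say how the sample is chosen, so hash-based selection is an assumption here, as are the function names and the score format.

```python
import hashlib

SAMPLE_RATE_BPS = 100   # 100 basis points out of 10,000 = 1%


def in_judge_sample(trace_id: str) -> bool:
    """Deterministically select roughly 1% of trace IDs."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < SAMPLE_RATE_BPS


def mean_scores(scored):
    """Average each judged dimension across a non-empty batch of traces."""
    dimensions = ("accuracy", "groundedness", "safety")
    return {d: sum(s[d] for s in scored) / len(scored) for d in dimensions}


sampled = [f"trace-{i}" for i in range(100_000) if in_judge_sample(f"trace-{i}")]
print(len(sampled))   # close to 1,000, not exactly 1,000

# Hypothetical judge scores on a 0-to-1 scale.
print(mean_scores([
    {"accuracy": 1.0, "groundedness": 0.5, "safety": 1.0},
    {"accuracy": 0.5, "groundedness": 0.5, "safety": 1.0},
]))
```

Comparing these averages across time windows is one simple way to watch for drift; flagging outliers would need a per-trace threshold on top.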
Privacy
Reasoning traces contain sensitive content. We apply:
- PII redaction before logging. The trace stores the redacted prompt, not the raw one.
- Tenant isolation. Observability data is partitioned by tenant. There is no cross-tenant query path.
- Retention. Same as audit trail policies, with cold storage at seven years.
- Access control. Engineers access tenant traces only with case-by-case approval, logged.
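The redact-before-logging rule can be sketched as a function that every prompt passes through before it reaches the trace store. The two regular expressions below are deliberately simple illustrations covering emails and phone-like numbers; they are not the redaction rules actually in use, and the function names are hypothetical.

```python
import re

# Illustrative patterns only; production redaction needs much broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace emails and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def record_span(store: list, prompt: str, response: str) -> None:
    """Append a span to the store; the raw prompt is never written."""
    store.append({"input": redact(prompt), "output": response})


store = []
record_span(
    store,
    "Email jane.doe@example.com or call +84 912 345 678 about invoice 42.",
    "Reminder drafted.",
)
print(store[0]["input"])
```

Redacting at the single write path, rather than at read time, means the raw prompt cannot leak through a later query.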
Customer access
Enterprise customers can request access to their tenant’s traces through the Explanation API (Pillar III §8), to aggregate metrics through the Customer Portal, and to audit exports for their own compliance team. The trace layer is not only internal. It is the substrate of explainability the customer pays for.