Skip to content
Pillar I: AI Organization · § 12

KPIs and monitoring

We measure AI success on four mandatory KPIs and three operational metrics. Every one is computed continuously through the observability layer. Reported weekly to the CoE and quarterly to the Board. An unmeasured KPI is an uncommitted KPI.

A KPI list exists to make trade-offs explicit. STP rate and Extraction Accuracy push us to ship more automation. Hallucination Rate and Cost per Transaction push us back when automation gets unsafe or uneconomic. The four together keep the system honest.

>85% Straight-Through Processing rate Target: Platform-wide
>99% Extraction accuracy Target: Critical fields
<2% Hallucination rate Target: LLM-as-a-Judge sample
↓ QoQ Cost per transaction Target: Down every quarter

Straight-Through Processing (STP) rate. Share of documents AI processes end-to-end with no human touch. The leading indicator for business value. Computed on a rolling 30-day window.

Extraction accuracy. Measured per field. Critical fields (tax ID, total, partner code, invoice date) exceed 99%. Secondary fields exceed 95%.

Hallucination rate. How often AI generates content not supported by the source. Measured by sampling 1% of output through LLM-as-a-Judge plus a weekly human-graded ground-truth set.

Cost per transaction. API token cost plus infrastructure cost per processed document. This KPI trends down as models are optimized and volume scales.

>99.5% AI feature uptime Target: Contractual SLA
PSI < 0.1 Data drift score Target: Against monthly baseline
<30s Median HITL approval latency Target: Accountant approval time

KPIs roll up at three levels:

  • Squad weekly review. Feature-level KPIs owned by the squad.
  • CoE monthly review. Cross-squad trends, platform-level KPIs.
  • Board quarterly review. Strategic KPIs and material movements.

Enterprise customers receive a quarterly transparency report with platform KPIs and an incident summary. See §13 Reporting.

KPIWarning (yellow)Critical (red)Action
STP rate< 80% over 7 days< 75% over 7 daysSquad investigates. CoE escalation if red
Extraction accuracy< 98% on critical fields< 95%Pause new releases. Trigger model retrain
Hallucination rate> 3% sampled> 5%Raise sampling to 5%. Consider model rollback
Cost per transactionUp > 20% QoQUp > 40% QoQEngineering review. Prompt and model optimization
Uptime< 99% in 24h< 95% in 1hTrigger SEV2 or SEV1 incident
Data driftPSI > 0.1PSI > 0.25Schedule retrain in current sprint
HITL latency> 60s median> 120s medianUX review. Likely a UI issue