Pillar I: AI Organization · § 12

KPIs and monitoring

We measure AI success on four mandatory KPIs and three operational metrics. Every one is computed continuously through the observability layer. Reported weekly to the CoE and quarterly to the Board. An unmeasured KPI is an uncommitted KPI.

Context

A KPI list exists to make trade-offs explicit. STP rate and Extraction Accuracy push us to ship more automation. Hallucination Rate and Cost per Transaction push us back when automation gets unsafe or uneconomic. The four together keep the system honest.

Four mandatory KPIs

>85% Straight-Through Processing rate Target: Platform-wide

>99% Extraction accuracy Target: Critical fields

<2% Hallucination rate Target: LLM-as-a-Judge sample

↓ QoQ Cost per transaction Target: Down every quarter

Straight-Through Processing (STP) rate. Share of documents AI processes end-to-end with no human touch. The leading indicator for business value. Computed on a rolling 30-day window.

Extraction accuracy. Measured per field. Critical fields (tax ID, total, partner code, invoice date) exceed 99%. Secondary fields exceed 95%.

Hallucination rate. How often AI generates content not supported by the source. Measured by sampling 1% of output through LLM-as-a-Judge plus a weekly human-graded ground-truth set.

Cost per transaction. API token cost plus infrastructure cost per processed document. This KPI trends down as models are optimized and volume scales.

Three operational KPIs

>99.5% AI feature uptime Target: Contractual SLA

PSI < 0.1 Data drift score Target: Against monthly baseline

<30s Median HITL approval latency Target: Accountant approval time

Reporting cadence

KPIs roll up at three levels:

Squad weekly review. Feature-level KPIs owned by the squad.
CoE monthly review. Cross-squad trends, platform-level KPIs.
Board quarterly review. Strategic KPIs and material movements.

Enterprise customers receive a quarterly transparency report with platform KPIs and an incident summary. See §13 Reporting.

Alert thresholds and escalation

KPI	Warning (yellow)	Critical (red)	Action
STP rate	< 80% over 7 days	< 75% over 7 days	Squad investigates. CoE escalation if red
Extraction accuracy	< 98% on critical fields	< 95%	Pause new releases. Trigger model retrain
Hallucination rate	> 3% sampled	> 5%	Raise sampling to 5%. Consider model rollback
Cost per transaction	Up > 20% QoQ	Up > 40% QoQ	Engineering review. Prompt and model optimization
Uptime	< 99% in 24h	< 95% in 1h	Trigger SEV2 or SEV1 incident
Data drift	PSI > 0.1	PSI > 0.25	Schedule retrain in current sprint
HITL latency	> 60s median	> 120s median	UX review. Likely a UI issue