Pillar IV: Data, AIOps, Infrastructure · § 10

ADLC Stage 5, Monitoring (Drift Detection)

A model that is accurate in March can be quietly wrong by September. Distributions shift. Invoice formats change. Customer cohorts evolve. Terminology drifts. A model trained on the old distribution performs steadily worse without any code change. Stage 5 is the discipline of catching this before the customer does.

Context

In AP automation, drift is not occasional. It is continuous. The tax authority updates e-invoice formats. Customers migrate ERPs and start producing differently shaped invoices. New customer cohorts (more SMBs, more banks) shift the input distribution. A drift monitoring system that runs only when somebody remembers will detect drift months after it started.

Three drift dimensions

Input drift

We compare the current distribution of input features against the training baseline.

PSI (Population Stability Index) is the standard metric. PSI below 0.1 is stable. PSI from 0.1 to 0.25 is a warning that triggers investigation. PSI above 0.25 is severe and triggers a retrain.
Per-feature PSI isolates which features moved.
Cohort drift tracks shifts in the tenant-category distribution.
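As an illustration of the PSI bands above, here is a minimal sketch in plain Python. It assumes numeric features, ten quantile bins taken from the training baseline, and a small floor (eps) for empty bins. The function names psi, per_feature_psi and classify_psi are hypothetical and are not taken from our tooling.

```python
import math
from bisect import bisect_right


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a current sample."""
    # Assumption: bin edges are quantiles of the baseline sample.
    ordered = sorted(baseline)
    edges = [ordered[int(len(ordered) * i / bins)] for i in range(1, bins)]

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(sample), eps) for c in counts]

    expected, actual = shares(baseline), shares(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


def per_feature_psi(baseline, current):
    """PSI per feature; both arguments map feature name -> list of values."""
    return {name: psi(baseline[name], current[name]) for name in baseline}


def classify_psi(value):
    """Map a PSI value onto the bands used in this section."""
    if value < 0.1:
        return "stable"
    if value <= 0.25:
        return "warning"
    return "severe"
```

Calling per_feature_psi on the baseline and current feature tables and sorting the result by value gives a quick view of which features moved the most.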

Output drift

We compare the current distribution of model outputs (confidence scores, predicted categories) against the training baseline.

Confidence distribution drift. A sudden skew in confidence often indicates the model is wrong in a new way.
Output entropy. Rising entropy signals confusion not yet showing up in accuracy metrics.
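One way to read "output entropy" is the mean Shannon entropy of the model's per-prediction probability vectors. The sketch below takes that reading as an assumption (the section does not fix a definition), and the function names are hypothetical.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy in bits of one probability vector over predicted categories."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mean_output_entropy(batch):
    """Average entropy across a batch of probability vectors."""
    return sum(prediction_entropy(p) for p in batch) / len(batch)


# Illustrative values only: a confident baseline batch and a less certain current batch.
baseline_batch = [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02]]
current_batch = [[0.60, 0.30, 0.10], [0.50, 0.40, 0.10]]
entropy_rising = mean_output_entropy(current_batch) > mean_output_entropy(baseline_batch)
```

Tracking this mean per day against the training baseline makes a rise visible even while accuracy metrics still look flat.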

Performance drift

Business KPIs (Pillar I §12) are lagging indicators. By the time they move, drift has already affected customers.

STP rate down. Drift is degrading extraction accuracy.
Hallucination Rate up. The topic distribution has drifted away from what the model was built for.
Latency up. Inputs have drifted toward greater complexity.

Workflow when drift is detected

Alert. The system fires an alert to the squad Steward and the CoE.
Investigate. Root cause analysis. Is this a data shift or a model behavior shift?
Decide. Minor drift, monitor more closely. Significant drift, schedule a retrain in the next sprint. Severe drift, emergency retrain or rollback to the previous model version.
Retrain. Triggers Stage 1 of ADLC with refreshed training data.
Document. Drift event and resolution are written up in a PIR-style record.

Retrain cadence per model class

Different model classes drift at different rates and have different retrain costs.

OCR specialized models. Quarterly. Each quarter brings a new batch of invoice formats.
Classification models. Semi-annually, or sooner if PSI crosses 0.25.
Embedding models. Rarely retrained. These stay stable over time.
Commercial LLMs. Never retrained by us. The vendor manages the model. Instead we update the prompt and the RAG corpus.
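The cadence above can be expressed as a small lookup plus a due-date check. This is a sketch under stated assumptions: "quarterly" is approximated as 91 days and "semi-annually" as 182 days, and the class keys and the function name retrain_due are hypothetical.

```python
from datetime import date, timedelta

# Assumption: day counts approximate the cadences named in the text.
RETRAIN_INTERVAL_DAYS = {
    "ocr": 91,               # quarterly
    "classification": 182,   # semi-annually
    "embedding": None,       # rarely retrained, no fixed schedule
    "commercial_llm": None,  # never retrained by us; prompt and RAG corpus are updated instead
}

PSI_SEVERE = 0.25


def retrain_due(model_class, last_trained, today, psi=None):
    """Return True when a scheduled or PSI-triggered retrain is due."""
    # Classification models retrain early once PSI crosses 0.25.
    if model_class == "classification" and psi is not None and psi > PSI_SEVERE:
        return True
    interval = RETRAIN_INTERVAL_DAYS[model_class]
    if interval is None:
        return False
    return today - last_trained >= timedelta(days=interval)
```

For example, retrain_due("classification", date(2024, 1, 1), date(2024, 2, 1), psi=0.3) returns True because the PSI override applies before the semi-annual interval has elapsed.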

Monitoring tooling

Drift detection runs as a scheduled job against the OLAP database.

Hourly. KPI scan.
Daily. PSI calculation for the key features.
Weekly. Comprehensive drift report.
Monthly. Fairness audit, drift broken down by cohort.

Alerts go to the squad’s Steward through Slack and email. The alert payload includes the metric, the threshold, the relevant cohort, and a link to a trace sample that exemplifies the drift.
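The payload described above could be modeled roughly as follows. This is a sketch, not our actual alert schema: the class name DriftAlert, the field names, the extra observed value field, and the example URL are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DriftAlert:
    metric: str       # e.g. "psi" or "stp_rate"
    value: float      # observed value (assumption: not listed in the text, added for context)
    threshold: float  # threshold that was crossed
    cohort: str       # relevant tenant cohort
    trace_url: str    # link to a trace sample that exemplifies the drift

    def to_json(self):
        """Serialize the alert for delivery over a chat or email integration."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical example values.
alert = DriftAlert(
    metric="psi",
    value=0.31,
    threshold=0.25,
    cohort="smb",
    trace_url="https://example.invalid/traces/123",
)
payload = alert.to_json()
```

Keeping the payload as one typed record means the Slack and email channels render the same fields and cannot drift apart in content.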

The loop back to Stage 1

Stage 5 is not an endpoint. It is the trigger that closes ADLC into a loop.

Stage 5 detects drift → Stage 1 redesign → Stage 2 evaluate →
Stage 3 test → Stage 4 deploy → Stage 5 monitors again

The KPI we hold ourselves to. Every production AI model passes through at least one full ADLC cycle every six months, whether or not drift has been detected. Continuous improvement is a cadence, not a reaction.