Pillar III: Ethics, Transparency, Interpretability · § 03

Human-centricity. HITL and HOTL

Bizzi runs two oversight modes in parallel. In human-in-the-loop (HITL), a person approves before the AI acts. In human-on-the-loop (HOTL), the AI acts and a person supervises through a dashboard with rollback. Every AI feature lives in one mode or the other. There is no third mode called “fully autonomous”.

Context

The temptation to let the AI run on its own grows with confidence, and confidence is exactly what gets miscalibrated when stakes rise. We adopted the HITL and HOTL split because it forces a deliberate decision per feature. One feature earns autonomy. Another does not. The decision is reviewed when the feature ships, when its risk classification changes, and when a post-incident review flags it.

How HITL works

HITL triggers when any of the following is true for a transaction:

Value. Above the tenant-configured threshold (default 50 million VND).
Vendor novelty. Vendor appears for the first time in this tenant.
Confidence. A material field reports confidence below 80%.
Policy flag. Travel or procurement policy violation detected.
Statistical outlier. Amount is more than two standard deviations above the vendor’s baseline within this tenant.
Manual override. Customer has configured forced review for the category.
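
The trigger list above can be read as a set of independent checks, any one of which routes the transaction to review. The following is a minimal sketch under that reading, not Bizzi's implementation. The thresholds (50 million VND, 80%, two standard deviations) come from the list. Every name (Transaction, hitl_triggers, the field names) is hypothetical, and treating the outlier check as a comparison on the transaction amount is an assumption.

```python
from dataclasses import dataclass

# Thresholds taken from the trigger list; all identifiers are hypothetical.
DEFAULT_VALUE_THRESHOLD_VND = 50_000_000
MIN_FIELD_CONFIDENCE = 0.80
OUTLIER_SIGMAS = 2.0


@dataclass
class Transaction:
    amount_vnd: float
    vendor_seen_before: bool         # False on the vendor's first appearance in the tenant
    min_material_confidence: float   # lowest confidence across material fields, 0..1
    policy_violation: bool           # travel or procurement policy flag
    vendor_mean_vnd: float           # vendor baseline within this tenant (assumed: mean amount)
    vendor_std_vnd: float            # spread of that baseline (assumed: standard deviation)
    forced_review: bool = False      # customer-configured manual override


def hitl_triggers(tx, value_threshold_vnd=DEFAULT_VALUE_THRESHOLD_VND):
    """Return the name of every trigger that fires; an empty list means none."""
    triggers = []
    if tx.amount_vnd > value_threshold_vnd:
        triggers.append("value")
    if not tx.vendor_seen_before:
        triggers.append("vendor_novelty")
    if tx.min_material_confidence < MIN_FIELD_CONFIDENCE:
        triggers.append("confidence")
    if tx.policy_violation:
        triggers.append("policy_flag")
    # Skip the outlier check when no spread is available (assumption).
    outlier_limit = tx.vendor_mean_vnd + OUTLIER_SIGMAS * tx.vendor_std_vnd
    if tx.vendor_std_vnd > 0 and tx.amount_vnd > outlier_limit:
        triggers.append("statistical_outlier")
    if tx.forced_review:
        triggers.append("manual_override")
    return triggers


def needs_hitl(tx, value_threshold_vnd=DEFAULT_VALUE_THRESHOLD_VND):
    """HITL applies when any single trigger fires."""
    return bool(hitl_triggers(tx, value_threshold_vnd))
```

Returning the full list of fired triggers, rather than a bare boolean, would let the review UI tell the accountant why an item was routed to them.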

The pending item routes to an accountant. The UI shows extracted data, per-field confidence, evidence linked to the source document, and the AI’s reasoning trace with citations. The accountant edits, approves, or rejects. The action is written to the audit trail with approver identity and timestamp. The target median HITL approval latency is under 30 seconds. Anything higher signals a UX problem we investigate.

How HOTL works

HOTL is the default when a transaction is a standard e-invoice already verified through the tax authority API, comes from a vendor with a history of more than ten clean transactions, has a value below the threshold, and has high model confidence. The AI writes the transaction immediately. The action appears on the dashboard the same second. Every transaction has a per-record Rollback control. Each batch has a single Rollback control that reverses all related writes. A daily summary surfaces the actions the user should be aware of without forcing them to read every entry.
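
To make the two rollback scopes concrete, here is a small in-memory sketch. It is illustrative only: the Ledger and Write classes and their methods are hypothetical, and a real system would reverse the underlying accounting writes rather than flip a flag. It shows a per-record rollback and a batch rollback that reverses every write still active in the batch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Write:
    record_id: str
    batch_id: Optional[str]     # None for a write made outside any batch
    rolled_back: bool = False


@dataclass
class Ledger:
    """Hypothetical store of AI-made writes, keyed by record id."""
    writes: Dict[str, Write] = field(default_factory=dict)

    def record(self, record_id: str, batch_id: Optional[str] = None) -> None:
        self.writes[record_id] = Write(record_id, batch_id)

    def rollback_record(self, record_id: str) -> bool:
        """Reverse one write. Returns False if it was already rolled back."""
        write = self.writes[record_id]
        changed = not write.rolled_back
        write.rolled_back = True
        return changed

    def rollback_batch(self, batch_id: str) -> List[str]:
        """Reverse every still-active write in the batch; return their ids."""
        reversed_ids = []
        for write in self.writes.values():
            if write.batch_id == batch_id and not write.rolled_back:
                write.rolled_back = True
                reversed_ids.append(write.record_id)
        return reversed_ids
```

Making both operations idempotent means a supervisor who rolls back one record and then the whole batch does not reverse the same write twice.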

Choosing between the modes

For each new feature, the squad answers four questions during intake:

Will a wrong AI decision cause financial loss the customer cannot easily reverse? If yes, HITL.
Does the feature touch a regulated decision (tax, payroll, settlement)? If yes, HITL.
Is the input distribution well-bounded and stable? If yes, HOTL is on the table.
Can the customer roll back a wrong batch in under five minutes? If no, HITL until the property holds.
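
The four intake questions can be sketched as a small decision function. This is one possible encoding, not the squad's actual tooling, and all names are hypothetical. Two points are assumptions: an unstable input distribution is treated as HITL (the text only says a stable one puts HOTL "on the table"), and a HOTL result means "candidate" only, since a move to HOTL still requires the measured period and board approval.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    HITL = "HITL"
    HOTL = "HOTL"


@dataclass
class IntakeAnswers:
    irreversible_financial_loss: bool   # question 1
    regulated_decision: bool            # question 2 (tax, payroll, settlement)
    stable_input_distribution: bool     # question 3
    batch_rollback_under_5_min: bool    # question 4


def candidate_mode(answers: IntakeAnswers) -> Mode:
    """Return the mode the intake answers point to.

    Mode.HOTL here means HOTL is on the table, not that it is approved.
    """
    if answers.irreversible_financial_loss or answers.regulated_decision:
        return Mode.HITL
    if not answers.batch_rollback_under_5_min:
        return Mode.HITL  # stays HITL until the rollback property holds
    if answers.stable_input_distribution:
        return Mode.HOTL
    # Assumption: with an unstable input distribution, default to HITL.
    return Mode.HITL
```

Any single HITL answer wins, so the function checks the disqualifying questions first and returns HOTL only when nothing argues against it.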

A feature moves from HITL to HOTL only after a measured period at the target accuracy and only with the approval of the AI Governance Board.