Pillar I: AI Organization · § 08

AI risk management, six steps

Every AI feature at Bizzi passes a six-step risk assessment before it reaches production. This is the operating spine of AI governance. The steps run in order. The artefacts they produce are durable. Nothing ships without all six.

Figure: Six-step AI risk framework. (1) Classify use case, (2) Identify threats, (3) Measure and score, (4) Set mitigations, (5) Red-team, (6) Monitor in production, followed by quarterly re-assessment.

Step 1. Classify the use case

Rate each use case Low / Medium / High.

Low. AI assists document search or simple categorization. No financial impact.
Medium. AI summarizes a contract or suggests an accounting code. A mistake annoys but is easy to reverse.
High. AI auto-reconciles and issues a payment instruction. A mistake creates direct financial loss.

The classification drives how strict the controls in the later steps need to be.
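The three examples above can be read as a simple decision rule. The sketch below is one possible encoding, not Bizzi's actual rule: the `UseCase` fields and the `classify` function are hypothetical names introduced for illustration, and the assumption is that "financial impact" and "easy to reverse" are the two deciding questions.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


@dataclass(frozen=True)
class UseCase:
    # Hypothetical fields; the source text only gives worked examples.
    name: str
    financial_impact: bool   # can a mistake affect money or the books?
    easily_reversible: bool  # can a mistake be undone cheaply?


def classify(use_case: UseCase) -> RiskTier:
    """Assumed rule: no financial impact is Low, a reversible
    mistake is Medium, anything else is High."""
    if not use_case.financial_impact:
        return RiskTier.LOW
    if use_case.easily_reversible:
        return RiskTier.MEDIUM
    return RiskTier.HIGH
```

Under these assumptions, document search maps to Low, an accounting-code suggestion to Medium, and auto-reconciliation with a payment instruction to High.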

Step 2. Identify threats

Map both technical and non-technical threats:

Technical. Model drift, hallucination, prompt injection, data poisoning.
Non-technical. Reputational (a public AI failure), legal (a Decree 13/2023 violation), compliance (a customer SLA breach).

Step 3. Measure and score

Use a 5×5 impact-by-likelihood matrix (rows are impact, columns are likelihood):

Impact \ Likelihood	Rare	Possible	Moderate	High	Certain
Catastrophic	M	H	H	E	E
Major	M	M	H	H	E
Moderate	L	M	M	H	H
Minor	L	L	M	M	H
Negligible	L	L	L	M	M

L = Low, M = Medium, H = High, E = Extreme. Any use case classified High in Step 1 that scores Extreme on this matrix requires AI Board approval.
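The matrix can be encoded as a lookup table. The following is a minimal sketch that copies the cell values from the table above; the names `score` and `needs_board_approval` are illustrative and not part of any Bizzi tooling.

```python
# Row and column labels as they appear in the matrix.
IMPACTS = ["Negligible", "Minor", "Moderate", "Major", "Catastrophic"]
LIKELIHOODS = ["Rare", "Possible", "Moderate", "High", "Certain"]

# One string per impact row; characters follow LIKELIHOODS order.
MATRIX = [
    "LLLMM",  # Negligible
    "LLMMH",  # Minor
    "LMMHH",  # Moderate
    "MMHHE",  # Major
    "MHHEE",  # Catastrophic
]

RATING_NAMES = {"L": "Low", "M": "Medium", "H": "High", "E": "Extreme"}


def score(impact: str, likelihood: str) -> str:
    """Look up the rating for one impact/likelihood pair."""
    row = IMPACTS.index(impact)          # raises ValueError on unknown label
    col = LIKELIHOODS.index(likelihood)  # raises ValueError on unknown label
    return RATING_NAMES[MATRIX[row][col]]


def needs_board_approval(use_case_tier: str, impact: str, likelihood: str) -> bool:
    """True when a High use case scores Extreme on the matrix."""
    return use_case_tier == "High" and score(impact, likelihood) == "Extreme"
```

For example, `score("Catastrophic", "High")` returns `"Extreme"`, so a High use case with that threat profile would go to the AI Board.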

Step 4. Set mitigations

Apply guardrails by tier:

Low / Medium. Output guardrails, audit logging, confidence threshold in the UI.
High / Extreme. All of the above plus maker-checker (HITL), PII redaction, output sandboxing, strict rate limits, and a pre-wired kill-switch.
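The tiered guardrails lend themselves to a checklist that a release pipeline could verify. This is a sketch under assumptions: the control names are taken from the two tiers above, while `required_controls` and `missing_controls` are hypothetical helpers.

```python
# Controls listed for the Low / Medium tier.
BASE_CONTROLS = (
    "output guardrails",
    "audit logging",
    "UI confidence threshold",
)

# Additional controls listed for the High / Extreme tier.
EXTRA_CONTROLS = (
    "maker-checker (HITL)",
    "PII redaction",
    "output sandboxing",
    "strict rate limits",
    "pre-wired kill-switch",
)


def required_controls(rating: str) -> tuple:
    """Return the controls a feature must have for its rating."""
    if rating in ("Low", "Medium"):
        return BASE_CONTROLS
    if rating in ("High", "Extreme"):
        return BASE_CONTROLS + EXTRA_CONTROLS
    raise ValueError(f"unknown rating: {rating!r}")


def missing_controls(rating: str, implemented: set) -> list:
    """List required controls that are not yet implemented."""
    return [c for c in required_controls(rating) if c not in implemented]
```

A feature rated High that has only the three base controls in place would get back the five extra controls as still missing.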

Step 5. Red-team

The internal security team attacks the model before it ships. Scope:

Prompt injection (direct and indirect via PDF).
Jailbreak attempts.
Sensitive-information extraction.
Excessive-agency tests.
Denial-of-wallet patterns.

Findings are graded by severity. Every High or Extreme finding must be closed before release.
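The release rule for red-team findings can be expressed as a small gate. The sketch below assumes findings use the same Low / Medium / High / Extreme labels as the scoring matrix, which the text implies but does not state; `Finding` and `can_release` are illustrative names.

```python
from dataclasses import dataclass

# Severities that block a release while a finding is still open.
BLOCKING_SEVERITIES = {"High", "Extreme"}


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str  # assumed scale: "Low", "Medium", "High", "Extreme"
    closed: bool


def release_blockers(findings: list) -> list:
    """Return open findings whose severity blocks release."""
    return [
        f for f in findings
        if f.severity in BLOCKING_SEVERITIES and not f.closed
    ]


def can_release(findings: list) -> bool:
    """A feature may ship only when no blocking finding is open."""
    return not release_blockers(findings)
```

An open Medium finding does not block release under this rule; an open High finding does until it is closed.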

Step 6. Monitor in production

Once live, track continuously:

Latency.
Cost per transaction.
Error rate.
Drift (PSI).
Hallucination rate from LLM-as-a-Judge sampling.
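For the drift metric, the Population Stability Index compares a baseline distribution with a current one over the same bins: PSI = Σ (actual_i − expected_i) · ln(actual_i / expected_i). The sketch below assumes both inputs are already binned proportions; the `eps` floor for empty bins is a common workaround, not something this section specifies.

```python
import math


def psi(expected: list, actual: list, eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    `expected` and `actual` hold the share of observations per bin
    (each list should sum to roughly 1) over identical bins.
    """
    if len(expected) != len(actual):
        raise ValueError("both distributions must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        # Floor empty bins so the logarithm stays defined.
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total
```

Identical distributions give a PSI of 0, and the value grows as the current distribution moves away from the baseline. Where the warning band sits is a policy choice that this section leaves open.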

When any metric breaches its warning band, the feature returns to Step 1 for re-classification. Independently of any breach, the same re-assessment runs automatically every quarter for every production AI feature.
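The two re-assessment triggers, a breached warning band and the quarterly schedule, can be sketched as one check. Everything here is illustrative: the `Band` type, the metric keys, and the idea that each band is a single upper bound are assumptions, since the section does not define band values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    metric: str         # hypothetical key, e.g. "error_rate" or "psi"
    warning_max: float  # hypothetical upper bound of the warning band


def breached(readings: dict, bands: list) -> list:
    """Return the metrics whose latest reading exceeds its band.

    A metric missing from `readings` raises KeyError, so gaps in
    monitoring are not silently treated as healthy.
    """
    return [b.metric for b in bands if readings[b.metric] > b.warning_max]


def next_step(readings: dict, bands: list, quarterly_review_due: bool) -> str:
    """Decide whether the feature goes back to Step 1."""
    if breached(readings, bands) or quarterly_review_due:
        return "return to Step 1"
    return "continue monitoring"
```

With this shape, a healthy feature still returns to Step 1 once `quarterly_review_due` is true, which matches the quarterly rule above.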