Governing an AML Transaction-Monitoring Alert-Triage Agent
13 min · Updated 2026-06-02
Answer
You govern an AML alert-triage agent by intercepting each of its consequential actions — auto-closing an alert, escalating it, drafting a SAR/STR narrative, or writing a disposition to the case system of record — with a policy checkpoint that runs before the action executes, routing the two outcomes that change a reporting obligation (auto-close and SAR-narrative) to a named L2/L3 human in a maker-checker gate, and sealing every disposition into independently verifiable lineage. The binding obligations come from AML law (FATF R.20, EU AMLR Art. 69/73, the US BSA SAR rules) and model-risk supervision (SR 11-7); the EU AI Act contributes human-oversight and record-keeping discipline rather than an automatic high-risk classification, because AML transaction monitoring is not enumerated in Annex III.
KLA is the independent, framework-neutral runtime governance and assurance layer for this workflow. KLA governs the agent you already built — in LangGraph, CrewAI, Agentforce, Microsoft Copilot, or in-house — it does not build, sell, or run the agent. The customer owns the agent; KLA owns the controls, the evidence, and the audit trail.
The workflow
The job & where the agent takes high-stakes action
A transaction-monitoring (TM) system generates alerts when transactions deviate from a customer's expected profile — the operational form of the FATF Recommendation 10(d) duty to scrutinize transactions across a relationship. An L1 analyst normally triages each alert: read it, gather context, and dispose of it. The alert-triage agent automates that L1 work and takes four consequential actions: (1) dismiss / auto-close an alert as no-further-action; (2) escalate the alert to an L2/L3 investigation; (3) draft or recommend a SAR/STR narrative; and (4) write the disposition plus its rationale to the case management system of record. Two of these actions silently move a legal reporting obligation: an auto-close can extinguish a reportable suspicion that FATF R.20 and EU AMLR Art. 69(1) require be reported promptly to the FIU, and the SAR-narrative action seeds the document a regulator will later read line by line. The other two — escalate and write-to-SoR — set the disposition rationale, start the BSA clock, and are bound by the tipping-off prohibition on what may be disclosed.
Stakes
Why it's high-stakes
An incorrect auto-close is a false negative that removes a transaction from human review entirely. No analyst ever sees it again, so a genuine suspicion that FATF R.20 and EU AMLR Art. 69(1) require be reported promptly is never filed. Under the US BSA, the clock is hard and numeric: a bank must file a SAR no later than 30 calendar days after initial detection of facts that may constitute a basis for filing, extendable only to identify a suspect and in no case beyond 60 calendar days after initial detection. An agent that auto-closes a true-positive alert lets that clock start and then expire with no one watching. A SAR narrative the agent drafts is a legal document a regulator reads literally, and a mis-disposition erodes the very R.10(d) ongoing-monitoring control the institution is examined on. Because financial-crime detection is not enumerated in EU AI Act Annex III, the institution cannot rely on a CE-marked, conformity-assessed high-risk pipeline to backstop these failures; the governance burden sits squarely on the deployer's own AML and model-risk controls.
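The 30/60-day arithmetic is simple enough to make concrete. The following is a minimal sketch that assumes only the deadlines stated above; the function names and the example dates are hypothetical, and it is not a substitute for the institution's own reading of 31 CFR § 1020.320(b)(3).

```python
from datetime import date, timedelta


def sar_deadlines(initial_detection: date) -> tuple[date, date]:
    """Return (standard deadline, outer limit) for filing a SAR.

    Standard: 30 calendar days after initial detection.
    Outer limit: 60 calendar days, available only when extra time
    is taken to identify a suspect.
    """
    return (
        initial_detection + timedelta(days=30),
        initial_detection + timedelta(days=60),
    )


def days_remaining(initial_detection: date, today: date,
                   suspect_identified: bool = True) -> int:
    """Days left on the clock; a negative value means the deadline has passed."""
    standard, outer = sar_deadlines(initial_detection)
    deadline = standard if suspect_identified else outer
    return (deadline - today).days


# Hypothetical alert detected on 15 January 2026.
detected = date(2026, 1, 15)
standard, outer = sar_deadlines(detected)   # 2026-02-14 and 2026-03-16
overdue_by = -days_remaining(detected, date(2026, 2, 20))  # 6 days late
```

An auto-closed alert never appears in an aging report, so a computation like this only helps if the initial-detection date is recorded independently of the alert's queue status.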
What goes wrong
Failure modes specific to this agent
Silent suspicion extinction at the auto-close boundary
The agent auto-closes a true-positive alert as no-further-action with a fluent, plausible rationale ("consistent with prior payroll pattern"), so the transaction is never escalated and no SAR/STR is ever filed. Unlike a missed escalation that a human queue would eventually surface, an auto-closed alert leaves the work queue entirely — there is no pending item, no aging case, nothing for a supervisor to notice. The reporting obligation under FATF R.20 / AMLR Art. 69(1) is extinguished without any human ever deciding it should be.
Why it's hard to catch: Ordinary testing measures agreement with historical analyst labels, but historical labels are themselves dominated by closures (industry alert false-positive rates run extremely high), so an agent that closes aggressively scores well on accuracy while systematically suppressing the rare true positive. The error is invisible in aggregate metrics, produces no exception or alert, and only surfaces years later in a regulator look-back — by which point the SAR deadlines are long blown. The harm is a non-event (a report that never happened), which no log of actions taken can reveal.
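A small worked example shows why label agreement hides this failure. The numbers below are purely illustrative assumptions (they are not industry statistics), and the function name is hypothetical; the sketch models an agent that closes every alert except the true positives it happens to escalate.

```python
def label_agreement_metrics(total_alerts: int, true_positives: int,
                            caught: int) -> dict[str, float]:
    """Accuracy and recall for an agent that closes everything except
    the `caught` true positives it escalates (no false escalations)."""
    missed = true_positives - caught
    accuracy = (total_alerts - missed) / total_alerts
    recall = caught / true_positives
    return {"accuracy": accuracy, "recall": recall}


# Illustrative only: 10,000 alerts, 50 genuinely suspicious, 5 escalated.
metrics = label_agreement_metrics(total_alerts=10_000, true_positives=50, caught=5)
# accuracy is 0.9955 while recall is 0.1: 45 reportable alerts were closed.
```

Under these assumed numbers the agent agrees with the labels 99.55% of the time while suppressing nine out of ten reportable alerts, which is why recall on true positives, not overall agreement, is the figure to challenge.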
Disposition–rationale decoupling (narrative that doesn't match the call)
The agent writes a disposition (e.g. escalate) but generates a rationale that argues for the opposite, or attaches boilerplate reasoning that does not actually reference the alerting behavior. Because both fields are free text written by the same model in one pass, the disposition and its justification can drift apart while each reads as competent prose. Investigators downstream, and examiners later, rely on the rationale to understand why the call was made; a decoupled rationale corrupts the audit trail and the L2 investigator's starting point.
Why it's hard to catch: Each field passes its own sniff test (the disposition is a valid enum value, the rationale is grammatical and on-topic), so field-level validation and human spot-checks of either field in isolation pass. The defect is in the relationship between two fields, which unit tests and label-matching never assert. This is model risk in the SR 11-7 sense: adverse consequences from decisions based on incorrect model output. SR 11-7's prescribed control for such defects is objective, informed 'effective challenge' rather than the model's own self-report.
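The relationship between the two fields can be asserted directly rather than validating each field alone. The sketch below is a deliberately crude keyword heuristic: the cue phrases, the `Disposition` type, and the issue names are all hypothetical, and a real check would likely need something stronger than substring matching. It only illustrates the shape of a paired check.

```python
from dataclasses import dataclass

# Hypothetical cue phrases; substring matching is a stand-in for a real classifier.
CLOSE_CUES = ("no further action", "expected activity", "matches prior pattern")
ESCALATE_CUES = ("unexplained", "suspicious", "structuring")


@dataclass(frozen=True)
class Disposition:
    decision: str                      # "close" or "escalate"
    rationale: str                     # free text written by the agent
    alert_features: tuple[str, ...]    # terms describing the alerting behavior


def consistency_issues(d: Disposition) -> list[str]:
    """Return issues found in the pairing of decision and rationale."""
    text = d.rationale.lower()
    issues = []
    # Cues that argue for the opposite of the stated decision.
    opposing = ESCALATE_CUES if d.decision == "close" else CLOSE_CUES
    if any(cue in text for cue in opposing):
        issues.append("rationale_argues_opposite")
    # Boilerplate check: the rationale should mention the alerting behavior.
    if not any(feature.lower() in text for feature in d.alert_features):
        issues.append("rationale_ignores_alert_behavior")
    return issues


decoupled = Disposition(
    decision="escalate",
    rationale="Expected activity that matches prior pattern of payroll.",
    alert_features=("round-number",),
)
coupled = Disposition(
    decision="escalate",
    rationale="Unexplained round-number transfers in rapid succession.",
    alert_features=("round-number",),
)
```

Here `consistency_issues(decoupled)` flags both problems and `consistency_issues(coupled)` returns an empty list. A heuristic like this will produce false alarms (a close rationale that says "nothing suspicious" trips it), so it is better suited to routing a write for review than to silently rejecting it.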
Tipping-off leakage through the agent's writes and logs
The agent writes a disposition narrative, a customer-facing case note, or a verbose trace that states or strongly implies that a SAR/STR is being or will be filed, or that an ML/TF analysis is underway — and that text lands somewhere a customer or an out-of-perimeter third party can see (a CRM note, a relationship-manager queue, an outbound message, an over-broad log sink). EU AMLR Art. 73 and the US BSA (31 U.S.C. § 5318(g)(2) / 31 CFR § 1020.320(e)) make this unlawful disclosure, and the prohibition explicitly extends to agents.
Why it's hard to catch: The agent's job is to write good narratives, so verbose, informative text is the success signal — the failure is a routing/confidentiality property of where that text goes, not a quality property of the text itself, which is exactly what content-quality evals reward. Standard testing checks that the agent produced a useful narrative; it does not assert that no confidentiality-tier-2 field ever crosses into a customer-readable channel. A single mis-bound tool or over-broad log exporter turns a perfect narrative into a tipping-off breach.
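Because the failure is a routing property, the assertion belongs on the destination rather than on the prose. The following is a minimal sketch under stated assumptions: the destination names, the regular expression, and the `DESTINATION_NOT_BOUND` code are hypothetical, while `AML_TIPPING_OFF_RISK` is the reason code this guide uses for the content check.

```python
import re
from typing import Optional

# Assumption: the case system of record is the only approved write target.
APPROVED_DESTINATIONS = {"case_sor"}

# Hypothetical patterns for SAR-revealing language; acronyms are case-sensitive,
# the longer phrases are matched case-insensitively.
SENSITIVE = re.compile(
    r"\b(?:SAR|STR|FIU)\b|(?i:suspicious activity report|ML/TF analysis)"
)


def check_write(destination: str, text: str) -> tuple[str, Optional[str]]:
    """Return (outcome, reason code) for a proposed write."""
    reveals = bool(SENSITIVE.search(text))
    if destination not in APPROVED_DESTINATIONS:
        # Any write outside the perimeter is refused; the reason code records
        # whether the text would also have revealed a report.
        reason = "AML_TIPPING_OFF_RISK" if reveals else "DESTINATION_NOT_BOUND"
        return ("block", reason)
    return ("allow", None)
```

With these assumptions, a note such as "We are filing a SAR on this customer." is blocked when aimed at a CRM field and allowed when written to the case system of record, which mirrors the point above: the same text is acceptable or unlawful depending only on where it lands.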
Stale-context disposition (acting on a snapshot the world has moved past)
The agent triages an alert using a context snapshot — sanctions/PEP status, prior SARs on the customer, related open cases, KYC refresh state — that was correct when fetched but is stale by the time the disposition is written, or that silently omits a related alert on the same customer. It then auto-closes or under-escalates because, on its partial view, the activity looks consistent with the profile. The R.10(d) duty is to assess consistency with the institution's knowledge of the customer; acting on a partial snapshot quietly defeats that duty.
Why it's hard to catch: Every individual disposition is internally coherent and defensible on the data the agent saw, so case-by-case review finds nothing wrong; the defect only appears when you correlate across the customer's full alert history and notice that linked activity was triaged in isolation. Test fixtures typically present one alert with complete, frozen context — the production failure mode is concurrent, fragmented, time-skewed context, which fixtures rarely reproduce.
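One way to surface this failure is to re-check the snapshot at the moment the disposition is written. The sketch below assumes a hypothetical 15-minute freshness budget and hypothetical field names; the right budget and the source of the "open alerts now" lookup depend on the institution's own systems.

```python
from datetime import datetime, timedelta, timezone

MAX_SNAPSHOT_AGE = timedelta(minutes=15)  # hypothetical freshness budget


def context_problems(snapshot_at: datetime, now: datetime,
                     alerts_in_snapshot: set[str],
                     open_alerts_now: set[str]) -> list[str]:
    """Compare the context the agent saw with the state at write time."""
    problems = []
    if now - snapshot_at > MAX_SNAPSHOT_AGE:
        problems.append("snapshot_stale")
    # Open alerts on the same customer that the agent never saw.
    unseen = open_alerts_now - alerts_in_snapshot
    if unseen:
        problems.append("linked_alerts_unseen:" + ",".join(sorted(unseen)))
    return problems


fetched = datetime(2026, 1, 15, 9, 0, tzinfo=timezone.utc)
fresh = context_problems(fetched, fetched + timedelta(minutes=5), {"A1"}, {"A1"})
stale = context_problems(fetched, fetched + timedelta(hours=2), {"A1"}, {"A1", "A2"})
```

In this example `fresh` is empty, while `stale` reports both an aged snapshot and the unseen linked alert `A2`. Either problem is a reason to refetch context or route the disposition to a human instead of closing on the partial view.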
How KLA governs it
Runtime controls, mapped to each decision point
KLA evaluates each consequential action with a policy gate that runs before the action executes: the checkpoint submits a Decision Request to POST /v1/decisions.evaluate, which resolves to one of four outcomes, listed here from least to most restrictive: allow → warn → require_approval → block. When several rules match, the most restrictive outcome wins, and the gate is fail-closed by default. Every non-allow outcome carries reason codes and remediation.
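The precedence rule can be sketched in a few lines. This is an illustration, not KLA's implementation: the `Outcome` enum and `resolve` function are hypothetical names, and treating "no outcome produced" as a block is one reading of fail-closed that a deployer would need to confirm.

```python
from enum import IntEnum


class Outcome(IntEnum):
    """The four outcomes, ordered from least to most restrictive."""
    ALLOW = 0
    WARN = 1
    REQUIRE_APPROVAL = 2
    BLOCK = 3


def resolve(matched: list[Outcome]) -> Outcome:
    """Most restrictive matched outcome wins.

    Assumption: an empty list (no rule produced an outcome, for example
    because evaluation failed) resolves to BLOCK, i.e. fail closed.
    """
    return max(matched, default=Outcome.BLOCK)


both = resolve([Outcome.REQUIRE_APPROVAL, Outcome.BLOCK])  # Outcome.BLOCK
routine = resolve([Outcome.ALLOW, Outcome.WARN])           # Outcome.WARN
nothing = resolve([])                                      # Outcome.BLOCK
```

Ordering the enum by restrictiveness keeps the resolution rule to a single `max`, so adding a rule can only tighten a decision, never loosen it.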
| Decision point | Intercept (before action) | Policy checks → reason codes | Human routing (maker-checker) | Evidence captured |
|---|---|---|---|---|
| Auto-close / dismiss an alert as no-further-action | A KLA SDK checkpoint wraps the agent's close_alert tool call (Govern in Place); the checkpoint submits a Decision Request via POST /v1/decisions.evaluate with the alert's attributes before the close is written. Deployers running through the managed proxy gate the same step via the Executions API. | | A require_approval outcome opens a Decision Desk Escalation routed by policy to a named L2 financial-crime investigator (maker-checker: the agent is the maker, the L2 reviewer is the checker). The reviewer sees the alert, the agent's proposed close + rationale, the triggering reason codes, and a link to the Lineage Record, then approves, denies, or re-routes to L3. | |
| Draft / recommend a SAR/STR narrative | A KLA SDK checkpoint wraps the draft_sar_narrative tool call; the Decision Request submitted to POST /v1/decisions.evaluate carries the draft narrative and the case context before the draft is persisted or routed to filing. | | require_approval opens a Decision Desk Escalation routed to a named SAR-filing officer / MLRO-delegate. The agent's role is fixed at draft/recommend; the human reviewer is the only party who can authorize the filing decision and start the formal clock. Decision Desk records who approved and when. | |
| Escalate to L2/L3 investigation OR write disposition + rationale to the case system of record | A KLA SDK checkpoint wraps the write_disposition / escalate_case tool call; the Decision Request to POST /v1/decisions.evaluate carries both the disposition enum and the rationale text as paired attributes before the write commits to the SoR. | | A tipping-off or rationale-mismatch block returns a structured denial to the agent (no SoR write) and surfaces to the owning financial-crime control team; a require_approval downgrade opens an Escalation to a named L2 reviewer. Routing rules are declared in policy so the Escalation lands in front of the team that owns that risk by default. | |
| Cross-cutting: keep the governed run replayable and the evidence independently verifiable | Every checkpoint above runs through the same Evidence-by-Default pipeline: each Decision Request, policy decision, tool call, and human verdict is captured automatically as it happens (no separate logging step in the agent code). | | n/a: this control is the evidence substrate the human verdicts above are recorded into. | |
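The intercept pattern in the table, where a check runs before the tool call and a non-allow outcome stops it, can be sketched as a plain decorator. Everything here is hypothetical: the `checkpoint` decorator, the `ActionDenied` exception, the `EVALUATION_UNAVAILABLE` code, and the response shape are assumptions for illustration, not the KLA SDK. The injected `evaluate` callable stands in for the call to POST /v1/decisions.evaluate.

```python
from functools import wraps
from typing import Any, Callable


class ActionDenied(Exception):
    """Raised when the gate returns require_approval or block."""

    def __init__(self, outcome: str, reason_codes: list[str]):
        super().__init__(f"{outcome}: {', '.join(reason_codes)}")
        self.outcome = outcome
        self.reason_codes = reason_codes


def checkpoint(evaluate: Callable[[str, dict], dict]):
    """Wrap a tool so that `evaluate` runs before the tool executes."""
    def decorator(tool):
        @wraps(tool)
        def guarded(**attributes: Any):
            try:
                decision = evaluate(tool.__name__, attributes)
            except Exception:
                # Fail closed: an unreachable evaluator must not let the action run.
                decision = {"outcome": "block",
                            "reasonCodes": ["EVALUATION_UNAVAILABLE"]}
            if decision["outcome"] in ("allow", "warn"):
                return tool(**attributes)
            raise ActionDenied(decision["outcome"], decision.get("reasonCodes", []))
        return guarded
    return decorator


closed: list[str] = []


def fake_evaluate(action: str, attrs: dict) -> dict:
    """Stand-in policy: PEP customers need approval before a close."""
    if attrs.get("customer_pep"):
        return {"outcome": "require_approval",
                "reasonCodes": ["AML_AUTOCLOSE_ELEVATED_RISK"]}
    return {"outcome": "allow", "reasonCodes": []}


@checkpoint(fake_evaluate)
def close_alert(alert_id: str, customer_pep: bool = False) -> str:
    closed.append(alert_id)  # the side effect the gate protects
    return "closed"
```

Calling `close_alert(alert_id="ALRT-1")` runs the tool, while `close_alert(alert_id="ALRT-2", customer_pep=True)` raises `ActionDenied` before anything is appended to `closed`. In a real deployment the denial would open an Escalation rather than simply raising; that routing is outside this sketch.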
Least-privilege execution & data boundaries
- Auto-close / dismiss an alert as no-further-action: The close_alert tool is bound in the agent's immutable Release against the Tool Catalog; the agent cannot self-grant a higher-impact tool. Data Boundaries keep alert and customer data in the approved region/system so the snapshot the agent reads is the governed one.
- Draft / recommend a SAR/STR narrative: draft_sar_narrative is bound read/draft-only; the agent has no tool binding that can submit a filing to the FIU. The narrative-drafting context is held inside the Data Boundary so the draft never transits an unapproved system.
- Escalate to L2/L3 investigation OR write disposition + rationale to the case system of record: write_disposition is bound to the governed case-SoR endpoint only; the agent has no binding to customer-facing CRM, messaging, or relationship-manager queues. Data Boundaries plus the Tool Catalog binding are what mechanically prevent the tipping-off failure mode — the agent physically cannot write to a customer-readable surface.
- Cross-cutting: keep the governed run replayable and the evidence independently verifiable: the agent runs under a single immutable Release; any change to model, instructions, parameters, or tool bindings produces a new hashed Release, so 'what was running on the date of this disposition' is a provable question.
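The idea of a hashed Release can be illustrated with a content hash over the four components named above. This is a sketch under assumptions: the source does not describe how KLA computes the hash, so the canonical-JSON-plus-SHA-256 scheme, the `release_hash` name, and the example values are all hypothetical.

```python
import hashlib
import json


def release_hash(model: str, instructions: str, parameters: dict,
                 tool_bindings: list[str]) -> str:
    """Deterministic digest over model, instructions, parameters, and tools."""
    payload = {
        "model": model,
        "instructions": instructions,
        "parameters": parameters,
        # Sort bindings so that ordering alone does not create a new Release.
        "tool_bindings": sorted(tool_bindings),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = release_hash("example-model", "Triage AML alerts.", {"temperature": 0.0},
                    ["close_alert", "write_disposition"])
reordered = release_hash("example-model", "Triage AML alerts.", {"temperature": 0.0},
                         ["write_disposition", "close_alert"])
changed = release_hash("example-model", "Triage AML alerts.", {"temperature": 0.2},
                       ["close_alert", "write_disposition"])
```

Here `base == reordered` but `base != changed`: a different parameter yields a different digest, which is what lets a record stamped with the hash answer what was running on a given date.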
Mapped to regulation
Regulatory mapping
| Framework | Article / section | Obligation (plain language) | How a KLA runtime control satisfies it | Source |
|---|---|---|---|---|
| FATF Recommendations | Recommendation 20 — Reporting of suspicious transactions | If an institution suspects or has reasonable grounds to suspect funds are proceeds of crime or relate to terrorist financing, it must by law report promptly to the FIU. An alert-triage agent influences this trigger every time it dismisses, escalates, or recommends a filing. | The auto-close checkpoint (runtime_controls[0]) prevents the agent from silently extinguishing a reportable suspicion: elevated-risk and linked-case closes are blocked or routed to a named L2 human before the alert leaves the queue, so the decision to not report is always made (or ratified) by a person. | Source |
| FATF Recommendations | Recommendation 10(d) — Ongoing CDD: scrutiny of transactions | Institutions must conduct ongoing scrutiny of transactions across a relationship to ensure they are consistent with the institution's knowledge of the customer, business, risk profile and source of funds. Transaction monitoring operationalizes this duty. | The linked-case block in the auto-close checkpoint and the disposition–rationale consistency block (runtime_controls[0], runtime_controls[2]) stop the agent from disposing of an alert on a partial, isolated snapshot — preserving the 'consistent with the institution's knowledge of the customer' test against the stale-context failure mode. | Source |
| EU AMLR — Regulation (EU) 2024/1624 | Article 69(1) — Reporting of suspicions | Obliged entities must promptly report to the FIU, on their own initiative, where they know/suspect/have reasonable grounds to suspect that funds or activities (regardless of amount) are proceeds of crime or related to terrorist financing. All suspicious transactions — including attempted ones and suspicions from inability to complete CDD — must be reported. | The auto-close require_approval/block rules (runtime_controls[0]) ensure no in-scope suspicion is closed without human sign-off, and the SAR-narrative human-signoff rule (runtime_controls[1]) keeps the report-or-not decision a human one — so the Art. 69(1) 'on their own initiative' duty rests with an accountable person, not the agent. | Source |
| EU AMLR — Regulation (EU) 2024/1624 | Article 73 — Prohibition of disclosure (tipping-off) | Obliged entities and their staff — explicitly including agents — must not disclose to the customer or third parties that activity is being assessed under Art. 69, that information has/will be transmitted to the FIU, or that an ML/TF analysis is underway. | The tipping-off block plus least-privilege tool binding (runtime_controls[2]) mechanically prevent the agent from writing confidentiality-tier-2 content into any customer-readable or out-of-perimeter destination: the write is blocked, and the agent has no Tool Catalog binding to a customer-facing surface in the first place. | Source |
| US BSA / FinCEN — 31 CFR Chapter X | 31 CFR § 1020.320(b)(3) — SAR filing deadline | A bank must file a SAR no later than 30 calendar days after initial detection of facts that may constitute a basis for filing; the deadline may extend to identify a suspect but in no case beyond 60 calendar days after initial detection. | Capturing the auto-close decision with a sealed timestamp on the Lineage Record (runtime_controls[0], runtime_controls[3]) makes 'date of initial detection' and the disposition that started/closed the clock a provable, queryable record — so the institution can demonstrate the 30/60-day clock was respected rather than silently blown by an agent close. | Source |
| US BSA / FinCEN — 31 CFR Chapter X & 31 U.S.C. § 5318 | 31 CFR § 1020.320(e) and 31 U.S.C. § 5318(g)(2)(A)(i) — SAR confidentiality / notification prohibited | No bank or its agent may disclose a SAR or any information that would reveal its existence, and an institution (including its agents and contractors) may not notify any person involved in a transaction that it has been reported. | The same tipping-off block and least-privilege binding (runtime_controls[2]) enforce the US confidentiality bar: the agent — an 'agent' for the purposes of § 5318(g)(2) — is blocked from emitting SAR-revealing content to any unapproved destination, and its writes are confined by Data Boundaries to the governed case SoR. | Source |
| Fed/OCC SR 11-7 (interagency model-risk guidance) | SR 11-7 / OCC 2011-12 — model definition, model risk, 'effective challenge' | A model that processes inputs into estimates invariably creates model risk; the guiding control is 'effective challenge' — critical analysis by objective, informed parties who can identify limitations and produce changes. An LLM alert-triage agent is itself a model that must be validated, challenged, and monitored. | The maker-checker Decision Desk Escalation (runtime_controls[0], runtime_controls[1]) institutionalizes 'effective challenge' on the highest-stakes dispositions — a named, competent human independently reviews the agent's call — and the disposition–rationale consistency block (runtime_controls[2]) catches the model's incorrect-but-fluent output that self-report would never reveal. | Source |
| EU AI Act — Regulation (EU) 2024/1689 | Annex III — high-risk classification scope (and the credit-scoring fraud-detection exception) | Annex III enumerates eight high-risk areas; the financial-services entry covers creditworthiness/credit-scoring 'with the exception of AI systems used for the purpose of detecting financial fraud.' AML transaction monitoring / financial-crime detection is not enumerated anywhere in Annex III, so it is not automatically a high-risk AI system. | This is a scoping mapping, not a control mapping: it tells the deployer that the high-risk conformity regime is not the load-bearing obligation here. The runtime controls are therefore designed to satisfy the AML and model-risk regimes first, while voluntarily adopting the EU AI Act's human-oversight and logging disciplines (below) as good practice and against the case where a deployer is otherwise in scope. | Source |
| EU AI Act — Regulation (EU) 2024/1689 | Article 14(4)(b),(d),(e) — Human oversight | Oversight persons must stay aware of automation bias (over-reliance on the system's output), be able to decide not to use / disregard / override / reverse the output, and be able to interrupt the system via a 'stop' to a safe state. | require_approval pause + Decision Desk override and re-route (runtime_controls[0], runtime_controls[1]) are the direct mechanical analog of disregard/override/reverse; block (runtime_controls[2]) is the 'stop to a safe state' — the agent's action is halted with a structured denial. The reviewer sees reason codes precisely to counter automation bias rather than rubber-stamping the agent. | Source |
| EU AI Act — Regulation (EU) 2024/1689 | Article 26(2) & 26(6) — Deployer obligations | Deployers must assign human oversight to natural persons with the necessary competence, training and authority, and keep auto-generated logs under their control for at least six months. | Decision Desk routes Escalations to named, competent L2/L3/MLRO-delegate reviewers (runtime_controls[0–2]) — satisfying the competence-and-authority requirement — and the Evidence-by-Default pipeline retains automatically generated Lineage Records well beyond the six-month floor (runtime_controls[3]). | Source |
| EU AI Act — Regulation (EU) 2024/1689 | Article 12(1) — Record-keeping (automatic logging) | High-risk AI systems must technically allow automatic recording of events (logs) over the system lifetime to enable traceability appropriate to the intended purpose. | Evidence-by-Default captures every Decision Request, policy decision, tool call, and human verdict automatically (runtime_controls[3]) and seals each into an append-only, Merkle-proofed Lineage Record — satisfying the automatic-logging and traceability requirement as a built-in property rather than a bolt-on, even though Art. 12 binds strictly only where the system is in high-risk scope. | Source |
Prove the control held
Audit-evidence checklist
- For every auto-close: the Decision Request, the policy outcome + reasonCodes, and — where elevated-risk — the named L2 approver's verdict and timestamp, sealed on the Lineage Record (proves no in-scope suspicion was closed without human sign-off; FATF R.20 / AMLR Art. 69).
- For every SAR/STR narrative: the exact agent draft text, the require_approval record, and the named SAR-filing-officer's authorization — establishing human authorship of the filing decision and that the agent only drafted (SR 11-7 effective challenge; AMLR Art. 69(1) 'on their own initiative').
- For every disposition write: the paired disposition + rationale as submitted, plus the destination tool and its Tool Catalog binding proving the write reached only the approved case SoR and never a customer-readable surface (tipping-off: AMLR Art. 73 / 31 CFR § 1020.320(e) / 31 U.S.C. § 5318(g)(2)).
- A sealed initial-detection timestamp on each alert disposition so the 30/60-day BSA SAR clock is provable rather than reconstructed (31 CFR § 1020.320(b)(3)).
- The active Release hash (model + instructions + parameters + tool bindings) stamped on each Lineage Record, answering 'exactly what configuration produced this disposition.'
- Independent verification: each Lineage Record verifiable via GET /v1/lineage/{id}/verify by recomputing the Merkle root against the published ImmuDB ledger root — no trust in KLA required.
- Retention of automatically generated logs for at least the six-month minimum (EU AI Act Art. 26(6)), exportable as a Sealed Evidence Bundle or an EU AI Act Annex IV Control Pack for an examiner.
- A periodic 'effective challenge' pack: a sample of auto-closed alerts re-reviewed by an independent party, with their verdicts captured — evidence the model-risk monitoring required by SR 11-7 actually ran.
A concrete intercept
Reference scenario: An agent tries to auto-close an alert on a PEP with a fluent rationale — and a named L2 investigator gets the veto
1. The alert-triage agent reads alert ALRT-77214 (rapid round-number transfers into and out of a business account) and decides to auto-close it as no-further-action, generating the rationale 'pattern consistent with prior supplier payments.'
2. Before the close is written, the KLA SDK checkpoint wrapping close_alert submits a Decision Request to POST /v1/decisions.evaluate, carrying attributes: customer_pep=true, jurisdiction_risk=high, linked_open_alerts=1.
3. Policy matches two rules: AML_AUTOCLOSE_ELEVATED_RISK (PEP + high-risk jurisdiction) returns require_approval, and AML_AUTOCLOSE_LINKED_CASE_OPEN (a related open alert exists) returns block. By precedence the single block wins: the close does not proceed, and the agent receives a structured denial with reason codes and remediation.
4. Because a related case is open, policy also routes a Decision Desk Escalation to the named L2 financial-crime investigator who owns that customer's case. The reviewer sees the alert, the agent's proposed close and its rationale, both reason codes, and a link to the Lineage Record.
5. The investigator disregards the agent's close (the Art. 14(4)(d) override), links the two alerts, and escalates to L3: the maker-checker control and SR 11-7 'effective challenge' in action.
6. Every step (the Decision Request, the block + require_approval outcomes, the reason codes, the reviewer's identity and verdict, and the active Release hash) is sealed into an append-only Lineage Record with a Merkle proof, verifiable later via GET /v1/lineage/{id}/verify and exportable into an EU AI Act Annex IV Control Pack with no trust in KLA required.
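The policy match in this scenario can be expressed as a small rule table. The two rule codes and the attribute names come from the scenario; the rule-table format, the `AML_AUTOCLOSE_LOW_RISK` baseline rule, the `NO_MATCHING_RULE` code, and the response shape are hypothetical additions for illustration and do not represent KLA's policy language.

```python
SEVERITY = {"allow": 0, "warn": 1, "require_approval": 2, "block": 3}

# (reason code, predicate over the request attributes, outcome)
RULES = [
    ("AML_AUTOCLOSE_ELEVATED_RISK",
     lambda a: a["customer_pep"] and a["jurisdiction_risk"] == "high",
     "require_approval"),
    ("AML_AUTOCLOSE_LINKED_CASE_OPEN",
     lambda a: a["linked_open_alerts"] > 0,
     "block"),
    # Hypothetical baseline rule so that routine closes have an explicit allow.
    ("AML_AUTOCLOSE_LOW_RISK",
     lambda a: (not a["customer_pep"] and a["jurisdiction_risk"] != "high"
                and a["linked_open_alerts"] == 0),
     "allow"),
]


def evaluate_close(attrs: dict) -> dict:
    """Match every rule, then let the most restrictive outcome win."""
    matched = [(code, outcome) for code, pred, outcome in RULES if pred(attrs)]
    if not matched:
        # Assumption: fail closed when no rule speaks to the request.
        return {"outcome": "block", "reasonCodes": ["NO_MATCHING_RULE"]}
    outcome = max((o for _, o in matched), key=SEVERITY.__getitem__)
    return {"outcome": outcome, "reasonCodes": [code for code, _ in matched]}


scenario = evaluate_close(
    {"customer_pep": True, "jurisdiction_risk": "high", "linked_open_alerts": 1}
)
```

For the scenario attributes, `scenario["outcome"]` is `"block"` and both reason codes are returned, so the reviewer sees why approval would have been required as well as why the close was stopped outright.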
What most teams get wrong
The non-obvious insight
An AML transaction-monitoring agent is almost certainly NOT a 'high-risk AI system' under EU AI Act Annex III — and that is precisely why it needs more deliberate governance, not less. Annex III's financial-services entry covers credit-scoring but carves out 'AI systems used for the purpose of detecting financial fraud,' and financial-crime detection appears nowhere else in the list. So the deployer gets no CE mark, no provider conformity assessment, no Annex IV technical file delivered to them — none of the high-risk scaffolding backstops these decisions. The binding force comes instead from AML law (FATF R.20, AMLR Art. 69/73, the BSA SAR rules) and from SR 11-7 model-risk supervision, where the agent is unambiguously a 'model' subject to validation and 'effective challenge.'
Why it matters: Teams routinely reason backwards: 'the EU AI Act is the strict regime, so if our agent isn't high-risk we can govern it lightly.' For AML triage that inference is exactly inverted. The absence of an Annex III conformity wrapper means the deployer's own runtime controls — human sign-off on the report/no-report decision, tipping-off containment, sealed evidence of the clock — are the only thing standing between an auto-closed alert and a years-later regulatory look-back. The EU AI Act here is best used voluntarily as a human-oversight and logging discipline (Art. 12/14/26), while the load-bearing obligations are AML and model-risk. Mis-classifying the regime leads straight to under-controlling the one decision — the silent auto-close — that has no human in the loop and no exception to catch it.
The US BSA rules give a bank a hard ceiling of 60 calendar days from initial detection to file a SAR (30 days, extendable by 30 to identify a suspect). An alert-triage agent that auto-closes a true-positive alert therefore does not merely make an error: it lets a filing clock start and expire with no one watching, because a closed alert leaves the work queue and generates no aging exception. (source)
Q&A
Frequently asked questions
Is an AML transaction-monitoring agent a high-risk AI system under the EU AI Act?
Most likely not. EU AI Act Annex III enumerates eight high-risk areas; its financial-services entry covers creditworthiness and credit-scoring but expressly excludes 'AI systems used for the purpose of detecting financial fraud,' and AML/financial-crime detection appears nowhere else in Annex III. So an AML triage agent is generally not automatically high-risk. The governance obligations flow primarily from AML law (FATF R.20, EU AMLR Art. 69/73, the US BSA SAR rules) and model-risk supervision (SR 11-7); the EU AI Act applies as human-oversight and record-keeping good practice (Art. 12/14/26) and where the deployer is otherwise in scope. Confirm classification against your own deployment and seek counsel — do not assume 'not high-risk' means 'low-governance.'
Which agent decisions must a human sign off on, and which can run automatically?
The two decisions that move a legal reporting obligation get a human gate. A SAR/STR narrative is always require_approval — the agent drafts, a named SAR-filing officer authorizes the filing decision. An auto-close is gated to require_approval (or blocked) whenever the alert carries elevated-risk attributes (PEP, high-risk jurisdiction, prior SAR) or a related case is open, routing to a named L2 investigator. Routine low-risk closes and ordinary disposition writes can proceed as allow or warn, but every one still records a reason code on its Lineage Record so the decision is reconstructable.
How does governing the agent prevent a tipping-off breach?
Two layers. First, a content check blocks any disposition narrative or case note that states or implies a SAR is being filed (or that an ML/TF analysis is underway) when its destination is a customer-readable or out-of-perimeter field (reasonCode AML_TIPPING_OFF_RISK). Second — and more fundamentally — least-privilege Tool Catalog binding plus Data Boundaries mean the agent has no tool that can write to a customer-facing CRM, message, or relationship-manager queue at all. EU AMLR Art. 73 and the US BSA (31 U.S.C. § 5318(g)(2), 31 CFR § 1020.320(e)) extend the tipping-off bar explicitly to agents, so confining the agent's writes to the governed case SoR is the controlling mechanism.
How do you prove to an examiner that the agent didn't silently miss a filing?
Every disposition, including every auto-close, is captured automatically as an append-only Lineage Record carrying the Decision Request, the policy outcome and reason codes, any human verdict, the active Release hash, and a sealed initial-detection timestamp. Because the records are anchored to a Merkle-proofed ImmuDB ledger, an examiner can verify them via GET /v1/lineage/{id}/verify without trusting KLA, and you can export the relevant slice as a Sealed Evidence Bundle or an EU AI Act Annex IV Control Pack. The closed alerts are evidence too, not a blind spot: you can demonstrate which were closed under a policy allow outcome and which were ratified by a named human.
The agent's accuracy against historical analyst labels is high — isn't that enough validation?
No, and relying on it is the classic trap. Historical labels are dominated by closures because alert false-positive rates are very high, so an agent that closes aggressively scores well on label-agreement while systematically suppressing the rare true positive — the exact case that must be reported. SR 11-7 calls this out: model risk is adverse consequences from incorrect-but-used output, and the prescribed control is 'effective challenge' by objective, informed parties, not the model's self-reported accuracy. Governance adds that challenge structurally: maker-checker review on high-stakes closes and a periodic independent re-review of auto-closed alerts, both captured as evidence the monitoring actually ran.
Does KLA build or run the AML agent?
No. The customer builds and owns the AML triage agent (LangGraph, CrewAI, Agentforce, Microsoft Copilot, or in-house). KLA is the independent runtime governance and assurance layer that governs the agent in place: it intercepts each consequential action before it executes, enforces policy-as-code with the four outcomes (allow / warn / require_approval / block), routes high-stakes decisions to named human approvers in Decision Desk, and seals signed execution lineage mapped to regulation. KLA never files a SAR, never closes an alert, and never makes the call — humans hold the veto on require_approval, and policy holds authority over whether an action runs.
Related blueprints & guides
- Governing a Sanctions-Screening Hit-Adjudication Agent
- Governing an FNOL / Claims-Intake Triage Agent (NAIC AI Bulletin + EU AI Act)
- Governing a Pharmacovigilance Adverse-Event Intake & Case-Processing Agent
- Financial-crime governed-workflow blueprints (hub)
- Governing an AML sanctions-screening hit-adjudication agent
- Policy-Gated Execution (core concept)
- Add a Human Approval Gate (maker-checker)
- Decision Desk (Escalations & approver routing)
- Evidence Room (Sealed Evidence Bundle / Control Pack)
Primary sources
- FATF Recommendations — Recommendation 20: Reporting of suspicious transactions — FATF (via ICNL library mirror of the FATF Recommendations)
- Regulation (EU) 2024/1624 (AMLR) — Article 69: Reporting of suspicions — EUR-Lex (Official Journal text, via amlr.eu consolidated reproduction)
- Regulation (EU) 2024/1624 (AMLR) — Article 73: Prohibition of disclosure (tipping-off) — EUR-Lex (Official Journal text, via amlr.eu consolidated reproduction)
- 31 CFR § 1020.320(b)(3) — Reports by banks of suspicious transactions: SAR filing deadline — US e-CFR / Treasury–FinCEN (via Cornell Legal Information Institute)
- 31 U.S.C. § 5318(g)(2)(A) — Notification prohibited (statutory tipping-off bar) — US House Office of the Law Revision Counsel (US Code, via govinfo.gov)
- SR 11-7 / OCC 2011-12 — Supervisory Guidance on Model Risk Management (definition of model, model risk, effective challenge) — Board of Governors of the Federal Reserve & OCC (interagency guidance, reissued by FDIC as FIL-22-2017)
- Regulation (EU) 2024/1689 (EU AI Act) — Article 14: Human oversight — artificialintelligenceact.eu (mirror of Regulation (EU) 2024/1689)
- Regulation (EU) 2024/1689 (EU AI Act) — Article 26: Obligations of deployers of high-risk AI systems — artificialintelligenceact.eu (mirror of Regulation (EU) 2024/1689)
- Regulation (EU) 2024/1689 (EU AI Act) — Article 12: Record-keeping (automatic logging) — artificialintelligenceact.eu (mirror of Regulation (EU) 2024/1689)
- Regulation (EU) 2024/1689 (EU AI Act) — Annex III: High-risk AI systems enumeration (and credit-scoring fraud-detection exception) — artificialintelligenceact.eu (mirror of Regulation (EU) 2024/1689)
- KLA Control Plane Docs — Policy-Gated Execution — KLA Digital
- KLA Control Plane Docs — Decision Desk — KLA Digital
- KLA Control Plane Docs — Evidence-by-Default — KLA Digital
- KLA Control Plane Docs — Evidence Room (Sealed Evidence Bundle / Control Pack) — KLA Digital
- KLA Control Plane Docs — Agents & Registry (Releases, Tool Catalog, least-privilege) — KLA Digital
- KLA Control Plane Docs — Add a Human Approval Gate (SDK checkpoint / maker-checker) — KLA Digital
- KLA Control Plane Docs — Govern an Agent End-to-End — KLA Digital
- KLA Control Plane Docs — API Reference (decisions.evaluate, lineage verify) — KLA Digital
Govern this workflow without re-platforming the agent
KLA wraps the agent you already run, gates each high-stakes action, routes the hard calls to a named human, and seals independently verifiable evidence mapped to regulation.
