Governing an AML Transaction-Monitoring Alert-Triage Agent

13 min · Updated 2026-06-02

Answer

You govern an AML alert-triage agent by intercepting each of its consequential actions — auto-closing an alert, escalating it, drafting a SAR/STR narrative, or writing a disposition to the case system of record — with a policy checkpoint that runs before the action executes, routing the two outcomes that change a reporting obligation (auto-close and SAR-narrative) to a named L2/L3 human in a maker-checker gate, and sealing every disposition into independently verifiable lineage. The binding obligations come from AML law (FATF R.20, EU AMLR Art. 69/73, the US BSA SAR rules) and model-risk supervision (SR 11-7); the EU AI Act contributes human-oversight and record-keeping discipline rather than an automatic high-risk classification, because AML transaction monitoring is not enumerated in Annex III.

KLA is the independent, framework-neutral runtime governance and assurance layer for this Process. KLA governs the agent you already built (in LangGraph, CrewAI, Agentforce, Microsoft Copilot, or in-house); it does not build, sell, or run the agent. The customer owns the agent; KLA owns the controls, the evidence, and the audit trail.

The Process

The job & where the agent takes high-stakes action

A transaction-monitoring (TM) system generates alerts when transactions deviate from a customer's expected profile — the operational form of the FATF Recommendation 10(d) duty to scrutinize transactions across a relationship. An L1 analyst normally triages each alert: read it, gather context, and dispose of it. The alert-triage agent automates that L1 work and takes four consequential actions: (1) dismiss / auto-close an alert as no-further-action; (2) escalate the alert to an L2/L3 investigation; (3) draft or recommend a SAR/STR narrative; and (4) write the disposition plus its rationale to the case management system of record. Two of these actions silently move a legal reporting obligation: an auto-close can extinguish a reportable suspicion that FATF R.20 and EU AMLR Art. 69(1) require be reported promptly to the FIU, and the SAR-narrative action seeds the document a regulator will later read line by line. The other two — escalate and write-to-SoR — set the disposition rationale, start the BSA clock, and are bound by the tipping-off prohibition on what may be disclosed.

Stakes

Why it's high-stakes

An incorrect auto-close is a false negative that removes a transaction from human review entirely — no analyst ever sees it again — so a genuine suspicion that FATF R.20 and EU AMLR Art. 69(1) require be reported promptly is never filed. Under the US BSA, the clock is hard and numeric: a bank must file a SAR no later than 30 calendar days after initial detection of facts that may constitute a basis for filing, and in no case more than 60 days; an agent that auto-closes an alert can silently start (and blow) that clock. A SAR narrative the agent drafts is a legal document a regulator reads literally, and a mis-disposition erodes the very R.10(d) ongoing-monitoring control the institution is examined on. Because financial-crime detection is not enumerated in EU AI Act Annex III, the institution cannot rely on a CE-marked, conformity-assessed high-risk pipeline to backstop these failures — the governance burden sits squarely on the deployer's own AML and model-risk controls.
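The 30/60-day arithmetic above can be made concrete with a small date calculation. This is a minimal sketch: the function names and the `suspect_identified` flag are illustrative assumptions, not part of any KLA or FinCEN API, and it models only the calendar-day rule as stated in this section.

```python
from datetime import date, timedelta

STANDARD_DAYS = 30  # file within 30 calendar days of initial detection
CEILING_DAYS = 60   # outer limit when extra time is needed to identify a suspect


def sar_deadlines(initial_detection: date) -> tuple[date, date]:
    """Return (standard deadline, outer ceiling) for a given detection date."""
    return (
        initial_detection + timedelta(days=STANDARD_DAYS),
        initial_detection + timedelta(days=CEILING_DAYS),
    )


def days_remaining(initial_detection: date, today: date, suspect_identified: bool) -> int:
    """Days left on the clock; a negative value means the deadline has passed."""
    standard, ceiling = sar_deadlines(initial_detection)
    deadline = standard if suspect_identified else ceiling
    return (deadline - today).days


# Hypothetical alert detected on 2 March 2026 and auto-closed the same day.
detected = date(2026, 3, 2)
standard, ceiling = sar_deadlines(detected)
print(standard, ceiling)                                  # 2026-04-01 2026-05-01
print(days_remaining(detected, date(2026, 4, 11), True))  # -10
```

The point of the sketch is that the clock is a pure function of the detection date: if that date is not recorded at the moment of disposition, nothing downstream can recompute how late a missed filing is.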

What goes wrong

Failure modes specific to this agent

Silent suspicion extinction at the auto-close boundary

The agent auto-closes a true-positive alert as no-further-action with a fluent, plausible rationale ("consistent with prior payroll pattern"), so the transaction is never escalated and no SAR/STR is ever filed. Unlike a missed escalation that a human queue would eventually surface, an auto-closed alert leaves the work queue entirely — there is no pending item, no aging case, nothing for a supervisor to notice. The reporting obligation under FATF R.20 / AMLR Art. 69(1) is extinguished without any human ever deciding it should be.

Why it's hard to catch: Ordinary testing measures agreement with historical analyst labels, but historical labels are themselves dominated by closures (industry alert false-positive rates run extremely high), so an agent that closes aggressively scores well on accuracy while systematically suppressing the rare true positive. The error is invisible in aggregate metrics, produces no exception or alert, and only surfaces years later in a regulator look-back — by which point the SAR deadlines are long blown. The harm is a non-event (a report that never happened), which no log of actions taken can reveal.
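A short worked example shows why label agreement hides this failure. The counts below are invented for illustration (990 historical closures and 10 true positives per 1,000 alerts), not measured industry figures, and the "agent" is the degenerate policy of closing everything.

```python
def agreement_and_recall(labels, predictions):
    """Return (share of matching labels, share of true escalations caught)."""
    pairs = list(zip(labels, predictions))
    agreement = sum(label == pred for label, pred in pairs) / len(pairs)
    positives = [pred for label, pred in pairs if label == "escalate"]
    recall = (
        sum(pred == "escalate" for pred in positives) / len(positives)
        if positives
        else float("nan")
    )
    return agreement, recall


# Hypothetical history: almost every alert was closed by an analyst.
labels = ["close"] * 990 + ["escalate"] * 10
close_everything = ["close"] * 1000

agreement, recall = agreement_and_recall(labels, close_everything)
print(f"agreement={agreement:.1%} recall={recall:.0%}")  # agreement=99.0% recall=0%
```

An evaluation that reports only the first number would pass this agent; one that reports recall on the reportable class would not.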

Disposition–rationale decoupling (narrative that doesn't match the call)

The agent writes a disposition (e.g. escalate) but generates a rationale that argues for the opposite, or attaches boilerplate reasoning that does not actually reference the alerting behavior. Because both fields are free text written by the same model in one pass, the disposition and its justification can drift apart while each reads as competent prose. Investigators downstream, and examiners later, rely on the rationale to understand why the call was made; a decoupled rationale corrupts the audit trail and the L2 investigator's starting point.

Why it's hard to catch: Each field passes its own sniff test (the disposition is a valid enum value, the rationale is grammatical and on-topic), so field-level validation and human spot-checks of either field in isolation pass. The defect is in the relationship between two fields, which unit tests and label-matching never assert. This is the kind of exposure SR 11-7 treats as model risk: adverse consequences from decisions based on incorrect or misused model output. The guidance warns that such defects require objective, informed 'effective challenge' rather than the model's own self-report.

Tipping-off leakage through the agent's writes and logs

The agent writes a disposition narrative, a customer-facing case note, or a verbose trace that states or strongly implies that a SAR/STR is being or will be filed, or that an ML/TF analysis is underway — and that text lands somewhere a customer or an out-of-perimeter third party can see (a CRM note, a relationship-manager queue, an outbound message, an over-broad log sink). EU AMLR Art. 73 and the US BSA (31 U.S.C. § 5318(g)(2) / 31 CFR § 1020.320(e)) make this unlawful disclosure, and the prohibition explicitly extends to agents.

Why it's hard to catch: The agent's job is to write good narratives, so verbose, informative text is the success signal — the failure is a routing/confidentiality property of where that text goes, not a quality property of the text itself, which is exactly what content-quality evals reward. Standard testing checks that the agent produced a useful narrative; it does not assert that no confidentiality-tier-2 field ever crosses into a customer-readable channel. A single mis-bound tool or over-broad log exporter turns a perfect narrative into a tipping-off breach.
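The missing assertion is about routing, so it can be sketched as a check on the pair (destination, text) rather than on the text alone. Everything named here is an assumption for illustration: the destination names, the keyword pattern (a crude stand-in for a real confidentiality classifier), and the function itself. Only the reason code AML_TIPPING_OFF_RISK comes from this blueprint.

```python
import re
from typing import Optional, Tuple

# Hypothetical registry: the only sink allowed to hold SAR-revealing content.
APPROVED_SOR = {"case_sor"}

# Deliberately crude pattern; a production check would need something far more robust.
TIER2_PATTERN = re.compile(
    r"\b(SAR|STR|suspicious activity report|ML/TF analysis)\b", re.IGNORECASE
)


def check_write(destination: str, text: str) -> Tuple[str, Optional[str]]:
    """Block SAR-revealing text bound for any sink other than the approved SoR."""
    reveals_reporting = bool(TIER2_PATTERN.search(text))
    if reveals_reporting and destination not in APPROVED_SOR:
        return "block", "AML_TIPPING_OFF_RISK"
    return "allow", None


print(check_write("crm_note", "Customer flagged; SAR will be filed."))
print(check_write("case_sor", "SAR recommended: rapid round-number transfers."))
```

The same narrative is allowed into the case system of record and blocked from the CRM note, which is the property a content-quality eval never tests. This sketch covers only the content layer; removing the agent's tool binding to customer-facing sinks is a separate control.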

Stale-context disposition (acting on a snapshot the world has moved past)

The agent triages an alert using a context snapshot — sanctions/PEP status, prior SARs on the customer, related open cases, KYC refresh state — that was correct when fetched but is stale by the time the disposition is written, or that silently omits a related alert on the same customer. It then auto-closes or under-escalates because, on its partial view, the activity looks consistent with the profile. The R.10(d) duty is to assess consistency with the institution's knowledge of the customer; acting on a partial snapshot quietly defeats that duty.

Why it's hard to catch: Every individual disposition is internally coherent and defensible on the data the agent saw, so case-by-case review finds nothing wrong; the defect only appears when you correlate across the customer's full alert history and notice that linked activity was triaged in isolation. Test fixtures typically present one alert with complete, frozen context — the production failure mode is concurrent, fragmented, time-skewed context, which fixtures rarely reproduce.
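One way to picture a guard against this failure is a freshness check performed immediately before the disposition is written. This is a sketch under stated assumptions: the `ContextSnapshot` shape, the 15-minute threshold, and the problem labels are all hypothetical, and the live count of linked alerts is assumed to come from a re-query of the case system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_SNAPSHOT_AGE = timedelta(minutes=15)  # hypothetical freshness threshold


@dataclass
class ContextSnapshot:
    fetched_at: datetime
    linked_open_alerts: int  # related open alerts the agent saw at fetch time


def snapshot_problems(snapshot: ContextSnapshot, now: datetime, live_linked_open_alerts: int) -> list:
    """Compare what the agent saw against the state at write time."""
    problems = []
    if now - snapshot.fetched_at > MAX_SNAPSHOT_AGE:
        problems.append("snapshot_stale")
    if live_linked_open_alerts != snapshot.linked_open_alerts:
        problems.append("linked_alerts_changed")
    return problems


now = datetime(2026, 6, 2, 12, 0, tzinfo=timezone.utc)
seen = ContextSnapshot(fetched_at=now - timedelta(minutes=40), linked_open_alerts=0)
print(snapshot_problems(seen, now, live_linked_open_alerts=1))
# ['snapshot_stale', 'linked_alerts_changed']
```

A non-empty result would be a reason to refuse the close or re-fetch context, since the disposition was reasoned on a view the world has moved past.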

How KLA governs it

Runtime controls, mapped to each decision point

KLA evaluates each consequential action with a policy gate that runs before the action executes. The checkpoint submits a Decision Request to POST /v1/decisions.evaluate, which resolves to one of four outcomes, listed here from lowest to highest precedence: allow → warn → require_approval → block (fail-closed by default). When several rules match, the most restrictive outcome wins. Every non-allow outcome carries reason codes and remediation.
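The precedence and fail-closed behavior can be sketched in a few lines. This is not KLA's implementation: the `Outcome` enum, `resolve`, and `evaluate_fail_closed` are hypothetical names, and treating "no rule matched" as a block is an assumption made here to keep the example fail-closed.

```python
from enum import IntEnum


class Outcome(IntEnum):
    ALLOW = 0
    WARN = 1
    REQUIRE_APPROVAL = 2
    BLOCK = 3


def resolve(rule_outcomes):
    """Most restrictive matched outcome wins.

    Assumption: a gated action that matched no rule is treated as unevaluable.
    """
    return max(rule_outcomes, default=Outcome.BLOCK)


def evaluate_fail_closed(evaluate, request):
    """Run a policy evaluator; any engine failure resolves to BLOCK."""
    try:
        return resolve(evaluate(request))
    except Exception:
        return Outcome.BLOCK


def broken_engine(_request):
    raise RuntimeError("policy engine unavailable")


print(resolve([Outcome.REQUIRE_APPROVAL, Outcome.BLOCK]).name)  # BLOCK
print(evaluate_fail_closed(broken_engine, {}).name)             # BLOCK
```

Ordering the outcomes as integers makes precedence a single `max`, and wrapping the evaluator means an outage can never be mistaken for an allow.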

Decision point	Intercept (before action)	Policy checks → reason codes	Human routing (maker-checker)	Evidence captured
Auto-close / dismiss an alert as no-further-action	A KLA SDK checkpoint wraps the agent's close_alert tool call (Govern in Place); the checkpoint submits a Decision Request via POST /v1/decisions.evaluate with the alert's attributes before the close is written. Deployers running through the managed proxy gate the same step via the Executions API.	require_approval when the alert scenario or customer carries elevated-risk attributes (PEP, high-risk jurisdiction, prior SAR on the customer, structuring/rapid-movement typology), reasonCode AML_AUTOCLOSE_ELEVATED_RISK; require_approval when the agent's confidence / rationale-completeness falls below the configured floor, reasonCode AML_AUTOCLOSE_LOW_CONFIDENCE; block when a related open alert or open case exists on the same customer (prevents isolated-snapshot closure), reasonCode AML_AUTOCLOSE_LINKED_CASE_OPEN; warn on routine low-risk closes so the advisory and reason code are still recorded on the Lineage Record	A require_approval outcome opens a Decision Desk Escalation routed by policy to a named L2 financial-crime investigator (maker-checker: the agent is the maker, the L2 reviewer is the checker). The reviewer sees the alert, the agent's proposed close + rationale, the triggering reason codes, and a link to the Lineage Record, then approves, denies, or re-routes to L3.	Decision Request (action=close_alert + attributes); policy outcome + reasonCodes + remediation; Escalation id and the named reviewer's approve/deny verdict; the active Release hash that produced the disposition; append-only Lineage Record with Merkle proof
Draft / recommend a SAR/STR narrative	A KLA SDK checkpoint wraps the draft_sar_narrative tool call; the Decision Request submitted to POST /v1/decisions.evaluate carries the draft narrative and the case context before the draft is persisted or routed to filing.	require_approval on every SAR/STR narrative the agent produces, because the agent never files and a human always signs off, reasonCode AML_SAR_NARRATIVE_HUMAN_SIGNOFF; warn + reason code when the narrative omits required elements (the 'who/what/when/where/why' of the suspicious activity) so the reviewer is pointed at the gap, reasonCode AML_SAR_NARRATIVE_INCOMPLETE; block if the draft is routed to any destination outside the approved filing path (anti-misroute / anti-tipping-off), reasonCode AML_SAR_DESTINATION_UNAPPROVED	require_approval opens a Decision Desk Escalation routed to a named SAR-filing officer / MLRO-delegate. The agent's role is fixed at draft/recommend; the human reviewer is the only party who can authorize the filing decision and start the formal clock. Decision Desk records who approved and when.	the exact draft narrative text the agent produced; require_approval outcome + reason codes; named approver verdict and timestamp (the human authorship of the filing decision); Release hash + model/instructions snapshot that generated the draft; sealed Lineage Record
Escalate to L2/L3 investigation OR write disposition + rationale to the case system of record	A KLA SDK checkpoint wraps the write_disposition / escalate_case tool call; the Decision Request to POST /v1/decisions.evaluate carries both the disposition enum and the rationale text as paired attributes before the write commits to the SoR.	block when the rationale text is empty, boilerplate, or fails the disposition-consistency check (catches disposition–rationale decoupling), reasonCode AML_DISPOSITION_RATIONALE_MISMATCH; block when the disposition narrative or case note contains confidentiality-tier-2 content (states/implies a SAR is being filed or an ML/TF analysis is underway) and the destination is a customer-readable or out-of-perimeter field, reasonCode AML_TIPPING_OFF_RISK; require_approval when an escalate is downgraded to close on re-triage, reasonCode AML_DISPOSITION_DOWNGRADE; warn and record on every routine write so each disposition carries a reason code on its Lineage Record	A tipping-off or rationale-mismatch block returns a structured denial to the agent (no SoR write) and surfaces to the owning financial-crime control team; a require_approval downgrade opens an Escalation to a named L2 reviewer. Routing rules are declared in policy so the Escalation lands in front of the team that owns that risk by default.	paired disposition + rationale as submitted; block/approval outcome + reason codes (incl. any tipping-off block); destination tool + its Tool Catalog binding (proof the write went only to the approved SoR); Lineage Record linking prompt → tool call → decision → human verdict; Merkle proof anchoring the record to the ImmuDB ledger root
Cross-cutting: keep the governed run replayable and the evidence independently verifiable	Every checkpoint above runs through the same Evidence-by-Default pipeline: each Decision Request, policy decision, tool call, and human verdict is captured automatically as it happens (no separate logging step in the agent code).	fail-closed default: if the policy engine cannot evaluate a gated action, the action does not proceed; every non-allow outcome must carry reasonCodes + remediation (enforced at policy lint / publish time)	n/a: this control is the evidence substrate the human verdicts above are recorded into.	automatic event log over the agent's lifetime (no manual instrumentation); Lineage Record per disposition, verifiable via GET /v1/lineage/{id}/verify (recompute Merkle root, no trust in KLA required); exportable Sealed Evidence Bundle and an EU AI Act Annex IV Control Pack; retention of automatically generated logs for at least the six-month minimum

Least-privilege execution & data boundaries

Auto-close / dismiss an alert as no-further-action: The close_alert tool is bound in the agent's immutable Release against the Tool Catalog; the agent cannot self-grant a higher-impact tool. Data Boundaries keep alert and customer data in the approved region/system so the snapshot the agent reads is the governed one.
Draft / recommend a SAR/STR narrative: draft_sar_narrative is bound read/draft-only; the agent has no tool binding that can submit a filing to the FIU. The narrative-drafting context is held inside the Data Boundary so the draft never transits an unapproved system.
Escalate to L2/L3 investigation OR write disposition + rationale to the case system of record: write_disposition is bound to the governed case-SoR endpoint only; the agent has no binding to customer-facing CRM, messaging, or relationship-manager queues. Data Boundaries plus the Tool Catalog binding are what mechanically prevent the tipping-off failure mode — the agent physically cannot write to a customer-readable surface.
Cross-cutting: keep the governed run replayable and the evidence independently verifiable: the agent runs under a single immutable Release; any change to model, instructions, parameters, or tool bindings produces a new hashed Release, so 'what was running on the date of this disposition' is a provable question.
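The idea that any configuration change yields a new Release hash can be illustrated with a content hash over a canonical manifest. This is a generic sketch, not KLA's actual Release format: the manifest fields, the canonical-JSON encoding, and the choice of SHA-256 are assumptions.

```python
import hashlib
import json


def release_hash(model: str, instructions: str, parameters: dict, tool_bindings: list) -> str:
    """Hash a canonical form of the configuration; any change alters the digest."""
    manifest = {
        "model": model,
        "instructions": instructions,
        "parameters": parameters,
        "tool_bindings": sorted(tool_bindings),  # binding order is not meaningful
    }
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical configuration values.
base = release_hash(
    "model-x", "Triage AML alerts.", {"temperature": 0},
    ["close_alert", "draft_sar_narrative", "write_disposition"],
)
edited = release_hash(
    "model-x", "Triage AML alerts quickly.", {"temperature": 0},
    ["close_alert", "draft_sar_narrative", "write_disposition"],
)
print(base != edited)  # True
```

Stamping such a digest on every disposition record is what turns "what was running on that date" into a lookup rather than a reconstruction.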

Mapped to regulation

Regulatory mapping

Framework	Article / section	Obligation (plain language)	How a KLA runtime control satisfies it	Source
FATF Recommendations	Recommendation 20: Reporting of suspicious transactions	If an institution suspects or has reasonable grounds to suspect funds are proceeds of crime or relate to terrorist financing, it must by law report promptly to the FIU. An alert-triage agent influences this trigger every time it dismisses, escalates, or recommends a filing.	The auto-close control prevents the agent from silently extinguishing a reportable suspicion: elevated-risk and linked-case closes are blocked or routed to a named L2 human before the alert leaves the queue, so the decision to not report is always made (or ratified) by a person.	Source
FATF Recommendations	Recommendation 10(d): Ongoing CDD, scrutiny of transactions	Institutions must conduct ongoing scrutiny of transactions across a relationship to ensure they are consistent with the institution's knowledge of the customer, business, risk profile and source of funds. Transaction monitoring operationalizes this duty.	The linked-case block in the auto-close control and the disposition–rationale consistency block in the disposition-write control stop the agent from disposing of an alert on a partial, isolated snapshot, preserving the 'consistent with the institution's knowledge of the customer' test against the stale-context failure mode.	Source
EU AMLR, Regulation (EU) 2024/1624	Article 69(1): Reporting of suspicions	Obliged entities must promptly report to the FIU, on their own initiative, where they know/suspect/have reasonable grounds to suspect that funds or activities (regardless of amount) are proceeds of crime or related to terrorist financing. All suspicious transactions, including attempted ones and suspicions from inability to complete CDD, must be reported.	The require_approval and block rules of the auto-close control ensure no in-scope suspicion is closed without human sign-off, and the human-signoff rule of the SAR-narrative control keeps the report-or-not decision a human one, so the Art. 69(1) 'on their own initiative' duty rests with an accountable person, not the agent.	Source
EU AMLR, Regulation (EU) 2024/1624	Article 73: Prohibition of disclosure (tipping-off)	Obliged entities and their staff, explicitly including agents, must not disclose to the customer or third parties that activity is being assessed under Art. 69, that information has/will be transmitted to the FIU, or that an ML/TF analysis is underway.	The tipping-off block plus least-privilege tool binding in the disposition-write control mechanically prevent the agent from writing confidentiality-tier-2 content into any customer-readable or out-of-perimeter destination: the write is blocked, and the agent has no Tool Catalog binding to a customer-facing surface in the first place.	Source
US BSA / FinCEN, 31 CFR Chapter X	31 CFR § 1020.320(b)(3): SAR filing deadline	A bank must file a SAR no later than 30 calendar days after initial detection of facts that may constitute a basis for filing; the deadline may extend to identify a suspect but in no case beyond 60 calendar days after initial detection.	Capturing the auto-close decision with a sealed timestamp on the Lineage Record (the auto-close and cross-cutting evidence controls) makes 'date of initial detection' and the disposition that started/closed the clock a provable, queryable record, so the institution can demonstrate the 30/60-day clock was respected rather than silently blown by an agent close.	Source
US BSA / FinCEN, 31 CFR Chapter X & 31 U.S.C. § 5318	31 CFR § 1020.320(e) and 31 U.S.C. § 5318(g)(2)(A)(i): SAR confidentiality / notification prohibited	No bank or its agent may disclose a SAR or any information that would reveal its existence, and an institution (including its agents and contractors) may not notify any person involved in a transaction that it has been reported.	The same tipping-off block and least-privilege binding in the disposition-write control enforce the US confidentiality bar: the agent (an 'agent' for the purposes of § 5318(g)(2)) is blocked from emitting SAR-revealing content to any unapproved destination, and its writes are confined by Data Boundaries to the governed case SoR.	Source
Fed/OCC SR 11-7 (interagency model-risk guidance)	SR 11-7 / OCC 2011-12: model definition, model risk, 'effective challenge'	A model that processes inputs into estimates invariably creates model risk; the guiding control is 'effective challenge', meaning critical analysis by objective, informed parties who can identify limitations and produce changes. An LLM alert-triage agent is itself a model that must be validated, challenged, and monitored.	The maker-checker Decision Desk Escalation in the auto-close and SAR-narrative controls institutionalizes 'effective challenge' on the highest-stakes dispositions (a named, competent human independently reviews the agent's call), and the disposition–rationale consistency block in the disposition-write control catches the model's incorrect-but-fluent output that self-report would never reveal.	Source
EU AI Act, Regulation (EU) 2024/1689	Annex III: high-risk classification scope (and the credit-scoring fraud-detection exception)	Annex III enumerates eight high-risk areas; the financial-services entry covers creditworthiness/credit-scoring 'with the exception of AI systems used for the purpose of detecting financial fraud.' AML transaction monitoring / financial-crime detection is not enumerated anywhere in Annex III, so it is not automatically a high-risk AI system.	This is a scoping mapping, not a control mapping: it tells the deployer that the high-risk conformity regime is not the load-bearing obligation here. The runtime controls are therefore designed to satisfy the AML and model-risk regimes first, while voluntarily adopting the EU AI Act's human-oversight and logging disciplines (below) as good practice and against the case where a deployer is otherwise in scope.	Source
EU AI Act, Regulation (EU) 2024/1689	Article 14(4)(b),(d),(e): Human oversight	Oversight persons must stay aware of automation bias (over-reliance on the system's output), be able to decide not to use / disregard / override / reverse the output, and be able to interrupt the system via a 'stop' to a safe state.	The require_approval pause plus Decision Desk override and re-route (auto-close and SAR-narrative controls) are the direct mechanical analog of disregard/override/reverse; block (disposition-write control) is the 'stop to a safe state', where the agent's action is halted with a structured denial. The reviewer sees reason codes precisely to counter automation bias rather than rubber-stamping the agent.	Source
EU AI Act, Regulation (EU) 2024/1689	Article 26(2) & 26(6): Deployer obligations	Deployers must assign human oversight to natural persons with the necessary competence, training and authority, and keep auto-generated logs under their control for at least six months.	Decision Desk routes Escalations to named, competent L2/L3/MLRO-delegate reviewers (auto-close, SAR-narrative, and disposition-write controls), satisfying the competence-and-authority requirement, and the Evidence-by-Default pipeline (cross-cutting evidence control) retains automatically generated Lineage Records well beyond the six-month floor.	Source
EU AI Act, Regulation (EU) 2024/1689	Article 12(1): Record-keeping (automatic logging)	High-risk AI systems must technically allow automatic recording of events (logs) over the system lifetime to enable traceability appropriate to the intended purpose.	Evidence-by-Default (the cross-cutting evidence control) captures every Decision Request, policy decision, tool call, and human verdict automatically and seals each into an append-only, Merkle-proofed Lineage Record, satisfying the automatic-logging and traceability requirement as a built-in property rather than a bolt-on, even though Art. 12 binds strictly only where the system is in high-risk scope.	Source

Prove the control held

Audit-evidence checklist

For every auto-close: the Decision Request, the policy outcome + reasonCodes, and — where elevated-risk — the named L2 approver's verdict and timestamp, sealed on the Lineage Record (proves no in-scope suspicion was closed without human sign-off; FATF R.20 / AMLR Art. 69).
For every SAR/STR narrative: the exact agent draft text, the require_approval record, and the named SAR-filing-officer's authorization — establishing human authorship of the filing decision and that the agent only drafted (SR 11-7 effective challenge; AMLR Art. 69(1) 'on their own initiative').
For every disposition write: the paired disposition + rationale as submitted, plus the destination tool and its Tool Catalog binding proving the write reached only the approved case SoR and never a customer-readable surface (tipping-off: AMLR Art. 73 / 31 CFR § 1020.320(e) / 31 U.S.C. § 5318(g)(2)).
A sealed initial-detection timestamp on each alert disposition so the 30/60-day BSA SAR clock is provable rather than reconstructed (31 CFR § 1020.320(b)(3)).
The active Release hash (model + instructions + parameters + tool bindings) stamped on each Lineage Record, answering 'exactly what configuration produced this disposition.'
Independent verification: each Lineage Record verifiable via GET /v1/lineage/{id}/verify by recomputing the Merkle root against the published ImmuDB ledger root — no trust in KLA required.
Retention of automatically generated logs for at least the six-month minimum (EU AI Act Art. 26(6)), exportable as a Sealed Evidence Bundle or an EU AI Act Annex IV Control Pack for an examiner.
A periodic 'effective challenge' pack: a sample of auto-closed alerts re-reviewed by an independent party, with their verdicts captured — evidence the model-risk monitoring required by SR 11-7 actually ran.

A concrete intercept

Reference scenario: An agent tries to auto-close an alert on a PEP with a fluent rationale — and a named L2 investigator gets the veto

1. The alert-triage agent reads alert ALRT-77214 (rapid round-number transfers into and out of a business account) and decides to auto-close it as no-further-action, generating the rationale 'pattern consistent with prior supplier payments.'
2. Before the close is written, the KLA SDK checkpoint wrapping close_alert submits a Decision Request to POST /v1/decisions.evaluate, carrying attributes: customer_pep=true, jurisdiction_risk=high, linked_open_alerts=1.
3. Policy matches two rules: AML_AUTOCLOSE_ELEVATED_RISK (PEP + high-risk jurisdiction) returns require_approval, and AML_AUTOCLOSE_LINKED_CASE_OPEN (a related open alert exists) returns block. By precedence the single block wins: the close does not proceed, and the agent receives a structured denial with reason codes and remediation.
4. Because a related case is open, policy also routes a Decision Desk Escalation to the named L2 financial-crime investigator who owns that customer's case. The reviewer sees the alert, the agent's proposed close and its rationale, both reason codes, and a link to the Lineage Record.
5. The investigator disregards the agent's close (the Art. 14(4)(d) override), links the two alerts, and escalates to L3: the maker-checker control and SR 11-7 'effective challenge' in action.
6. Every step (the Decision Request, the block + require_approval outcomes, the reason codes, the reviewer's identity and verdict, and the active Release hash) is sealed into an append-only Lineage Record with a Merkle proof, verifiable later via GET /v1/lineage/{id}/verify and exportable into an EU AI Act Annex IV Control Pack with no trust in KLA required.
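The policy step of this scenario can be replayed as a small rule evaluation. This is an illustrative sketch, not KLA's policy language: the function, the returned dictionary shape, the `prior_sar` attribute, and the placeholder reason code for routine closes are assumptions, while the other attribute names and the two AML reason codes are taken from the scenario.

```python
SEVERITY = {"allow": 0, "warn": 1, "require_approval": 2, "block": 3}


def evaluate_close_alert(attrs: dict) -> dict:
    """Collect every matching rule, then let the most restrictive outcome win."""
    matched = []
    if (
        attrs.get("customer_pep")
        or attrs.get("jurisdiction_risk") == "high"
        or attrs.get("prior_sar")  # hypothetical attribute name
    ):
        matched.append(("require_approval", "AML_AUTOCLOSE_ELEVATED_RISK"))
    if attrs.get("linked_open_alerts", 0) > 0:
        matched.append(("block", "AML_AUTOCLOSE_LINKED_CASE_OPEN"))
    if not matched:
        # Placeholder code: the blueprint does not name the routine-close reason code.
        matched.append(("warn", "ROUTINE_CLOSE_PLACEHOLDER"))
    outcome = max((o for o, _ in matched), key=SEVERITY.__getitem__)
    return {"outcome": outcome, "reasonCodes": [code for _, code in matched]}


decision = evaluate_close_alert(
    {"customer_pep": True, "jurisdiction_risk": "high", "linked_open_alerts": 1}
)
print(decision)
# {'outcome': 'block', 'reasonCodes': ['AML_AUTOCLOSE_ELEVATED_RISK', 'AML_AUTOCLOSE_LINKED_CASE_OPEN']}
```

Both reason codes survive on the decision even though only the block determines the outcome, which is what lets the reviewer see the elevated-risk finding alongside the linked-case finding.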

What most teams get wrong

The non-obvious insight

An AML transaction-monitoring agent is almost certainly NOT a 'high-risk AI system' under EU AI Act Annex III — and that is precisely why it needs more deliberate governance, not less. Annex III's financial-services entry covers credit-scoring but carves out 'AI systems used for the purpose of detecting financial fraud,' and financial-crime detection appears nowhere else in the list. So the deployer gets no CE mark, no provider conformity assessment, no Annex IV technical file delivered to them — none of the high-risk scaffolding backstops these decisions. The binding force comes instead from AML law (FATF R.20, AMLR Art. 69/73, the BSA SAR rules) and from SR 11-7 model-risk supervision, where the agent is unambiguously a 'model' subject to validation and 'effective challenge.'

Why it matters: Teams routinely reason backwards: 'the EU AI Act is the strict regime, so if our agent isn't high-risk we can govern it lightly.' For AML triage that inference is exactly inverted. The absence of an Annex III conformity wrapper means the deployer's own runtime controls — human sign-off on the report/no-report decision, tipping-off containment, sealed evidence of the clock — are the only thing standing between an auto-closed alert and a years-later regulatory look-back. The EU AI Act here is best used voluntarily as a human-oversight and logging discipline (Art. 12/14/26), while the load-bearing obligations are AML and model-risk. Mis-classifying the regime leads straight to under-controlling the one decision — the silent auto-close — that has no human in the loop and no exception to catch it.

The US BSA gives a bank a hard ceiling of 60 calendar days from initial detection to file a SAR (30 days, extendable by 30 to identify a suspect) — which means an alert-triage agent that auto-closes a true-positive alert does not merely make an error, it silently starts and then blows a statutory clock that no one is watching, because a closed alert leaves the work queue and generates no aging exception. (source)

Q&A

Frequently asked questions

Is an AML transaction-monitoring agent a high-risk AI system under the EU AI Act?

Most likely not. EU AI Act Annex III enumerates eight high-risk areas; its financial-services entry covers creditworthiness and credit-scoring but expressly excludes 'AI systems used for the purpose of detecting financial fraud,' and AML/financial-crime detection appears nowhere else in Annex III. So an AML triage agent is generally not automatically high-risk. The governance obligations flow primarily from AML law (FATF R.20, EU AMLR Art. 69/73, the US BSA SAR rules) and model-risk supervision (SR 11-7); the EU AI Act applies as human-oversight and record-keeping good practice (Art. 12/14/26) and where the deployer is otherwise in scope. Confirm classification against your own deployment and seek counsel — do not assume 'not high-risk' means 'low-governance.'

Which agent decisions must a human sign off on, and which can run automatically?

The two decisions that move a legal reporting obligation get a human gate. A SAR/STR narrative is always require_approval — the agent drafts, a named SAR-filing officer authorizes the filing decision. An auto-close is gated to require_approval (or blocked) whenever the alert carries elevated-risk attributes (PEP, high-risk jurisdiction, prior SAR) or a related case is open, routing to a named L2 investigator. Routine low-risk closes and ordinary disposition writes can proceed as allow or warn, but every one still records a reason code on its Lineage Record so the decision is reconstructable.

How does governing the agent prevent a tipping-off breach?

Two layers. First, a content check blocks any disposition narrative or case note that states or implies a SAR is being filed (or that an ML/TF analysis is underway) when its destination is a customer-readable or out-of-perimeter field (reasonCode AML_TIPPING_OFF_RISK). Second — and more fundamentally — least-privilege Tool Catalog binding plus Data Boundaries mean the agent has no tool that can write to a customer-facing CRM, message, or relationship-manager queue at all. EU AMLR Art. 73 and the US BSA (31 U.S.C. § 5318(g)(2), 31 CFR § 1020.320(e)) extend the tipping-off bar explicitly to agents, so confining the agent's writes to the governed case SoR is the controlling mechanism.

How do you prove to an examiner that the agent didn't silently miss a filing?

Every disposition, including every auto-close, is captured automatically as an append-only Lineage Record carrying the Decision Request, the policy outcome and reason codes, any human verdict, the active Release hash, and a sealed initial-detection timestamp. Because the records are anchored to a Merkle-proofed ImmuDB ledger, an examiner can verify them via GET /v1/lineage/{id}/verify without trusting KLA, and you can export the relevant slice as a Sealed Evidence Bundle or an EU AI Act Annex IV Control Pack. The closed alerts are evidence too, not a blind spot: you can demonstrate which were closed under an allow or warn policy outcome and which were ratified by a named human.

The agent's accuracy against historical analyst labels is high — isn't that enough validation?

No, and relying on it is the classic trap. Historical labels are dominated by closures because alert false-positive rates are very high, so an agent that closes aggressively scores well on label-agreement while systematically suppressing the rare true positive — the exact case that must be reported. SR 11-7 calls this out: model risk is adverse consequences from incorrect-but-used output, and the prescribed control is 'effective challenge' by objective, informed parties, not the model's self-reported accuracy. Governance adds that challenge structurally: maker-checker review on high-stakes closes and a periodic independent re-review of auto-closed alerts, both captured as evidence the monitoring actually ran.

Does KLA build or run the AML agent?

No. The customer builds and owns the AML triage agent (LangGraph, CrewAI, Agentforce, Microsoft Copilot, or in-house). KLA is the independent runtime governance and assurance layer that governs the agent in place: it intercepts each consequential action before it executes, enforces policy-as-code with the four outcomes (allow / warn / require_approval / block), routes high-stakes decisions to named human approvers in Decision Desk, and seals signed execution lineage mapped to regulation. KLA never files a SAR, never closes an alert, and never makes the call — humans hold the veto on require_approval, and policy holds authority over whether an action runs.

Govern this Process without re-platforming the agent

KLA wraps the agent you already run, gates each high-stakes action, routes the hard calls to a named human, and seals independently verifiable evidence mapped to regulation.
