Architecture Overview
A product-altitude mental model of how the KLA Control Plane wraps your existing AI agents to govern, operate, assure, and prove every action.
The KLA Control Plane is a govern-in-place runtime safety, audit, and governance layer for enterprise AI agents. "Govern in place" means you instrument the agents you already run (LangChain chains, FastAPI services, or fully custom code) instead of re-platforming them onto a new runtime. KLA wraps those agents with a thin instrumentation layer and a set of always-on planes that decide whether each action is allowed, capture what happened, and seal cryptographic proof of it.
This page is a logical mental model, not an infrastructure diagram. It describes the planes a developer, operator, or auditor needs to understand, and how a single agent action travels through them.
The Planes at a Glance
KLA is organized into a small number of cooperating planes. Each one maps to one or more of the four product pillars: Govern, Operate, Assure, and Prove.
| Plane | What it does | Pillar |
|---|---|---|
| Instrumentation / SDK | Emits OpenTelemetry spans from your agent and asks for policy decisions before risky actions | Govern |
| Policy Engine | Evaluates each action against signed policy packs and returns one of four outcomes | Govern |
| Decision Desk | Holds paused actions for human review when a policy requires approval | Govern / Operate |
| Collector | Receives telemetry, redacts PII, normalizes spans | Operate / Prove |
| Evidence Ledger | Anchors sealed records to an append-only cryptographic ledger | Prove |
| Console surfaces | The web modules where humans drive and inspect the system | All four |
How an Action Flows
Every governed action follows the same path. The agent calls the KLA SDK before it executes a sensitive step (a tool call, a database write, a payment). The SDK opens an OpenTelemetry span and submits a Decision Request to the Policy Engine, which returns one of four outcomes in precedence order: allow, warn, require_approval, or block. An allow or warn lets the action proceed; a require_approval pauses execution and routes an Escalation to the Decision Desk for a human to approve or deny; a block stops the action outright. Whatever the outcome, the span flows through the Collector to the Evidence Ledger.
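The routing just described can be sketched in a few lines of Python. This is an illustrative model only, not the SDK's implementation: `Outcome`, `run_governed`, `human_approves`, and the `audit_log` list are hypothetical names, and the human reviewer is reduced to a callback that stands in for the Decision Desk.

```python
from enum import Enum
from typing import Callable, List


class Outcome(str, Enum):
    ALLOW = "allow"
    WARN = "warn"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


def run_governed(
    outcome: Outcome,
    action: Callable[[], object],
    human_approves: Callable[[], bool],
    audit_log: List[str],
) -> bool:
    """Route one action by its policy outcome; return True if it executed."""
    if outcome is Outcome.REQUIRE_APPROVAL:
        # Stand-in for the Decision Desk: a human approves or denies.
        approved = human_approves()
        audit_log.append(f"escalation:{'approved' if approved else 'denied'}")
        execute = approved
    else:
        # allow and warn proceed; block stops.
        execute = outcome in (Outcome.ALLOW, Outcome.WARN)

    if execute:
        action()
    # Every outcome is recorded, whether or not the action ran.
    audit_log.append(f"{outcome.value}:{'executed' if execute else 'stopped'}")
    return execute
```

Note that the record is appended on every branch, which mirrors the rule that the span reaches the ledger whatever the outcome.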
flowchart LR
A["Agent action"] --> SDK["KLA SDK<br/>instrumentation"]
SDK --> PE{"Policy Engine<br/>decision"}
PE -->|allow / warn| EX["Execute action"]
PE -->|require_approval| DD["Decision Desk<br/>human review"]
PE -->|block| ST["Stop action"]
DD -->|approved| EX
DD -->|denied| ST
EX --> COL["KLA Collector<br/>PII redaction"]
ST --> COL
DD --> COL
COL --> LED["Evidence Ledger<br/>Merkle proofs"]
LED --> EB["Sealed Evidence Bundle"]
The Instrumentation Layer
Two deployment patterns connect your agents to KLA:
- Govern in Place: You add the KLA SDK to your existing code. It emits async OpenTelemetry spans and runs in-process gates, so a policy decision happens inline before the action. Your agent keeps running where it already lives.
- Run through KLA: You route agent execution through a managed proxy via the Executions API. KLA observes and gates every step centrally, which is useful when you cannot modify the agent's code.
Both patterns produce the same telemetry and feed the same downstream planes. Spans carry GenAI semantic attributes that make agent behavior legible: genai.agent.name, genai.system.instructions, genai.tool.name, genai.tool.parameters, genai.cost.usd, and genai.token.usage.
from kla import KLA

kla = KLA(api_url="https://api.kla.digital", tenant="acme")

# Ask for a decision before a sensitive tool call.
decision = kla.decide(
    agent="claims-triage",
    action="execute_refund",
    attributes={"amount": 240.00, "currency": "USD"},
)

if decision.outcome in ("allow", "warn"):
    # Both allow and warn let the action proceed.
    issue_refund()  # Your own application function.
elif decision.outcome == "require_approval":
    decision.wait()  # Pauses until the Decision Desk resolves the Escalation.
# A block outcome falls through: the refund is never issued.
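To make the span attributes named earlier in this section concrete, here is a minimal sketch that assembles them into a flat key-value map. The attribute keys come from this page; everything else is an assumption for illustration: the helper `build_span_attributes`, the sample values, the choice to serialize tool parameters as a JSON string, and the treatment of token usage as a single integer.

```python
import json
from typing import Dict, Union

AttributeValue = Union[str, int, float, bool]


def build_span_attributes(
    agent_name: str,
    instructions: str,
    tool_name: str,
    tool_parameters: dict,
    cost_usd: float,
    token_usage: int,
) -> Dict[str, AttributeValue]:
    """Assemble the GenAI attributes into one flat mapping."""
    return {
        "genai.agent.name": agent_name,
        "genai.system.instructions": instructions,
        "genai.tool.name": tool_name,
        # Assumption: structured parameters are serialized to a JSON string,
        # since span attribute values are normally primitives.
        "genai.tool.parameters": json.dumps(tool_parameters, sort_keys=True),
        "genai.cost.usd": cost_usd,
        # Assumption: token usage is a single total count.
        "genai.token.usage": token_usage,
    }


# Hypothetical sample values.
attrs = build_span_attributes(
    agent_name="claims-triage",
    instructions="Triage incoming insurance claims.",
    tool_name="execute_refund",
    tool_parameters={"amount": 240.0, "currency": "USD"},
    cost_usd=0.0021,
    token_usage=512,
)
```

A flat map of primitives like this is the shape OpenTelemetry span attributes generally take, which is why the nested parameters are flattened to a string here.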
The Policy Engine
The KLA Policy Engine evaluates Decision Requests at the application layer. Policies are declarative rules that say who may do what, under which conditions, and which runtime checks should apply. You author them in Policy Builder, run a Simulation against historical traffic before publishing, then compile them into signed policy packs: tamper-evident bundles the engine loads at runtime. Every check resolves to the same four-outcome model, so a match can allow the action, record a warning, pause for human approval, or block execution.
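As a rough illustration of how declarative rules might resolve to a single outcome, the sketch below evaluates every matching rule and keeps the most restrictive result. It is not the engine's actual algorithm or policy format: the `Rule` dataclass, the `evaluate` function, the sample rules, the "strictest match wins" combination, and the default of `allow` when nothing matches are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Outcomes ranked from least to most restrictive.
SEVERITY = {"allow": 0, "warn": 1, "require_approval": 2, "block": 3}


@dataclass(frozen=True)
class Rule:
    name: str
    action: str
    condition: Callable[[Dict[str, object]], bool]
    outcome: str


def evaluate(rules: List[Rule], action: str, attributes: Dict[str, object]) -> str:
    """Return the most restrictive outcome among the rules that match."""
    matched = [
        rule.outcome
        for rule in rules
        if rule.action == action and rule.condition(attributes)
    ]
    # Assumption: with no matching rule, the action is allowed.
    return max(matched, key=SEVERITY.__getitem__, default="allow")


# Hypothetical rules for the refund example used on this page.
RULES = [
    Rule("log-all-refunds", "execute_refund", lambda a: True, "warn"),
    Rule("large-refund", "execute_refund", lambda a: a["amount"] > 200, "require_approval"),
    Rule("non-usd-refund", "execute_refund", lambda a: a["currency"] != "USD", "block"),
]

outcome = evaluate(RULES, "execute_refund", {"amount": 240.0, "currency": "USD"})
```

With these sample rules, the 240.00 USD refund matches both the logging rule and the large-refund rule, and the stricter `require_approval` is the one returned.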
The Evidence Pipeline
The data path that produces proof is deliberately linear and one-directional:
flowchart LR
S["OpenTelemetry spans"] --> C["KLA Collector<br/>PII redaction"]
C --> L["ImmuDB ledger<br/>append-only, Merkle proofs"]
L --> B["Sealed Evidence Bundle"]
B --> P["Control Pack export"]
Spans arrive at the KLA Collector, which redacts PII (emails, card numbers, secrets) before anything is stored. Sanitized records are hashed and anchored into ImmuDB, an append-only cryptographic ledger: records cannot be edited, deleted, or reordered, and each is provable via a Merkle proof. From there, auditors export a Sealed Evidence Bundle (a single trace's verifiable record, called a Lineage Record) or a Control Pack (a compliance export mapped to a framework) from the Evidence Room.
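The redact, hash, and prove sequence can be illustrated with a small standard-library sketch. It is a generic Merkle tree over redacted records, written to show the idea of an inclusion proof; it does not reproduce the Collector's redaction rules or ImmuDB's internal tree layout. The email regex, the function names, and the duplicate-last-node padding are all assumptions for the example.

```python
import hashlib
import re
from typing import List, Tuple

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    """Mask email addresses; a real collector would cover more PII types."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Build every tree level, from leaf hashes up to the single root."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:  # Duplicate the last node on odd-sized levels.
            cur = cur + [cur[-1]]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Collect each sibling hash and whether it sits to the left of the path."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one leaf and its proof."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Hypothetical records: redaction happens before anything is hashed.
records = [redact(r).encode() for r in [
    "refund approved for jane@example.com",
    "tool=execute_refund amount=240.00",
    "escalation resolved by ops@example.com",
]]
levels = merkle_levels(records)
root = levels[-1][0]
proof = prove(levels, 2)
```

Calling `verify(records[2], proof, root)` recomputes the root from one record plus a handful of sibling hashes. An auditor can therefore check a single record without holding the rest of the ledger, and any change to the record changes the recomputed root.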
Multi-Tenancy and the Data Path
KLA is multi-tenant by design. Every request carries a tenant identity, and every record (Decision Request, Lineage Record, Escalation, evidence anchor) is scoped to a single tenant. Callers assert their tenant on every API call:
curl https://api.kla.digital/v1/decisions \
  -H "Authorization: Bearer <token>" \
  -H "x-tenant-id: <tenant>"
Tenant isolation is enforced at the data layer, so one tenant's policies, traces, and evidence are never visible to another. The same boundary applies end to end: telemetry from a tenant's agents only flows into that tenant's ledger and only surfaces in that tenant's Console.
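One way to picture isolation at the data layer is a store that cannot be read or written without a tenant identity. The class below is a toy in-memory model under that assumption; `TenantScopedStore` and its methods are hypothetical and say nothing about how KLA's storage is actually built.

```python
from collections import defaultdict
from typing import Dict, List


class TenantScopedStore:
    """In-memory store where every read and write requires a tenant ID."""

    def __init__(self) -> None:
        self._records: Dict[str, List[dict]] = defaultdict(list)

    def write(self, tenant_id: str, record: dict) -> None:
        if not tenant_id:
            raise ValueError("every record must carry a tenant identity")
        # Stamp the tenant onto the record so the scope travels with it.
        self._records[tenant_id].append({**record, "tenant_id": tenant_id})

    def read(self, tenant_id: str) -> List[dict]:
        # Only the caller's own partition is ever returned.
        return list(self._records.get(tenant_id, []))


store = TenantScopedStore()
store.write("acme", {"type": "decision_request", "action": "execute_refund"})
```

Because there is no method that reads across partitions, a query for another tenant simply returns an empty list rather than relying on a filter the caller might forget.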
The Console Surfaces
The planes are driven and inspected through seven web modules, each a window onto one part of the flow:
- Command: the home surface for live agent health and recent activity.
- Agents: the Agent Registry and lifecycle: a Release (workflow version), a Rollout (deployment), and Rollback.
- Policy Builder: author, simulate, and publish policies.
- Decision Desk: resolve paused Escalations with full context.
- Lineage Explorer: replay any Lineage Record step by step.
- Assurance Center: track drift as Assurance Alerts and act on a Remediation Plan (the "Assure" pillar).
- Evidence Room: export Sealed Evidence Bundles and Control Packs.
Where to Go Next
Start with the two foundational concepts that this overview only sketches: Policy-Gated Execution for the four-outcome decision model, and Evidence-by-Default for the cryptographic ledger. Then explore the modules in depth (Policy Builder, Decision Desk, Lineage Explorer, and Evidence Room), or wire up an agent with the Python SDK or Node SDK.
