Deployment Patterns & Setup | Developer Docs

The KLA Control Plane is a runtime safety, audit, and governance layer for enterprise AI agents that governs them in place. You instrument the agents and APIs you already run (LangChain, FastAPI, or custom code) instead of re-platforming them. KLA supports two integration patterns, and you can mix them within a single deployment. Govern in Place wraps your existing runtime with the KLA SDK, which emits OpenTelemetry spans. Run through KLA routes execution through a managed proxy via the Executions API. This page compares the two, shows how each works, and helps you choose.

Topology Comparison

Feature	Govern in Place (Instrumented)	Run through KLA (Managed Proxy)
Integration Hook	OpenTelemetry SDK + checkpoint wrapper	Executions API (REST)
Execution Host	Your own cloud / containers	KLA Runtime
Re-platforming Required	None	Low (endpoint switch)
Added Latency	Near zero (asynchronous spans)	One proxy hop (inline enforcement)
Enforcement Vector	In-process gates at checkpoints	Inline request blocking
Best Fit	Existing production LangChain / custom stacks	Greenfield projects and standard API agents

ℹ️ Note

Both patterns produce the same governance outcomes. Every action is evaluated against the same policy engine and recorded as a Lineage Record (the canonical name for an execution trace) in the same cryptographic ledger. The choice is about how enforcement reaches your code, not about what you give up.

Pattern 1: Govern in Place

Your agents keep running in their own environment. The KLA SDK emits OpenTelemetry spans asynchronously to the Control Plane, so telemetry never sits in the request path. Enforcement happens at checkpoints: points in your code where you ask KLA whether an action is permitted before executing it. At each checkpoint the policy engine returns one of four outcomes, evaluated in precedence order: allow, warn, require_approval, or block. A require_approval outcome pauses execution and routes an Escalation to the Decision Desk, KLA's human-review surface, where a reviewer resolves the Decision Request.

flowchart LR
  A["Your agent app (SDK wrapped)"] --> B["LLM provider"]
  A -->|checkpoint| C{"Policy decision"}
  C -->|allow / warn| A
  C -->|require_approval| D["Decision Desk"]
  C -->|block| E["Action denied"]
  A -. async spans .-> F["KLA Control Plane"]
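
The four outcomes can be modelled as an ordered enum. The sketch below assumes that "precedence order" means the most restrictive outcome wins when several policies match the same action; the text does not spell this out. `Outcome`, `resolve`, and `may_proceed` are hypothetical names, not part of the KLA SDK.

```python
from enum import IntEnum


class Outcome(IntEnum):
    """Checkpoint outcomes, ordered from least to most restrictive (assumed)."""
    ALLOW = 0
    WARN = 1
    REQUIRE_APPROVAL = 2
    BLOCK = 3


def resolve(outcomes):
    """Return the most restrictive outcome among the matched policies.

    Assumption: a stricter outcome overrides a more permissive one.
    Raises ValueError if no outcomes are given.
    """
    return max(outcomes)


def may_proceed(outcome):
    """allow and warn return control to the agent; the other two do not."""
    return outcome in (Outcome.ALLOW, Outcome.WARN)


combined = resolve([Outcome.ALLOW, Outcome.REQUIRE_APPROVAL, Outcome.WARN])
```

Under that assumption, `combined` is `Outcome.REQUIRE_APPROVAL`, so the action would wait for a reviewer rather than proceed.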

Setup Instructions

Install the SDK for your application language (Python or Node.js).
Initialize the SDK at your application entrypoint so spans flow to the Control Plane.
Wrap sensitive tool actions or model completions in a KLA checkpoint.

# Govern in Place: evaluate an action at a checkpoint
from kla_sdk import KLACheckpoint

checkpoint = KLACheckpoint(agent_id="agt_9f81a7")

# If policy returns require_approval, this call waits and opens an
# Escalation on the Decision Desk until a reviewer resolves it.
decision = checkpoint.evaluate_action(
    tool="process_refund",
    context={"amount": 1250.00},
)

# allow and warn both return control to the agent; anything else is denied.
if decision.outcome in ("allow", "warn"):
    execute_refund()  # your existing refund logic
else:
    raise PermissionError(f"Refund denied by governance policy: {decision.outcome}")
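
When several functions need the same gate, the check-then-execute pattern above can be factored into a decorator. This is a sketch only: `governed`, `StubCheckpoint`, and `Decision` are hypothetical names, and the stub stands in for `KLACheckpoint` so the example runs without the SDK. It assumes `evaluate_action(tool=..., context=...)` returns an object with an `outcome` attribute, and it treats `warn` as permission to proceed, following the flowchart.

```python
import functools
from dataclasses import dataclass


@dataclass
class Decision:
    outcome: str  # "allow", "warn", "require_approval", or "block"


class StubCheckpoint:
    """Stand-in for KLACheckpoint with a made-up rule: large refunds are blocked."""

    def evaluate_action(self, tool, context):
        if tool == "process_refund" and context.get("amount", 0) > 1000:
            return Decision("block")
        return Decision("allow")


def governed(checkpoint, tool):
    """Gate the wrapped function behind a checkpoint evaluation."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**context):
            # Keyword arguments double as the policy context.
            decision = checkpoint.evaluate_action(tool=tool, context=context)
            if decision.outcome not in ("allow", "warn"):
                raise PermissionError(f"{tool} not permitted: {decision.outcome}")
            return func(**context)
        return wrapper
    return decorator


@governed(StubCheckpoint(), tool="process_refund")
def process_refund(amount):
    return f"refunded {amount:.2f}"
```

With the stub's rule, `process_refund(amount=250.0)` runs normally, while `process_refund(amount=1250.0)` raises `PermissionError` before the function body executes.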

💡 Tip

Checkpoints are cheap. Place them around the actions that carry real risk (payments, data exports, irreversible writes) rather than every model call. The asynchronous spans capture full execution context either way.
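
To illustrate why asynchronous spans stay out of the request path, here is a sketch that uses only the Python standard library. It is neither the KLA SDK nor the OpenTelemetry SDK: `AsyncSpanEmitter` is a hypothetical stand-in that queues spans on the calling thread and exports them on a background thread, which is the general shape of batched span export.

```python
import queue
import threading


class AsyncSpanEmitter:
    """Queue spans on the caller's thread; export them on a worker thread."""

    def __init__(self, export):
        self._export = export  # callable that ships one span
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def emit(self, span):
        # Enqueue only: the caller does not wait for the export to finish.
        self._queue.put(span)

    def _run(self):
        while True:
            span = self._queue.get()
            if span is None:  # sentinel: stop the worker
                break
            self._export(span)

    def shutdown(self):
        # Spans queued before the sentinel are exported first, then the worker exits.
        self._queue.put(None)
        self._worker.join()


exported = []
emitter = AsyncSpanEmitter(exported.append)
emitter.emit({"name": "process_refund", "outcome": "allow"})
emitter.emit({"name": "export_report", "outcome": "warn"})
emitter.shutdown()
```

The span names and fields are invented for the example. The point is that `emit` only enqueues, so a slow or unavailable collector delays the worker thread, not the request that produced the span.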

Pattern 2: Run through KLA

Here your client sends requests to the Executions API instead of calling a model provider directly. KLA resolves provider credentials from the Secrets Vault, applies policy inline, runs the agent against the model, and records the Lineage Record, all in one consolidated hop. Because enforcement is inline, a block outcome stops the request before any tool runs.

flowchart LR
  A["Client frontend (Chat UI)"] -->|Executions API| B["KLA Control Plane"]
  B --> C{"Policy decision"}
  C -->|allow / warn| D["LLM provider"]
  C -->|require_approval| E["Decision Desk"]
  C -->|block| F["Request rejected"]
  D --> B
  B -->|response + Lineage Record| A
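
The ordering in the diagram matters: the policy decision is taken before the provider is called. A toy model of that control flow is sketched below. Every name in it (`run_execution`, `fake_policy`, `fake_provider`) and the `rejected` and `completed` status values are hypothetical and say nothing about how KLA is implemented; the sketch only shows that a block or require_approval outcome means the provider is never reached.

```python
def run_execution(prompt, decide, call_provider):
    """Apply policy inline, then call the provider only if permitted."""
    outcome = decide(prompt)
    if outcome == "block":
        return {"status": "rejected", "output": None}
    if outcome == "require_approval":
        # Parked until a reviewer resolves it on the Decision Desk.
        return {"status": "pending", "output": None}
    # allow and warn both proceed to the provider.
    return {"status": "completed", "output": call_provider(prompt)}


calls = []  # records every prompt that actually reached the provider


def fake_provider(prompt):
    calls.append(prompt)
    return f"echo: {prompt}"


def fake_policy(prompt):
    # Made-up rule for the example: anything mentioning "delete" is blocked.
    return "block" if "delete" in prompt else "allow"


ok = run_execution("Summarize outstanding contracts", fake_policy, fake_provider)
denied = run_execution("delete all contracts", fake_policy, fake_provider)
```

After both calls, `calls` contains only the first prompt, because the blocked request returned before the provider was invoked.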

Setup Instructions

Store your model provider credentials in the Secrets Vault.
Register the agent, with its system instructions, parameters, and tools, in the Agent Registry.
Point your client at the Executions API instead of OpenAI or Anthropic.

curl -X POST https://api.kla.digital/v1/executions \
  -H "Authorization: Bearer $KLA_ACCESS_TOKEN" \
  -H "x-tenant-id: $KLA_TENANT" \
  -H "Content-Type: application/json" \
  -d '{
    "agentId": "agt_9f81a7",
    "input": { "prompt": "Summarize outstanding contracts" }
  }'
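
The same request can be built from Python with the standard library. The sketch below mirrors the curl call field for field; `build_execution_request` is a hypothetical helper, and because the response format is not described here, the example builds the request without sending it or parsing a reply.

```python
import json
import os
import urllib.request

EXECUTIONS_URL = "https://api.kla.digital/v1/executions"


def build_execution_request(agent_id, prompt, token, tenant):
    """Build the POST request shown in the curl example (does not send it)."""
    body = json.dumps({"agentId": agent_id, "input": {"prompt": prompt}})
    return urllib.request.Request(
        EXECUTIONS_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "x-tenant-id": tenant,
            "Content-Type": "application/json",
        },
    )


request = build_execution_request(
    agent_id="agt_9f81a7",
    prompt="Summarize outstanding contracts",
    token=os.environ.get("KLA_ACCESS_TOKEN", ""),
    tenant=os.environ.get("KLA_TENANT", ""),
)
# To send it: urllib.request.urlopen(request)
```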

A require_approval outcome returns an Escalation reference and a pending status; poll the execution or subscribe to Decision Desk events to continue once a reviewer approves.
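
Polling can be sketched as a loop with a deadline. The status endpoint and response fields are not documented here, so the sketch takes a caller-supplied `fetch_status` callable rather than guessing a URL. `wait_for_resolution` and the `approved` value in the usage lines are hypothetical; only the `pending` status comes from the text above.

```python
import time


def wait_for_resolution(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_status() until it stops returning "pending" or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status != "pending":
            return status
        # Give up if the next poll would land after the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError("execution still pending after timeout")
        sleep(interval)


# Usage with a fake status source that resolves on the third poll.
responses = iter(["pending", "pending", "approved"])
final = wait_for_resolution(lambda: next(responses), interval=0.01, timeout=1.0)
```

A fixed interval keeps the example short. For long-running reviews, subscribing to Decision Desk events avoids repeated polling altogether.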

Choosing a Pattern

For developers and integrators with an existing production agent, for example a LangChain claims-triage workflow already serving traffic, Govern in Place is the fastest path: add the SDK and a few checkpoints, change no routing, and keep your latency budget intact. For platform operators standing up a greenfield agent or a standard chat interface, Run through KLA requires the least code, since KLA owns credential injection, inline enforcement, and Lineage Record compilation behind one endpoint. For compliance and risk officers, the choice makes no difference: both patterns feed the same policy engine and the same sealed evidence pipeline, so your Control Pack exports and Sealed Evidence Bundles look identical regardless of how the agent is wired. Most enterprises run both, instrumenting legacy systems in place while routing new agents through KLA, and govern them from a single Control Plane.