Pawan K. Pradhan - Governing AI Agents

Governing AI Agents

Date: 12/01/2025

A technical blueprint for governing autonomous agents.
An Engineering-First Approach to Agent Security.

We are witnessing a fundamental shift in AI architecture: the move from Chatbots (systems that talk) to Agents (systems that do).

While a chatbot might tell you how to reset a password, an agent has the "hands" to actually reset it, update the ticket, and email the user. This agency introduces a massive new attack surface. You are no longer just governing text generation; you are governing execution paths, API calls, and state changes.

Effective governance for agents requires a shift from human-speed policy to machine-speed enforcement. It rests on a foundation of rigorous Data Governance—using tools like Unity Catalog for centralised access control and lineage, and MLflow for lifecycle management and tracing.

Technical Blueprint for governing autonomous agents.

To survive this shift, we must operationalise governance across four core pillars:

Lifecycle Management: Tracking the agent from prompt engineering to deprecation.
Risk Management: Quantifying the "blast radius" of an autonomous decision.
Security: Hardening the identity and execution environment.
Observability: The "Audit Everything" mandate—logging not just the output, but the thought process.

Identity and Access Management (IAM) for Non-Human Identities

The Pillar: Security & Risk Management
The Core Problem: How do we authorize an agent to act on a user's behalf without granting it "God Mode"?

Agents often suffer from the "Confused Deputy" problem—where an attacker tricks the agent into using its legitimate high-level privileges to perform unauthorized actions. If an agent has a standing admin token, a simple prompt injection can wipe a database.

The Technical Fix:

Scoped "On-Behalf-Of" (OBO) Flows: Never hardcode service account credentials. Use OAuth 2.0 OBO flows where the agent exchanges the user’s token for a downstream token with only the scopes required for that specific session.
Ephemeral Just-in-Time (JIT) Access: The agent requests permissions dynamically. If it needs to read a file, it requests a read-only token valid for 5 minutes, performs the action, and discards the token.
Cryptographic Identity Attestation: Use SPIFFE/SPIRE to give the agent a verifiable workload identity. Every API call the agent makes should be cryptographically signed, ensuring that the "caller" is the authorized model version, not a hijacked process.

Audit Everything: Log the identity context of every tool call. "Agent X called API Y" is insufficient. The log must read: "Agent X, acting on behalf of User Z, called API Y with Scope Read-Only.

2. Runtime Guardrails and Deterministic Policy Enforcement

The Pillar: Risk Management & Observability
The Core Problem: LLMs are probabilistic (creative), but enterprise security policies must be deterministic (binary).

You cannot rely on the system prompt ("Please do not reveal PII") to secure your data. You need a distinct architectural layer that intercepts traffic before it reaches the model and before the model reaches the tool.

The Technical Fix:

Input/Output Guardrailing: Implement middleware (like NVIDIA NeMo Guardrails or Guardrails AI) that acts as a firewall. It scans inputs for jailbreak attempts and outputs for PII leakage using regex and semantic classifiers.
Policy as Code (OPA): Use the Open Policy Agent (OPA) to enforce logic at the network level. For example, write a Rego policy that states:

ALLOW http_post TO external_api ONLY IF body DOES_NOT_CONTAIN pii_tag.

Syntactic vs. Semantic Filtering: Combine keyword blocking (syntactic) for known threats with embedding-based intent detection (semantic) to catch subtle policy violations that don't use "bad words."

3. Governing the RAG Pipeline (Privacy & ACL Propagation)

The Pillar: Security & Lifecycle Management
The Core Problem: Retrieval-Augmented Generation (RAG) flattens security. If an agent retrieves a document from a Vector DB, does it respect the original file's Access Control List (ACL)?

If a junior analyst asks "What are the CEO's bonuses?", and the agent retrieves a sensitive HR PDF because it was embedded without permissions metadata, you have a data leak.

The Technical Fix:

ACL-Aware Embedding: When ingesting data into your Vector DB, you must embed the ACLs alongside the vectors.
Query-Time Filtering: When the agent queries the Vector DB, it must pass the user's identity.

The database then applies a pre-filter: SELECT * FROM embeddings WHERE user_id IN allowed_users.

Unity Catalog Integration: Leverage Unity Catalog to govern unstructured data (Volumes) alongside structured tables. This ensures that the agent's access to "Gold" layer data is governed by the same policies as your BI dashboards.

4. Supply Chain Security for Agentic Tools

The Pillar: Lifecycle Management & Security
The Core Problem: Agents execute code and call third-party tools. An agent is only as secure as the libraries it imports.

If your agent has the ability to pip install packages or call unverified APIs, you are vulnerable to supply chain attacks where a malicious package exfiltrates the agent's context window.

The Technical Fix:

The AI SBOM (Software Bill of Materials):
Maintain a strict inventory of every model weight, embedding file, and tool definition used by the agent.
Sandboxed Execution:
Never run agent-generated code on the host. Use ephemeral, network-restricted microVMs (like Firecracker or gVisor) or WebAssembly (WASM) sandboxes to execute Python/SQL code.
API Whitelisting (Egress Filtering):
Enforce strict network policies. The agent can talk to internal-crm.api but strictly block unknown-external.site.

5. Adversarial Red Teaming for Agent Logic

The Pillar: Risk Management & Observability
The Core Problem: Standard vulnerability scanners (DAST/SAST) don't work on cognitive architectures. They can't find "hallucinations" or "goal hijacking."

The Technical Fix:

Automated Red Team Agents:
Deploy "Attacker Agents" whose sole utility function is to break your "Defender Agent." They will try to force your agent to ignore its system prompt or leak data.

Goal Hijacking Tests:
Specifically test if the agent can be diverted from its directive. Can a user convince a "Customer Support Agent" to become a "Linux Terminal" or a "Crypto Miner"?

MLflow Evaluation:
Use MLflow's LLM Evaluate to run continuous scoring on agent outputs against a "Golden Dataset" of known adversarial prompts, tracking drift in safety scores over time.

Audit Everything: In this phase, you audit the failures. Every successful jailbreak in the red team phase must become a regression test case in your CI/CD pipeline.

Conclusion: The "Audit Everything" Mandate

Governance is not a document; it is a platform capability.

To successfully govern AI agents, you must enforce Traceability. Every token generated, every tool invoked, and every document retrieved must be logged in a centralized store (like MLflow Tracing).

When an agent makes a mistake, you shouldn't have to ask "Why?" You should be able to replay the tape: see the input, see the retrieved context, see the chain-of-thought reasoning, and see the tool execution. That is the only way to build trust in systems that have hands.