Security · 12 min read

Zero-Trust for AI Agents

A safe prompt is not a safe system. The 6-layer Agent Trust Stack for production security.

Abstract

As AI agents move from "helpful copilots" to systems that take real actions—issuing refunds, creating purchase orders, updating customer records, triggering infrastructure operations, and coordinating with suppliers—the risk profile changes qualitatively.

The central problem is no longer whether the model can complete the task. It is whether the organization can trust the system that surrounds the model: the identity of the agent, the scope of what it can do, the controls that govern high-impact actions, the evidence trail that proves what happened, and the mechanisms that contain failures when something goes wrong.

The Core Problem

Prompt-level safety is insufficient for production environments because it is inherently non-deterministic. Even a well-prompted agent can be coerced through tool output, retrieval content, indirect prompt injection, or simple misinterpretation of ambiguous instructions.

Enterprises need a security posture that treats agents like any other powerful software actor: untrusted by default, controlled by explicit policy, constrained by least privilege, and observable through ground-truth logging.

1. The Hard Truth: A Safe Prompt Is Not a Safe System

The industry's first wave of agent deployments leaned heavily on prompt design. Teams tried to "make the agent safe" by instructing it to be careful, to avoid sensitive data, to ask for approvals, or to refuse certain actions. This was a reasonable starting point because prompts are easy to change and quick to iterate.

Prompts are not enforcement.

They are suggestions delivered to a probabilistic system operating in a dynamic environment.

Production systems do not run on suggestions. They run on constraints.

In practice, the agent's operating context includes far more than the user's instructions. It includes tool definitions, tool results, retrieved documents, internal system messages, error logs, and sometimes adversarial inputs from outside parties. It only takes one untrusted tool response or one cleverly worded document to shift the agent's behavior.

The system needs a different posture: assume the agent will make mistakes, and design the environment so mistakes cannot become incidents.

That posture has a name in security: Zero Trust.

2. What Zero Trust Means When the "User" Is an Agent

Zero trust is often summarized as "never trust, always verify." In enterprise security, it means you do not grant broad access simply because something is inside the network or because it passed a one-time authentication check. Every request is evaluated against identity, policy, context, and risk.
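To make "every request is evaluated" concrete, here is a minimal sketch of a stateless, default-deny check. Everything in it is an assumption made for illustration: the `Request` fields, the `ALLOWED` table, and the `MAX_RISK` threshold are hypothetical, and a real deployment would derive the risk score from its own signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    principal: str      # identity: who is asking (hypothetical agent ID)
    action: str         # what it wants to do
    environment: str    # context: where the request originates
    risk_score: float   # 0.0 (benign) to 1.0 (high risk), computed upstream


# Hypothetical policy table: (principal, action) -> environments where it is allowed.
ALLOWED = {
    ("support-agent", "read_ticket"): {"prod", "staging"},
    ("support-agent", "issue_refund"): {"prod"},
}

MAX_RISK = 0.7  # assumed threshold, purely illustrative


def evaluate(req: Request) -> bool:
    """Evaluate one request on its own merits; nothing is remembered between calls."""
    environments = ALLOWED.get((req.principal, req.action))
    if environments is None:                  # identity and policy: default deny
        return False
    if req.environment not in environments:   # context
        return False
    return req.risk_score <= MAX_RISK         # risk
```

The point of the design is that a request that passed a minute ago earns no credit now: the same four inputs are checked on every call.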

When the actor is an AI agent rather than a human or a service, the principle remains the same, but the operational details shift:

Agents are not stable in the way humans are
They do not have persistent intent
They can be manipulated indirectly
They can be steered by tool output
They can be confused by irrelevant context
They can execute high-impact actions quickly

Zero trust for agents is not about distrust of the model; it is about trust in infrastructure.

3. The Agent Trust Stack

Most enterprise buyers instinctively ask the same four questions the moment agents can act:

Who is it?
What can it do?
Can we prove what it did?
Can we stop it?

A practical Agent Trust Stack has six layers:

Layer 1: Identity

A production system needs to know who is acting. The agent is a principal with an owner, a purpose, an environment, and an entitlement set. You cannot restrict what you cannot identify.
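A minimal sketch of what "the agent is a principal" could look like in code. The `AgentPrincipal` and `AgentRegistry` names are hypothetical, and the in-memory dictionary stands in for whatever identity provider an organization actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPrincipal:
    """Hypothetical identity record for one agent."""
    agent_id: str
    owner: str               # accountable human or team
    purpose: str             # what the agent exists to do
    environment: str         # e.g. "prod" or "staging"
    entitlements: frozenset  # names of the tools it may call


class AgentRegistry:
    """In-memory registry; a real system would sit on an identity provider."""

    def __init__(self) -> None:
        self._agents: dict = {}

    def register(self, principal: AgentPrincipal) -> None:
        if principal.agent_id in self._agents:
            raise ValueError(f"duplicate agent id: {principal.agent_id}")
        self._agents[principal.agent_id] = principal

    def resolve(self, agent_id: str) -> AgentPrincipal:
        # Unknown callers are rejected outright: no identity, no access.
        try:
            return self._agents[agent_id]
        except KeyError:
            raise PermissionError(f"unknown agent: {agent_id}") from None
```

Making the record immutable (`frozen=True`) means an entitlement change has to go through registration rather than being patched in place by whoever holds a reference.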

Layer 2: Least Privilege

Grant the minimum access required for the job. A support agent should not have finance tools. A vendor coordination agent should not have access to customer PII. Least privilege for agents is not merely about preventing malicious behavior—it is about preventing accidental capability.
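One way to prevent accidental capability is to never hand the agent a tool it is not entitled to in the first place. The sketch below assumes a hypothetical tool catalog and role grants; the tool functions are stubs.

```python
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}  # stub


def draft_reply(text: str) -> str:
    return f"DRAFT: {text}"  # stub


def issue_refund(order_id: str, amount: float) -> str:
    return f"refunded {amount} on {order_id}"  # stub


# Hypothetical catalog of every tool the platform knows about.
TOOL_CATALOG = {
    "lookup_order": lookup_order,
    "draft_reply": draft_reply,
    "issue_refund": issue_refund,
}

# Hypothetical grants: each role sees only what its job requires.
ROLE_GRANTS = {
    "support": frozenset({"lookup_order", "draft_reply"}),
    "finance": frozenset({"lookup_order", "issue_refund"}),
}


def tools_for(role: str) -> dict:
    """Return only the tools granted to this role; unknown roles get nothing."""
    granted = ROLE_GRANTS.get(role, frozenset())
    return {name: fn for name, fn in TOOL_CATALOG.items() if name in granted}
```

Because the support agent never receives `issue_refund`, no prompt, injected or otherwise, can talk it into calling it.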

Layer 3: Policy Enforcement

If you want predictable safety, the rules that govern actions must be deterministic. They must exist outside the agent's internal reasoning. If a refund over a threshold requires human approval, the system should enforce that every time, regardless of how confident the agent feels.
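The refund rule can be written as a pure function that lives outside the agent. This is a sketch under assumed numbers: the 100.00 approval threshold and the 5000.00 hard limit are invented for illustration.

```python
from decimal import Decimal
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Assumed values; a real policy would load these from reviewed configuration.
REFUND_APPROVAL_THRESHOLD = Decimal("100.00")
REFUND_HARD_LIMIT = Decimal("5000.00")


def decide_refund(amount: Decimal) -> Decision:
    """Same input, same decision, every time. The agent's confidence is not an input."""
    if amount <= 0 or amount > REFUND_HARD_LIMIT:
        return Decision.DENY
    if amount > REFUND_APPROVAL_THRESHOLD:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

Note what the function does not take as a parameter: the agent's reasoning, the conversation, or the prompt. That omission is the enforcement.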

Layer 4: Approvals

Enterprises require approvals because they need accountability for high-impact decisions. The correct approach is surgical approvals: require human sign-off only at the points where risk or liability crosses a defined threshold.
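A surgical approval can be sketched as a gate that runs low-risk actions immediately and parks everything else until a named human signs off. `ApprovalGate`, its ticket numbers, and the threshold passed in by the caller are all hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PendingAction:
    description: str
    run: Callable[[], str]
    approver: Optional[str] = None


class ApprovalGate:
    """Runs low-risk actions at once; holds the rest until a human approves."""

    def __init__(self, needs_approval: Callable[[float], bool]) -> None:
        self._needs_approval = needs_approval
        self._pending: dict = {}
        self._ids = itertools.count(1)

    def submit(self, description: str, amount: float, run: Callable[[], str]):
        """Return (result, None) if executed, or ("pending", ticket) if parked."""
        if not self._needs_approval(amount):
            return run(), None
        ticket = next(self._ids)
        self._pending[ticket] = PendingAction(description, run)
        return "pending", ticket

    def approve(self, ticket: int, approver: str) -> str:
        # pop() raises KeyError for unknown or already-handled tickets,
        # so an approval cannot be replayed.
        action = self._pending.pop(ticket)
        action.approver = approver  # accountability: who signed off
        return action.run()
```

A usage example: with `ApprovalGate(lambda amount: amount > 100)`, a 20-unit refund executes directly, while a 500-unit refund returns a ticket and does nothing until `approve(ticket, "alice")` is called.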

Layer 5: Evidence

Enterprises do not operate on stories. They operate on logs, traces, approvals, and records of action. A production-grade agent platform must behave like an aircraft black box: it records actions, decisions, and outcomes in a way that can be reviewed after the fact independent of what the pilot believed.
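The article does not prescribe a storage format for this record. One common way to make a log reviewable "independent of what the pilot believed" is a hash chain, where each entry commits to the one before it, so a later edit is detectable. The sketch below is that swapped-in technique, with a hypothetical `EvidenceLog` class held in memory.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True)  # stable serialization
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class EvidenceLog:
    """Append-only, hash-chained log (in memory, for illustration only)."""

    def __init__(self) -> None:
        self._entries: list = []

    def append(self, event: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        entry_hash = _digest(prev, event)
        self._entries.append({"event": event, "prev": prev, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any altered or reordered entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev or _digest(prev, entry["event"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

A hash chain only proves the log was not changed after the fact. It does not prove the log was complete, which is why the recording has to happen at the action boundary rather than inside the agent.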

Layer 6: Containment & Response

Even with perfect controls, systems fail. Enterprises need the ability to limit blast radius, revoke privileges, disable access quickly, and reconstruct events. A system that cannot be stopped is not autonomous; it is uncontrollable.
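Stopping an agent is easiest when the stop check sits in the call path itself. The sketch below assumes a hypothetical `KillSwitch` consulted before every tool call; it supports disabling one agent or halting all of them.

```python
import threading


class KillSwitch:
    """Checked on every call, so revocation takes effect on the next action."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._disabled: set = set()
        self._halted = False

    def disable(self, agent_id: str) -> None:
        with self._lock:
            self._disabled.add(agent_id)   # contain one agent

    def halt_all(self) -> None:
        with self._lock:
            self._halted = True            # contain everything

    def is_allowed(self, agent_id: str) -> bool:
        with self._lock:
            return not self._halted and agent_id not in self._disabled


def guarded_call(switch: KillSwitch, agent_id: str, tool, *args):
    """Refuse the call if the agent, or the whole fleet, has been stopped."""
    if not switch.is_allowed(agent_id):
        raise PermissionError(f"agent {agent_id} is disabled")
    return tool(*args)
```

The agent is never asked to stop. The operator flips the switch, and the next call fails regardless of what the agent intended to do.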

4. Why the Real Attack Surface Is the Tool Boundary

A useful mental model is that an agent is not attacked through the prompt alone; it is attacked through its environment. Tool results can carry adversarial content, retrieved documents can inject instructions, and tool calls can leak data to places it should never reach.

The Tool Boundary Is Where Risk Lives

Most harm does not occur when the agent drafts a plan. Harm occurs when the plan touches a tool. The right design is not to make the agent perfectly safe in its own mind. The right design is to make the environment safe regardless of what the agent "wants."

If you like metaphors, this is the difference between training a tiger to never bite and building a habitat where biting cannot reach the public.

You don't ask the agent to be safe. You put the agent in a safe room.
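A "safe room" can be sketched as a gateway that validates every tool call's arguments before anything executes. The example assumes a hypothetical `send_email` tool and an allowlist of recipient domains; if injected content persuades the agent to email an outside address, the call fails at the boundary.

```python
ALLOWED_RECIPIENT_DOMAINS = {"example.com"}  # assumed allowlist


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"  # stub standing in for a real mail integration


def validate_send_email(args: dict) -> None:
    # Take everything after the last "@"; an address with no "@" fails the check.
    domain = args["to"].rsplit("@", 1)[-1].lower()
    if domain not in ALLOWED_RECIPIENT_DOMAINS:
        raise PermissionError(f"recipient domain not allowed: {domain}")


TOOLS = {"send_email": send_email}
VALIDATORS = {"send_email": validate_send_email}


def call_tool(name: str, args: dict):
    """The only path from agent to tool; the agent's intent is irrelevant here."""
    if name not in TOOLS:
        raise PermissionError(f"unknown tool: {name}")
    VALIDATORS[name](args)       # enforce before execution
    return TOOLS[name](**args)
```

The validator never inspects the prompt or the agent's plan. It only looks at the concrete action about to be taken, which is the one thing an attacker cannot disguise.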

5. RelayOne: Zero-Trust Control Plane at the Action Boundary

RelayOne operationalizes the Agent Trust Stack as infrastructure. It sits between agent frameworks and the systems agents want to touch, and it turns each agent-to-system action into a governed event.

The important point is not that RelayOne "adds safety."

The important point is that RelayOne relocates safety to the correct layer.

Instead of relying on the agent's internal compliance with instructions, RelayOne enforces the organization's rules at the moment of action.

Identity & Attribution

Ties each tool call to an identity and an owner. Creates a foundation for scoping access, applying policies, and investigating incidents.

Scoped Tool Access

Exposes tools according to authorization. Prevents "free-for-all swarms" and reduces context pollution.

Deterministic Policy Enforcement

Evaluates each attempted action against policies. Enforcement does not depend on the agent remembering anything. It is infrastructure.

Evidence by Default

Records the ground truth of interactions: the request, the response, the policy decision, and the outcome.
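The four fields named above map naturally onto a single record per attempted action. The sketch below is not RelayOne's API; `ActionRecord` and `governed_call` are hypothetical names showing how a wrapper could capture the request, the policy decision, the response, and the outcome, including for calls that were blocked.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ActionRecord:
    agent_id: str
    request: dict
    policy_decision: str      # "allow" or "deny"
    response: Optional[Any]   # None when the action never ran
    outcome: str              # "succeeded", "failed", or "blocked"


def governed_call(
    agent_id: str,
    tool: Callable[..., Any],
    request: dict,
    policy: Callable[[str, dict], bool],
    log: list,
) -> ActionRecord:
    """Check policy, run the tool if allowed, and record what actually happened."""
    if not policy(agent_id, request):
        record = ActionRecord(agent_id, request, "deny", None, "blocked")
    else:
        try:
            response = tool(**request)
            record = ActionRecord(agent_id, request, "allow", response, "succeeded")
        except Exception as exc:
            # Failures are evidence too; store a description of the error.
            record = ActionRecord(agent_id, request, "allow", repr(exc), "failed")
    log.append(record)  # written on every path, including denials
    return record
```

The record is produced by the wrapper, not reported by the agent, so it reflects what the system did rather than what the agent says it did.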

Conclusion: Don't Trust Agents; Trust Boundaries

A safe agent is not a prompt. It is a system. The only durable way to scale agent adoption in an enterprise is to treat agents as untrusted by default and to make their actions governable through deterministic infrastructure. That is the essence of zero trust, translated into the agent era.

Prompt safety can reduce mistakes. Production safety prevents incidents. Enterprises need the latter if agents are going to act.

RelayOne is built for this moment. It does not attempt to make orchestration more clever. It makes the environment more controlled. It replaces prompt-level safety with policy-level enforcement, replaces narrative logging with ground-truth evidence, and replaces implicit trust with a staged operational model that security and compliance can approve.

