Why “Universal Guardrails” for AI Agents Sound Appealing But Rarely Work in Practice

Why guardrails that work inside a single system fall short in real agent workflows, and how risk emerges across systems as agents move, connect, and carry context.

Across security and engineering teams, the same question is being asked with increasing urgency: how do we put guardrails around AI agents?
The concern is well placed. In most security architectures, guardrails are a foundational control. They define boundaries, enforce policy, and provide a reliable way to constrain how systems behave. If a workflow passes through a known control point, guardrails can inspect activity and prevent actions that fall outside acceptable limits.
It is natural to apply that model to agentic systems.
Organisations want a way to ensure that agents operate within clear boundaries, that tool usage is controlled, and that behaviour remains aligned with policy as these systems move into production.
The difficulty is that agents do not operate within a single control plane. Their behaviour extends across tools, systems, and environments, which changes where and how guardrails need to apply. The question is no longer simply how to define guardrails, but how to ensure they reflect how agents actually behave in practice.
A common view is that stronger guardrails at the model or orchestration layer will address most of the risk in agentic systems. That holds within contained environments. Guardrails can constrain tool usage, enforce policy, and reduce obvious failure modes.
The limitation is where they are applied.
This model assumes agent behaviour is fully expressed within a single system. In practice, agents operate across multiple systems, combining context, calling external tools, and carrying decisions forward across steps. The behaviour that matters often emerges outside the point where those guardrails are defined. The issue is not whether guardrails work. It is whether they are positioned where behaviour actually occurs.
Guardrails in Theory and in Practice
Architectures for AI agents are often drawn in a reassuringly simple way.
A central orchestration layer sits in the middle. Prompts enter from one side. Tools sit on the other. Guardrails surround the system to ensure the agent behaves safely.
The model is logical. If every decision passes through a single orchestrator, guardrails can inspect prompts, evaluate tool calls, and enforce policies before actions occur.
Within a contained system, this approach works well.
For example, a coding agent reviewing pull requests inside a development platform may be restricted to reading repository code, running predefined tests, and generating suggested changes. Guardrails in the orchestration layer can ensure the agent only calls approved tools, cannot modify protected branches, and cannot access external services. Because the workflow remains inside the development environment, those controls can reliably govern behaviour.
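The orchestration-layer guardrail described above can be pictured as a single policy check that every tool call passes through before execution. The following is a minimal sketch under assumed names: the tool list, branch names, and policy are hypothetical illustrations, not any specific platform's API.

```python
# Illustrative orchestration-layer guardrail: every tool call passes
# through one chokepoint before it executes. Tool and branch names
# here are hypothetical examples.

APPROVED_TOOLS = {"read_repo", "run_tests", "suggest_change"}
PROTECTED_BRANCHES = {"main", "release"}

def allow_tool_call(tool: str, args: dict) -> bool:
    """Return True only if the call stays within the contained workflow."""
    if tool not in APPROVED_TOOLS:
        return False  # block unapproved tools, e.g. external services
    if tool == "suggest_change" and args.get("branch") in PROTECTED_BRANCHES:
        return False  # never write to protected branches
    return True

# Inside a contained platform, this single check sees every action:
assert allow_tool_call("run_tests", {})
assert not allow_tool_call("suggest_change", {"branch": "main"})
assert not allow_tool_call("call_external_api", {"url": "https://example.com"})
```

The reliability of this pattern rests entirely on the premise that every action actually routes through the check, which is exactly the premise the rest of the article examines.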
The gap is not in how these guardrails are implemented. It is in what they are able to see.
They govern a single execution path, while real agent behaviour unfolds across many.
Architecture and Reality
Most enterprise agent systems do not remain inside a single platform.
A workflow that begins in a development environment may extend into SaaS applications, cloud infrastructure, APIs, and external services. Each system enforces its own controls, but the workflow itself spans several environments.
Agents are often described as platform-bound. In practice, they are anything but.
This transition rarely happens all at once. An agent starts with a narrow role and a small set of tools. As it proves useful, capabilities are added. New integrations are introduced. More context is made available to improve performance.
Each step is reasonable. Each change improves utility.
Over time, the agent’s effective scope expands beyond the system where it was originally defined. Context retrieved in one environment influences behaviour in another. Actions taken in one system trigger consequences in the next.
From the perspective of any individual platform, the behaviour appears valid. Across the full workflow, the path becomes harder to reason about.
[Diagram: an agent's expanding scope — reads code and config; runs tests and builds; writes, updates, and communicates; debugging, summarisation, and enrichment]
There is no single point of failure. The system evolves from contained to distributed without a clear moment where governance is reconsidered.
Guardrails Do Not Travel
Guardrails are defined within systems. They are not designed to operate across them. Agent workflows, however, routinely cross those boundaries.
Different frameworks implement them in different ways. Some rely on prompt constraints. Others inspect tool calls. Some operate as middleware, while others exist primarily during development.
In a coding environment, an agent running through tools like Claude Code typically relies on prompt-level constraints, repository scoping, and local configuration. Enforcement depends heavily on how the agent interprets instructions in that context.
In a cloud or SaaS environment, such as a Copilot built in Microsoft Copilot Studio, guardrails are enforced through identity, connectors, and predefined action scopes tied to services like Microsoft Graph or internal APIs. Control is strongest within that ecosystem.
Both approaches are effective within their respective boundaries.
The challenge appears when workflows span them.
An agent may generate output in a development environment that is passed into a SaaS copilot. A cloud-based agent may call external APIs or trigger developer workflows. Each system enforces its own controls, but those controls do not extend across the full sequence of actions.
Even when guardrails are correctly implemented in each system, they remain blind to how decisions connect across them.
The result is fragmented governance. What is constrained in one environment may be unconstrained in the next, even though the workflow is continuous.
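Fragmented governance can be made concrete with a small sketch. Here two hypothetical systems each apply their own guardrail correctly, and every step of a continuous workflow passes, yet no check ever evaluates the full path from repository context to external service. All system, action, and function names are illustrative assumptions.

```python
# Sketch of fragmented governance: each system enforces its own
# guardrail correctly, but neither sees the full sequence.
# All names are hypothetical.

def dev_guardrail(action: str) -> bool:
    # Development platform: only repo reads, tests, and output emission.
    return action in {"read_repo", "run_tests", "emit_output"}

def saas_guardrail(action: str) -> bool:
    # SaaS platform: only input handling and external summarisation.
    return action in {"receive_input", "summarise_external"}

# One continuous workflow crossing the boundary between the two systems.
workflow = [
    ("dev", "read_repo"),            # repository context enters here...
    ("dev", "emit_output"),          # ...and leaves the dev platform
    ("saas", "receive_input"),
    ("saas", "summarise_external"),  # ...then flows to an external service
]

checks = {"dev": dev_guardrail, "saas": saas_guardrail}

# Every step passes its local guardrail, but no single check ever
# evaluates the path "repository context -> external summarisation".
assert all(checks[system](action) for system, action in workflow)
```

Each guardrail is correct in isolation; the gap is that the composed path is never an object of any policy.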
Where This Appears in Practice
The gap becomes clearer when looking at how agents operate in real environments.
In development workflows, a coding agent may retrieve code from a repository, run validation checks, trigger a build pipeline, and call an external debugging service. Each action is authorised, yet the workflow extends beyond the environment where guardrails were originally defined. Context from the repository may appear in a debugging request sent to an external service. The request is legitimate. The data flow is not always intended.
In business workflows, an agent handling a customer request may retrieve account data, reference internal documentation, and call an external service to summarise or transform the response. Each step is permitted. If internal context is carried into that external call, sensitive information may leave the organisation without any single control being violated.
These outcomes do not come from a single incorrect action. They emerge from how agents combine context, tools, and decisions across systems.
Where Guardrails Stop Providing Full Coverage
Guardrails remain effective within the environments where they are defined.
They can validate prompts, constrain tool usage, and enforce clear boundaries inside a given system. That remains necessary. The limitation appears when workflows extend beyond those boundaries.
Enterprise agents routinely interact with external APIs, cloud services, and specialised tools. Once that happens, no single control point governs the entire sequence of actions.
The guardrails continue to function locally. They simply do not capture how the workflow unfolds across systems. From a security perspective, the system appears controlled while behaviour remains only partially understood.
The question becomes less about whether an action was allowed, and more about how a series of allowed actions produced an outcome.
Governance Must Follow the Workflow
As agents move into real enterprise use, governance needs to reflect how they actually operate.
Agents act across development environments, SaaS platforms, cloud infrastructure, and external services. Each environment enforces its own controls, but the behaviour that matters emerges across them.
Security teams need to understand how decisions unfold step by step, which tools are invoked, how context moves between systems, and where authority is exercised.
Without that, organisations are governing architecture diagrams rather than operational behaviour.
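One minimal form that kind of visibility can take is a workflow-level trace that records, for each step, the system it ran in, the tool invoked, and where the context it carries originated. The structure below is a hypothetical sketch, not an existing tool's schema.

```python
# Hypothetical workflow-level trace: each step records its system,
# the tool invoked, and the provenance of the context it carries.

from dataclasses import dataclass, field

@dataclass
class Step:
    system: str   # environment where the action ran
    tool: str     # tool or API invoked
    context_sources: set = field(default_factory=set)  # origins of its inputs

def context_crossings(trace: list) -> list:
    """Return steps whose inputs originated outside their own system."""
    return [s for s in trace if s.context_sources - {s.system}]

trace = [
    Step("dev", "read_repo", {"dev"}),
    Step("saas", "summarise", {"dev"}),              # repo context crossed into SaaS
    Step("external", "debug_api", {"dev", "saas"}),  # and then left both systems
]

# Local guardrails saw each step as valid; the trace exposes the crossings.
assert [s.tool for s in context_crossings(trace)] == ["summarise", "debug_api"]
```

The point of the sketch is not the schema itself but the unit of analysis: the object being inspected is the workflow, not any single system's actions.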
The Next Phase of Agentic Governance
Early approaches to agent governance focused on model safety and prompt control. That reflected how these systems were first introduced: contained environments with clear boundaries.
That context no longer holds.
Agents operate across tools, systems, and services, carrying context forward and making decisions over time. The behaviour that matters emerges across those interactions, not within any single component. The boundary of control has moved. It no longer sits inside a single system, and it cannot be enforced from a single point.
Improving guardrails within individual platforms remains important. It strengthens local control and reduces obvious failure modes. It does not provide a complete view of how agents operate once workflows extend beyond those boundaries.
Governance needs to follow how agents actually behave: across systems, across tools, and across time. That requires visibility into decision paths, context usage, and how outcomes are produced in practice.
This is where a different class of control becomes necessary: one that can observe and interpret behaviour across environments, where those decisions actually occur.