Securing AI Agents Means Following Behavior, Not Boundaries
A practical view of agent security from our work on OWASP's latest guidance.
Most agent security strategies still start at the edge. Put a gateway in front of the agent, inspect what crosses the boundary, and the risk is meant to be contained. It is an easy story to sell, but it does not match how agents behave once they are running in production.
OWASP’s GenAI Security Project has just published version 2 of its State of Agentic AI Security and Governance, and we were pleased to contribute to the report through Geordie’s technical review. It is a strong resource for CISOs and security practitioners, particularly because it moves the conversation beyond boundary controls and into how agents are configured, monitored, and governed once they are running.
The threats stopped being hypothetical
The first edition of this report, in July 2025, framed agentic risk as a portfolio of plausible threats. A year later, almost every entry on that list has production incidents, vendor advisories, and CVEs attached to it, from EchoLeak and ForcedLeak to the ShareLeak chain and a run of misconfigured Copilot Studio agents shipped without authentication. This shift is what makes this edition actionable: it is informed by real-world incidents affecting deployed systems, so the guidance reads as concrete and evidence-led rather than speculative.
Why the gateway story is so tempting
A gateway inspects a boundary, but agent usage does not respect one.
The appeal of a gateway is obvious. It is a single place to put in relevant controls, a single thing to purchase and manage. The challenge is that agents do not stay neatly on one side of it.
The report makes this concrete. Data that would be blocked at an API boundary, it notes, “may flow freely through observability infrastructure”, because logs, telemetry pipelines, and generated outputs are all exfiltration vectors that sit outside the gateway’s field of view. The same blind spot shows up with delegated access. When an AI vendor holding broad OAuth scopes is compromised, an attacker can operate inside the trust boundary using legitimately granted permissions. No policy is technically violated, so controls such as Multi-Factor Authentication (MFA), conditional access, and most DLP do not fire.
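To make the observability point concrete, here is a minimal sketch of filtering one such path: a log handler that masks secret-looking values before they are written. It is an illustration rather than anything the report prescribes. The `RedactSecrets` class, the `agent.telemetry` logger name, and the regular expression are all assumptions, and a real deployment would need its own pattern list.

```python
import io
import logging
import re

# Assumed pattern: bearer tokens and "api_key=..." pairs only.
SECRET_PATTERN = re.compile(r"(Bearer\s+|api_key=)[A-Za-z0-9._\-]+")


class RedactSecrets(logging.Filter):
    """Masks secret-looking values before the attached handler emits a record."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        record.msg = SECRET_PATTERN.sub(r"\1[REDACTED]", message)
        record.args = ()  # the message is already fully formatted
        return True


# An in-memory stream stands in for a real telemetry sink.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactSecrets())

logger = logging.getLogger("agent.telemetry")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("tool call failed, header was Authorization: Bearer %s", "abc123.def")
print(stream.getvalue().strip())
```

Pattern-based redaction of this kind only catches what it has been told to look for, which is one reason to treat it as a supporting layer rather than a guarantee.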
Even the controls that do sit at the boundary come with caveats, and the report is honest about them. Deterministic hooks and prompt-layer guardrails, it observes, “function more reliably as an early warning layer than as a hard security boundary.” A gateway tells you that something has breached a defined control. It struggles, however, to tell you whether the agent that crossed it should have, or what it did next.
The report’s key contribution here is to say that the boundary is a useful layer but not the only piece of the puzzle.
Configuration is context
The report’s central move is to put safety and security in the same place: the deployment layer.
The deployment layer is how an agent is configured: the architecture it runs in, the permissions it is granted, and the operational context it works in. The controls are familiar: least-agency principles, so an agent holds only the permissions it genuinely needs; strict separation between a user’s permissions and the agent’s capabilities; hardened tool registries; and output filtering on the paths data can leave by. This is the part most teams already know how to do, because it looks like the security hygiene they have always practised.
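As a rough illustration of least agency and of keeping user permissions separate from agent capabilities, the sketch below allows a tool call only when the agent itself has been granted the tool and the requesting user is also permitted to use it. The `AgentProfile` type, the `may_call` function, and the tool names are hypothetical, and this is one possible reading of the principle rather than the report's own design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical deployment-layer record for one agent."""
    name: str
    allowed_tools: frozenset[str] = field(default_factory=frozenset)


def may_call(agent: AgentProfile, tool: str, user_permissions: set[str]) -> bool:
    """Allow a call only if the agent was granted the tool AND the user may use it.

    The agent never gains a capability just because its operator has it.
    """
    return tool in agent.allowed_tools and tool in user_permissions


support_agent = AgentProfile("support-bot", frozenset({"search_kb", "create_ticket"}))
admin_user = {"search_kb", "create_ticket", "export_customers"}

print(may_call(support_agent, "create_ticket", admin_user))     # True
print(may_call(support_agent, "export_customers", admin_user))  # False
```

The second call is refused even though the user holds the permission, because the agent was never granted that tool.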
Configuration matters because it defines the deployment context: what the agent can touch, which tools it can reach, and where its data is allowed to go. That context is what makes runtime risk legible. A runtime signal only means something if you know what the agent was configured to do, because the same action can be routine for one agent and a clear breach for another. Get the configuration right and you have the reference point runtime monitoring needs to tell ordinary behaviour from divergence.
The catch is that this context is not static and must be regularly refreshed and enforced. “Pre-deployment certification loses value the moment an agent begins, accumulates context, loads tools dynamically, or modifies its own configuration.” The picture you set at assessment time drifts as the agent is used, so the deployment context has to be maintained as a live baseline rather than certified once and trusted from then on.
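One simple way to treat configuration as a live baseline is to fingerprint the approved configuration and compare it with what the agent is running now. The sketch below assumes the configuration can be represented as a JSON-serialisable dictionary. The `fingerprint` and `drift` helpers and the example keys are illustrative names, not part of any particular product.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Stable SHA-256 digest of a config, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drift(baseline: dict, live: dict) -> dict:
    """Return the top-level keys whose values differ between the two configs."""
    keys = baseline.keys() | live.keys()
    return {
        k: {"baseline": baseline.get(k), "live": live.get(k)}
        for k in sorted(keys)
        if baseline.get(k) != live.get(k)
    }


# Hypothetical approved config and the config observed at runtime.
approved = {"tools": ["search_kb", "create_ticket"], "egress": ["tickets.internal"]}
live = {"tools": ["search_kb", "create_ticket", "run_shell"], "egress": ["tickets.internal"]}

if fingerprint(live) != fingerprint(approved):
    print(drift(approved, live))
```

Run on a schedule or on every tool-load event, a check like this turns a one-off certification into something that is re-verified as the agent changes.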
Behavior is where the risk actually sits
Configuration sets the guardrails. Runtime monitoring is what identifies when the agent goes off-piste.
This is where the report provides the most value. Securing a live agent is a behavioral problem, and it calls for behavioral controls. The report sets out what good looks like:
- Real-time behavioral monitoring that flags when an agent’s actions diverge from its approved workflow. Plan-divergence detection, which compares what the agent is doing against what it said it would do, is emerging as a core pattern across both the OWASP Agentic Top 10 and CoSAI’s Secure-by-Design principles.
- Consequence-aware authorisation that evaluates what an agent is doing in the moment, rather than letting it quietly inherit whatever its human operator happens to be allowed to do.
- Fast containment, meaning kill-switches and automated incident routing that act in seconds. Regulations such as DORA and NIS2 set stringent incident-reporting windows (under 24 hours), which in practice assume you are watching the agentic system continuously rather than auditing it periodically.
A configured agent is one you have made assumptions about. A monitored agent is one you can actually see.
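A minimal sketch of plan-divergence detection paired with containment might look like the following. It assumes the agent declares its plan as a list of tool names and that every tool call passes through a monitor. The `PlanMonitor` class, the `max_divergences` threshold, and the tool names are hypothetical. For brevity the check is membership-only, so it ignores the order and number of steps.

```python
from dataclasses import dataclass, field


@dataclass
class PlanMonitor:
    """Compares observed tool calls against the plan the agent declared."""
    declared_plan: list[str]
    max_divergences: int = 1          # assumed threshold, tune per agent
    divergences: list[str] = field(default_factory=list)
    halted: bool = False

    def observe(self, action: str) -> bool:
        """Record one action; return False once the agent should be stopped."""
        if self.halted:
            return False
        if action not in self.declared_plan:
            self.divergences.append(action)
            if len(self.divergences) >= self.max_divergences:
                self.halted = True    # stand-in for a real kill-switch
                return False
        return True


monitor = PlanMonitor(declared_plan=["search_kb", "draft_reply", "create_ticket"])
for action in ["search_kb", "draft_reply", "read_payroll", "create_ticket"]:
    if not monitor.observe(action):
        print(f"halted after unplanned action: {monitor.divergences[-1]}")
        break
```

In this run the monitor stops the agent at `read_payroll`, before the final step executes. A production version would also route the event to incident handling instead of just printing it.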
The report also has a great take on scale, which is part of why it is practical and useful. Human review does not stretch to an agent making thousands of decisions an hour, and routing every action to a person just trades a security problem for decision fatigue. That points to a clear solution: lean on behavioral observability and runtime controls so the system itself surfaces divergence and risks, rather than relying on human operators.
Where to start?
The report offers a practical starting point for practitioners. Identify the most advanced agents you are already running, then either raise governance to match them or scale the deployment back to something you feel comfortable governing today. The report encourages teams to identify early which agents they are using, and gives them a shared vocabulary and a maturity model to do it with. The document also does a grand job of articulating the attack surface, as well as capturing a wide range of useful information in the extensive appendices.
The most exciting takeaway is that none of this is speculative anymore. The threats are documented, and so are the controls that answer them. The opportunity the report highlights is largely one of emphasis. Pair solid configuration with real visibility into how agents behave once they run, and you can move quickly while staying in control.
Get the configuration right, watch what the agent actually does, and securing agentic AI becomes something you can get ahead of. State of Agentic AI Security and Governance v2 is a great resource for making that happen, and well worth a read for CISOs and security practitioners alike. I hope you enjoy it!