The harness: three ways to secure and govern the operational infrastructure of AI agents
Matching security and governance to teams moving fast on AI agent deployments.
Every time you start up Claude Code, Cowork, or Microsoft Copilot Studio, an agent harness is at work behind the scenes, running checks, pulling in resources, and orchestrating each step from start to finish.
Similarly, every time you work with a customer service agent, a travel recommendation agent, or any other kind of agent, you are likely benefiting from a harness that determines when and where to escalate to a human, how to converse with a customer, and how to find and digest the information to answer a query.
Today, harnesses form a large portion of the operational infrastructure of AI agents. In this paper, you will learn three key ways to make sure security and governance account for these harnesses, so security teams can move even faster in operationalizing agents across their enterprises.
The harness: supporting AI adoption at scale
An LLM on its own cannot act. Connect at least one tool to the model and it can act autonomously, which makes it an agent. Add a layer around that agent, or around several agents, that supports memory, logging, tool retrieval and more, and you have a harness.

The need for a harness: coding agents
In practice, teams have figured out which harness elements work well for coding agents. Examples include a short context map so the agent reads only what each task needs, a progress file and git history so the next session picks up where the last one stopped, and additional review steps before code is merged.
The prevalence of harnesses in coding agents was validated with the Claude Code source leak at the end of March 2026, which showed the shipped product was 98.4% operational infrastructure, or harness, and only 1.6% AI decision logic.
Similarly, to support its ‘minion’ coding agents, Stripe has built a harness that tightly integrates its source control, environments, code generation and CI tooling.
For a complex task running autonomously over a longer period, a coding agent without a harness might be susceptible to forgetting what it changed between sessions, re-opening the same files, and drifting from the original task as context evolves and grows.
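As a rough illustration of the progress-file idea, the sketch below persists what a session completed and which files it touched, so a later session can resume instead of starting cold. The file layout and the names `load_progress` and `record_step` are hypothetical and chosen for illustration; they are not taken from any particular harness.

```python
import json
from pathlib import Path


def load_progress(path: Path) -> dict:
    """Return the saved session state, or a fresh state if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"completed_steps": [], "files_touched": []}


def record_step(path: Path, state: dict, step: str, files: list[str]) -> dict:
    """Append a finished step and any newly touched files, then write to disk."""
    state["completed_steps"].append(step)
    for name in files:
        # Keep each file once so a resumed session sees a compact list.
        if name not in state["files_touched"]:
            state["files_touched"].append(name)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state
```

A new session would call `load_progress` first and hand the result to the agent as part of its starting context, which is one simple way to limit the forgetting and re-opening of files described above.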
The need for a harness: customer-facing support agents
Coding agents are not the only use case for harnesses. Because of the volume of customers and underlying knowledge sources, a support agent might use one sub-agent to take a wide first pass over sanctioned content while another re-ranks the results. A separate harness layer beneath that might then split the conversation into separable checks with adjustable escalation modules.
Whether for coding agents, customer support agents, or another type of agent, harnesses are becoming part of the critical infrastructure AI agents form in enterprises.
Security teams can move faster operationalizing AI agents in their enterprises by taking the AI agent harness into account, using three key security and governance perspectives.
The three key security and governance perspectives for the AI agent harness
The harness matters simultaneously for AI agent security and governance from three different angles.

Agentic risk is contextual: First and foremost, securing agent harnesses starts with securing and governing agentic risk, including understanding the importance of context in how agentic risk forms.
Visibility is as important as control: Second, gaining broad visibility of the agents and harnesses running across all your platforms, both the ones you know about and the ones you don’t, is just as important as the additional set of controls harnesses can offer.
Security and governance should be applied through the harness: Third, the harness is the place to apply security and governance to agents in order to keep them running at scale. That includes scoping the tools the agent can reach, checking what the agent and harness are about to do, and recording what both did.
By following these three perspectives, security and governance teams can account for the AI agent harness and move faster in deploying AI agents at scale.
Agentic risk is contextual
The role of context in how agentic risk forms
Securing agent harnesses starts with securing and governing agentic risk, and most conversations about agentic risk start with the prompt. This makes sense, because the prompt carries the request, the intent, and the constraints. As the agent works, however, the prompt becomes one part of a wider operating context in which the agent interprets the request, decides what information it needs, plans a sequence of steps, calls tools, takes in responses, updates its context, and decides what to do next. Risk can enter at any point in that chain.
A prompt can be safe in isolation and still lead somewhere risky once it combines with permissions, tool access, retrieved content, memory, user feedback, or business context. Similarly, a tool call can look legitimate on its own while being one step in a sequence that moves the agent outside its intended role. A human approval can add useful oversight, and the harness can even trigger it, but to control risk you still need to understand how the agent interpreted the task, what context it has gathered, and what assumptions it carries forward.
Taken holistically, agent risk evolves through the way the agent interprets inputs, plans steps, calls tools, and carries information forward.
For security teams, the practical implication is that agent governance needs to follow the agent across its configuration, its context, and its activity. Prompt filtering, jailbreak controls, tool permissions, MCP gateways, identity policies, and runtime alerts can all be useful, but each layer gives only part of the picture.
AI agent risk is fundamentally contextual, forming as agents interpret, plan, call tools, process responses, and carry context forward.
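To make the point about sequences concrete, here is a minimal sketch in which every tool call passes a per-call allowlist check, yet the session as a whole is flagged because sensitive data was read before an external send. The `ToolCall` fields, the tool names, and the single rule in `sequence_is_risky` are assumptions made for illustration; real policies would be richer and specific to your environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str
    reads_sensitive: bool = False   # hypothetical label set by the harness
    sends_external: bool = False    # hypothetical label set by the harness


def call_allowed_in_isolation(call: ToolCall, allowed_tools: set[str]) -> bool:
    """Per-call view: only asks whether this tool is on the allowlist."""
    return call.tool in allowed_tools


def sequence_is_risky(history: list[ToolCall]) -> bool:
    """Sequence view: flag an external send that follows a sensitive read."""
    seen_sensitive = False
    for call in history:
        if call.reads_sensitive:
            seen_sensitive = True
        if call.sends_external and seen_sensitive:
            return True
    return False


allowed = {"crm.read", "email.send"}
session = [
    ToolCall("crm.read", reads_sensitive=True),
    ToolCall("email.send", sends_external=True),
]
# Each call passes on its own; only the ordered history reveals the risk.
each_call_ok = all(call_allowed_in_isolation(c, allowed) for c in session)
flagged = sequence_is_risky(session)
```

The design choice worth noting is that the second function needs the ordered history, which is exactly the context a per-request check never sees.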
Visibility is as important as control
See the risk across all your agents and harnesses
Now that we know how agent risk forms, the next step is to gain full, contextual visibility into that risk, for the agents and harnesses you know about and the ones you do not. That means visibility of your agents across code, cloud, endpoints, and SaaS, because the harness lives alongside the agent and is generally determined by the platform the agent runs on.

How far does the current security stack go?
EDR and XDR systems can observe execution environments, and API gateways and firewalls can inspect requests at defined boundaries. These controls provide partial coverage, but they were not designed to capture how agents operate. For example, gateway-based controls can inspect individual requests, but they do not follow how those requests relate to one another over the course of a workflow, which means they fail to address how agentic risk forms in the first place.
As agents begin to operate across development environments, SaaS platforms, cloud infrastructure, and external APIs, this fragmentation becomes more visible. Each system may report that activity is valid within its own scope. There is no single view that explains how the full sequence of actions unfolded.
Retrofitting these tools creates the appearance of coverage without resolving the underlying gap. Security teams are left with signals from multiple layers but no clear understanding of agent behavior, which makes it impossible to know how to configure the harnesses their agents work within.
A prompt and a final answer say almost nothing about how the agent got there: the instructions it was given, the plan it formed, the tools it was allowed to use, the content it retrieved, the responses it absorbed, and the approvals added along the way. You need that contextual visibility continuously, for every agent across cloud, code, endpoint, and SaaS. Without it, placing controls through the harness is close to guesswork.
Continuous discovery and contextual understanding are required for the agents and harnesses you know about, as well as the ones you don’t know about.
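One way to picture the single view that fragmented tooling lacks is to stitch events from different platforms into one ordered timeline per agent session. The sketch below assumes each event already carries a shared `session_id`, which in practice is often the hardest part to obtain; the `Event` fields and source labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    session_id: str   # assumed to be shared across all reporting systems
    timestamp: float
    source: str       # e.g. "endpoint", "saas", "cloud"
    action: str


def build_timelines(events: list[Event]) -> dict[str, list[Event]]:
    """Group events by agent session and order each group by time."""
    timelines: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        timelines[event.session_id].append(event)
    for session_events in timelines.values():
        session_events.sort(key=lambda e: e.timestamp)
    return dict(timelines)
```

Each source system may still report its own events as valid; the timeline is what lets a reviewer see how one step led to the next.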
Apply security and governance through the harness
With an understanding of how AI agent risk forms, as well as visibility and a contextual understanding of agents and their harnesses, the final step is to use the harness as your control surface, at the points where risk actually forms and where behavior can be shaped.
Doing this means using a mix of the harness’s own security and governance settings, which generally fall into the following types of controls and tooling:

Allowlists and denylists. Harnesses give admins a way to limit the tools, commands, MCP servers, and destinations an agent may reach. In Claude Code, an administrator can deploy a managed MCP configuration with allow and deny patterns, pin the agent to a fixed approved set, or disable MCP entirely. Copilot Studio uses Power Platform data loss prevention to sort connectors into business, non-business, and blocked groups, and lets admins block specific connectors.
Centrally managed configuration. Some harnesses support managed settings delivered through MDM, OS policy, or an admin console, where the managed layer outranks everything else, including command-line flags.
Interception points and hooks. Some harnesses offer hooks: programmable checkpoints at tool-call time where a policy can inspect what the agent is about to do and allow, block, or reshape it. In Claude Code, a PreToolUse hook fires before a tool call and can deny it outright, while a PostToolUse hook fires after the call has run and can feed back or replace the result rather than block it. Codex now offers the same PreToolUse and PostToolUse hooks, alongside its sandbox and approval modes. Copilot Studio does not expose a per-call hook to makers in the same way; interception there happens in the surrounding Microsoft governance layer and through connector policy.
Skills. Skills are usually thought of as a way to add capabilities to an agent, but as part of the larger infrastructure of the agent harness they are just as much a place to apply security and governance. A skill is a reusable, loadable set of instructions and resources, usually a short instruction file and sometimes a companion script, that the agent pulls in on demand rather than carrying in every prompt. Because a skill defines how a task must be done, you can use it to put your organization’s required procedure in front of the agent at the moment it acts: the review steps a change has to clear, the logging and naming conventions it has to follow, or the exact sequence a regulated workflow must take. In that sense a skill works like a guide built into the harness, steering the agent before it acts.
Logging and audit trails. Harnesses differ sharply in how much they hand you. Copilot Studio rides Microsoft Purview, so administrative and user interactions with agents are audited by default. Claude Code sits a step back, emitting OpenTelemetry metrics, events, and traces you can ship to a SIEM such as Splunk or Datadog, with Anthropic’s Enterprise Compliance API and audit logs covering Claude Code activity like logins, configuration changes, and chats. Coverage varies across the Claude enterprise suite, and as of this writing that coverage is not yet available for Cowork, which runs locally on the endpoint.
Plugins. The Claude ecosystem offers a plugin option, which is essentially a package that distributes skills, hooks, and MCP access, and can be delivered through an MDM.
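The control types above can be pictured together in a small, harness-agnostic sketch: a managed settings layer that outranks user settings, allow and deny patterns for tools, and a pre-tool-call checkpoint that records every decision. Everything here is an assumption made for illustration, including the settings keys, the tool names, the deny-before-allow ordering, and the default-deny fallback. It is not the configuration format or hook interface of Claude Code, Codex, or Copilot Studio.

```python
from fnmatch import fnmatchcase


def merge_settings(managed: dict, user: dict) -> dict:
    """Combine layers so that any key set in the managed layer wins."""
    return {**user, **managed}


def decide(tool: str, settings: dict) -> str:
    """Return "allow" or "deny" for a tool name.

    Assumed policy for this sketch: deny patterns are checked first,
    and anything that matches no allow pattern is denied.
    """
    for pattern in settings.get("deny", []):
        if fnmatchcase(tool, pattern):
            return "deny"
    for pattern in settings.get("allow", []):
        if fnmatchcase(tool, pattern):
            return "allow"
    return "deny"


def pre_tool_check(tool: str, settings: dict, audit_log: list) -> bool:
    """Checkpoint run before a tool call: decide, record, and report."""
    decision = decide(tool, settings)
    audit_log.append({"tool": tool, "decision": decision})
    return decision == "allow"


# Hypothetical layers: the admin pins a narrow set, the user asks for everything.
managed = {"allow": ["mcp.github.*"], "deny": ["mcp.github.delete_*"]}
user = {"allow": ["*"]}
effective = merge_settings(managed, user)

audit_log: list = []
can_list = pre_tool_check("mcp.github.list_issues", effective, audit_log)
can_delete = pre_tool_check("mcp.github.delete_repo", effective, audit_log)
can_shell = pre_tool_check("shell.exec", effective, audit_log)
```

In this sketch the user’s wildcard never takes effect because the managed `allow` key replaces it, which mirrors the idea of a managed layer outranking everything else. A real harness may merge layers differently, so check the product documentation before relying on any particular precedence.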
A note on gateways
A common instinct is to introduce gateway-style controls for AI systems, positioning them as central enforcement points that inspect prompts, evaluate tool calls, and apply policy before actions are executed. This approach assumes that control can sit at a defined boundary and that inspecting requests at that boundary is sufficient. Agentic systems do not fit those assumptions.
Agents do not operate through a single ingress or egress point; they call tools dynamically, move across systems, and reuse context between steps. The practical cost is that inline inspection on every interaction adds latency, and at scale, the small delays accumulate. The agent might also figure out an alternative way to reach information to complete the task, bypassing the external checkpoint completely.
How security and governance can work with the harness: Geordie Beam

Geordie’s Beam works inside the harness to proactively remediate, using many of these available security and governance configurations. It sits close to where the agent reasons and acts, so it can follow the full contextual path rather than a single slice of it. That is what lets Beam catch the risks that only appear in the sequence: redirecting the next step, stopping data from combining across systems that were never meant to be linked, or fixing an instruction buried in retrieved content that the agent treats as its own. Beam gives both the control surface to apply policy and the vantage point to understand agent behavior, without forcing traffic through a separate, external choke point.
Unlike static controls that agents can bypass, or traditional cyber interventions that risk loss of visibility and harm, Beam shapes agent behavior as agents operate, so the response is specific and adaptive rather than generic, and the controls are yours to define rather than the provider’s defaults. The result is that security teams can take proactive control of the associated risk and build the confidence to keep their AI initiatives moving fast.
Harnesses offer multiple security and governance configuration options; moving faster with AI agent adoption at scale means working through them.
For a deep dive into the capabilities of securing through the harness, and the contextual gap that traditional tooling leaves, Ken Huang’s recent piece on the secure harness is an additional, useful resource.
Conclusion
Harnesses are the real-world infrastructure that teams have built to run AI agents, so securing and governing agents as they are really deployed means including the harness in the picture. As harnesses continue to roll out with agents as critical IT infrastructure, teams need a security and governance approach that prioritizes AI agent performance at scale and gives them the informed confidence to move faster in operationalizing AI agents.
To see how Geordie can shine a light on your existing AI agent harnesses, and how Beam’s proactive remediation can fit into them, get in touch with the team to book a demo today.