Assume Unexpected Behavior: The Five Eyes Guidance Is an Architecture Decision
Why the Five Eyes guidance on agentic AI changes security architecture: privilege, design, behavioral, structural, and accountability risks demand runtime visibility, adversarial testing, and guardrails.


On May 1, “Five Eyes” cyber agencies (CISA, NSA, ASD ACSC, the Canadian Centre for Cyber Security, NCSC-NZ, and NCSC-UK) published Careful Adoption of Agentic AI Services. It's the first time the alliance has issued coordinated guidance on a single AI attack surface. Most coverage so far has read it as a slowdown advisory: deploy incrementally, restrict scope, keep humans in the loop.
I read it differently. As someone who's been contributing to the OWASP Top 10 for Agentic Applications, which the guidance directly references, I see this document doing something more useful. It hands the industry a shared vocabulary for a class of risk most architectures weren't designed to handle. And it crystallizes a design principle worth tattooing onto every agent rollout: assume agentic AI may behave unexpectedly.
That sentence sounds like a warning. It's actually a constraint. It tells you how to build.
A design constraint, not a caveat
Distributed-systems engineers don't trust a remote service to be available. We design for partial failure, retries, idempotency, circuit breakers. We assume the network is hostile and the dependency is down. The Five Eyes guidance is asking the same discipline of agentic AI: assume the agent will be wrong, manipulated, or hijacked, and architect so that those outcomes are recoverable rather than catastrophic.
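Here is a minimal sketch of what that constraint can look like at the tool boundary. Everything in it is hypothetical (the guarded_execute wrapper, the tool names, the approve callback); the point is the shape: the agent's proposed action is treated as untrusted input, side effects are gated behind an approval seam, and failure comes back as data the planner can recover from instead of an unhandled exception.

```python
# Hypothetical sketch: treat every agent-proposed action like an unreliable
# remote call -- validate it against policy, gate side effects, and make
# failure an expected outcome rather than an exception.
from typing import Any, Callable

ALLOWED_TOOLS = {"search_tickets", "read_document", "send_email"}  # enforced outside the model
DESTRUCTIVE_TOOLS = {"send_email"}                                  # anything with side effects


def guarded_execute(
    tool_name: str,
    args: dict,
    executor: Callable[[str, dict], Any],
    approve: Callable[[str, dict], bool],
) -> dict:
    """Run an agent-proposed tool call under the 'assume it may be wrong' constraint."""
    if tool_name not in ALLOWED_TOOLS:
        return {"ok": False, "error": f"tool '{tool_name}' is not allowlisted"}
    if tool_name in DESTRUCTIVE_TOOLS and not approve(tool_name, args):
        return {"ok": False, "error": f"destructive tool '{tool_name}' was not approved"}
    try:
        return {"ok": True, "result": executor(tool_name, args)}
    except Exception as exc:  # the dependency is assumed to fail sometimes
        return {"ok": False, "error": str(exc)}


# Usage: the approval callback is where a human-in-the-loop or policy engine sits.
outcome = guarded_execute(
    "send_email",
    {"to": "vendor@example.com", "body": "PO attached"},
    executor=lambda name, a: f"{name} executed",
    approve=lambda name, a: False,  # deny by default until a human or policy says yes
)
print(outcome)  # {'ok': False, 'error': "destructive tool 'send_email' was not approved"}
```

Deny-by-default on the approval seam is the recoverable failure mode the guidance is asking for: the system keeps working, just not dangerously.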
That reframing matters because the alternative is what most enterprises are doing right now: granting agents broad credentials, wiring them into production systems, and trusting them to behave because the demo worked. A 10,000-employee company that historically ran 30 to 100 applications is now running tens of thousands of effective AI applications once you count every LLM call that generates a script or invokes a tool. Without visibility into that surface, you're shooting in the dark.
Five risks of agentic AI, and what we see in the field
The guidance breaks agentic AI risk into five categories. None of them are theoretical. Every one of them shows up in our research or in attacks we've already analyzed.
- Privilege risk. When an agent receives broad access, a single compromise spans every system the agent touches. The guidance warns specifically against the procurement-agent pattern: one identity with credentials for finance, email, and contracts. We've seen this play out at the supply-chain layer. Earlier this year, our research team documented an MCP ecosystem attack where threat actors cloned an Oura Ring MCP server and used fake GitHub accounts to push it to developers. The malicious server inherited every privilege the legitimate one would have had. The agent didn't get exploited. The tool it trusted did.
- Design and configuration risk. Poor architectural choices create gaps before the system goes live. Across the agentic applications we've tested, 75% were vulnerable to direct or indirect prompt injection. These aren't subtle findings. They're hardcoded system prompts that can be overridden, tool allow-lists that aren't enforced server-side (see the sketch after this list), and integrations that ship with debug endpoints exposed. The guidance recommends threat-modeling with OWASP and MITRE ATLAS before deployment. That recommendation exists because design-time problems become production incidents at scale.
- Behavioral risk. The agent pursues a goal in ways its designers never anticipated. The cleanest example I can point to is the Perplexity Comet incident our STAR Labs team analyzed: a zero-click Google Drive wipe triggered by a single email containing indirect prompt injection. Comet did exactly what an agent is supposed to do. It parsed the input, planned the action, and executed it. The problem was that the input came from an attacker, not the user. The behavior wasn't a bug. It was the agent being goal-driven and losing context, which is what these systems do when you don't constrain them.
- Structural risk. Interconnected agents can cascade failures that no single component would have produced. NomShub is the example I keep coming back to: a vulnerability chain in the Cursor AI code editor where a malicious repository combines indirect prompt injection, a sandbox escape through shell builtins, and Cursor's own remote-tunnel feature to give an attacker persistent shell access. No individual component is broken. The chain is the breach. Multi-agent architectures and MCP-based integrations almost guarantee more of this, because every new protocol adds a seam.
- Accountability risk. Agentic decisions are hard to inspect and the logs are hard to parse. The guidance flags this as one of the harder problems, and it is. When an incident involves an agent that reasoned across five tool calls, three model invocations, and two retrieved documents, traditional logging gives you fragments. You need the full causal chain: what the agent was told, what it decided, what it called, what came back, and why the next step happened. We showed this concretely with a GitHub MCP hijack where the only way to detect private-repo data exfiltration was reconstructing the agent's full narrative across steps. Without that, post-incident review degenerates into guesswork.
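Two of those categories, privilege and design, converge on the same architectural fix: least-privilege grants and allow-lists enforced where an attacker can't rewrite them. The sketch below is hypothetical (AgentGrant, the registry, and the tool names are illustrative, not a specific MCP implementation), but it shows where the check has to live: on the server side of the tool boundary, so an injected instruction can't talk the agent into a call its identity was never granted.

```python
# Hypothetical sketch: enforce tool scope on the server, not in the system prompt.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class AgentGrant:
    agent_id: str
    allowed_tools: frozenset  # least privilege: only the tools this agent actually needs


# The tool registry lives server-side; the model never sees or controls it.
REGISTRY: dict[str, Callable[..., Any]] = {
    "read_invoice": lambda invoice_id: {"invoice_id": invoice_id, "status": "paid"},
    "send_payment": lambda invoice_id, amount: {"invoice_id": invoice_id, "sent": amount},
}


def handle_tool_call(grant: AgentGrant, tool: str, **kwargs: Any) -> Any:
    """Server-side enforcement: the check runs where a prompt cannot rewrite it."""
    if tool not in grant.allowed_tools:
        raise PermissionError(f"agent '{grant.agent_id}' has no grant for tool '{tool}'")
    if tool not in REGISTRY:
        raise KeyError(f"unknown tool '{tool}'")
    return REGISTRY[tool](**kwargs)


# The procurement agent can read invoices but cannot move money, even if an
# injected instruction tells it to call send_payment.
procurement = AgentGrant("procurement-agent", frozenset({"read_invoice"}))
print(handle_tool_call(procurement, "read_invoice", invoice_id="INV-42"))
# handle_tool_call(procurement, "send_payment", invoice_id="INV-42", amount=10_000)  # -> PermissionError
```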

What “careful” looks like when you can’t pause
The honest tension with this guidance is that "deploy incrementally" collides with reality. 85% of developers are using AI coding tools. Claude Code is committing to repositories. Microsoft 365 Copilot is updating CRM records. The agents are inside. You can't unship them.
What you can do is operationalize the guidance against the surface you actually have:
- Get visibility first. You can't secure what you can't see. Inventory the agents, including the ones SaaS vendors quietly added to products you already own, and map their scopes, tools, and data access. This is the visibility problem our discovery and posture management capability was built to solve.
- Test adversarially before production. Red-team every agent against the OWASP Top 10 for Agentic Applications. Findings at design time are far cheaper to fix than findings at runtime. This is what we built Ascend AI to do continuously.
- Run guardrails at runtime. The guidance is right that prompt injection and tool misuse can't be fully prevented at the model layer. You need behavioral detection that watches the agent's actions in addition to prompts. That's the gap Defend AI fills.
- Capture the full causal chain. Reconstructing what the agent did, why, and through which tool is the accountability requirement the guidance describes. Build it in before you need it for an incident review (a minimal sketch of such a record follows this list). Straiker addresses this across the platform, so visibility, adversarial testing, and runtime enforcement all contribute to the same forensic record.
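As a concrete illustration, here is a minimal, hypothetical sketch of the record each agent step would need to emit for that chain to be reconstructable: what the agent was told, what it decided, what it called, what came back, and a pointer to the step that caused it. The schema and field names are assumptions, not any particular product's format.

```python
# Hypothetical schema: one structured record per agent step, linked by step and
# parent ids so the full causal chain can be replayed after an incident.
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AgentStep:
    run_id: str
    step_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    parent_step_id: Optional[str] = None
    ts: float = field(default_factory=time.time)
    instruction: str = ""             # what the agent was told (prompt or retrieved content)
    decision: str = ""                # what it decided (plan / model output)
    tool_call: Optional[dict] = None  # what it called, with arguments
    tool_result: str = ""             # what came back


def emit(step: AgentStep) -> None:
    # Append-only and outside the agent's own reach, so the record survives compromise.
    print(json.dumps(asdict(step)))


run = uuid.uuid4().hex
root = AgentStep(run_id=run,
                 instruction="Summarize open procurement tickets",
                 decision="Call search_tickets with status=open")
emit(root)
emit(AgentStep(run_id=run, parent_step_id=root.step_id,
               instruction="(tool output) 3 open tickets returned",
               decision="Draft the summary email",
               tool_call={"name": "search_tickets", "args": {"status": "open"}},
               tool_result="3 open tickets"))
```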
Get ready to build for the assumption
The Five Eyes guidance isn't telling the industry to slow down. It's telling the industry to grow up. For the first time, five governments have aligned on how to describe the problem: privilege, design, behavior, structure, and accountability. More importantly, they’ve tied those concepts to the frameworks organizations will actually need to operationalize them.
Agentic AI is going to keep shipping, with or without security teams ready for it. The architectures that survive will be the ones built on a single assumption: the agent may behave unexpectedly. Design accordingly.
If you're shipping agents into production, we should talk. Straiker helps security and engineering teams discover every agent in their environment, red-team them against OWASP and ATLAS before release, and run behavioral guardrails at runtime. Get in touch to see it in action.
Sai is Chief Architect at Straiker, where he focuses on the security architecture of AI agents and autonomous systems. He is also an active contributor to the OWASP GenAI Project, helping shape emerging guidance around secure AI system design, runtime behavior, and agentic risk. His work is grounded in the practical realities of how AI systems are built, deployed, and operated at scale.