Please complete this form for your free AI risk assessment.

Back to all posts

Blog

AI Agent Security Needs More Than Guardrails

Share this on:

Written by

Chris Sheehan

Published on

May 20, 2026

Read time:

3 min

Guardrails shape what an agent says. They don’t control what it does. The tool calls, the data reads, the actions taken on your credentials: that’s where the risk actually lives.

☼ / ☾

Loading audio player...

contents

The short version: Most enterprise “AI security” is pre-runtime safety. Prompt engineering, output filtering, content moderation. These controls sit before the model generates and after it responds. Agents don’t live there. They live at runtime: calling tools, fetching documents, querying systems, chaining decisions step by step. That gap is commonly uninstrumented in enterprise stacks, and it’s where breaches will land first.

There’s a comfortable assumption settling into the enterprise right now. “We added guardrails, so we’re covered.” It’s wrong. And it’s the assumption that will get the first wave of agentic deployments breached.

AI agent guardrails can reduce unsafe outputs, but they do not control runtime behavior. They shape what an agent says, not what it does. That is where objective drift begins: the agent stays inside its permissions while its actions move away from the user’s intent, the application’s purpose, or the organization’s security policy. The real risk lives in the tool calls, data access, and actions taken on your credentials.

The Gap No One Owns

The AI stack has a structural blind spot. Model providers focus on safe outputs. AI teams optimize for speed and usability. Security teams assume their existing controls extend into agent execution. In practice, those controls rarely extend into the agent execution loop.

Most of what gets called “AI security” today is pre-runtime safety. Prompt engineering. Output filtering. Static policies. Content moderation. All of it sits before the model generates or after it responds. Agents don’t live in either of those places. They live at runtime, calling tools, fetching documents, querying systems, chaining decisions step by step. That’s where things break.

The risk sits between retrieval and reasoning, where context gets poisoned. And between reasoning and action, where decisions go off course. Nobody in the current stack owns that gap.

A Failure Mode You Can Picture

A coding agent is wired into your environment. It reads documentation, pulls from internal repos, executes tasks on a developer’s behalf. Now introduce one subtle condition: a README, a code comment, or an external page the agent retrieves contains a malicious instruction embedded in the context.

The agent may not flag the instruction as malicious because it appears inside ordinary task context. The instruction gets interpreted as valid, the action gets taken, and sensitive code or data moves somewhere it shouldn’t. Nothing has to look like a jailbreak or a traditional policy violation for the agent to take the wrong action.

This is indirect prompt injection. It’s a form of prompt injections, which is the first entry on the OWASP Top 10 for LLM and GenAI Applications.

EchoLeak (CVSS 9.3) exfiltrated SharePoint and Teams data through M365 Copilot using exactly this pattern. CVE-2025-53773 (CVSS 9.6) achieved remote code execution via a prompt injection hidden in a GitHub PR description. These incidents illustrate why CASB, EDR, DLP, and IAM controls can miss agentic failures when every individual action appears authorized. Every action the agent took was authorized.

The top risks of deploying agents at scale include:

Indirect prompt injection
Tool misuse
Excessive agency
Memory poisoning
Multi-hop information leakage

None of these vectors show up at the network layer your CASB inspects, the process layer your EDR watches, or the IAM layer your cloud posture tool scores. They happen at the semantic layer. That layer spans prompt, retrieved context, tool call, and agent action. That layer is uninstrumented in every enterprise stack I’ve walked through this year.

Visibility Is Not Control in AI

Logs, traces, and usage dashboards are necessary for understanding agent behavior, but they do not stop unsafe actions before execution. Detection without enforcement is hindsight with a UI.

This matters because most “AI security” products on the market today are observability tools marketed as security. They show you what your agents did. They don’t stop what your agents shouldn’t do. In security, the difference between a dashboard and a control is the difference between a postmortem and a block.

What Runtime Control Actually Looks Like

Agent security has to move into runtime. Runtime control exists to catch drift while there is still time to intervene: before the tool call fires, before the data leaves, before a benign-looking chain becomes an incident.

Inspect every tool call before it fires. The agent wants to call a function, read a file, hit an API. Validate it against policy in real time.
Validate every data access against intent. Just because an agent can read something doesn’t mean it should, given what it’s currently trying to do.
Control how retrieved input influences action. Content from external sources has to be treated as untrusted data, not instructions. This is exactly where indirect prompt injection lives.
Govern multi-step decisions as they happen. An agent’s tenth action in a chain might only be dangerous in the context of the first nine. Stateless inspection misses that completely.

And it has to be fast. If enforcement adds meaningful latency, the AI team rips it out. Slow agents don’t ship. The teams getting this right run runtime enforcement as a lightweight control plane inline with the agent, at production speed.

The Shift for AI Agents is Runtime Security

The most important control in your AI stack is not your model, your guardrails, or your dashboards. It’s runtime security for agents.

As autonomy increases, risk moves from generation to execution. The security architecture has to move with it. The defender’s mental model needs to update: it’s not about a hostile model versus a cooperative user anymore. With context poisoning, the model is cooperative. It’s the context that’s been weaponized. Standard guardrails that inspect model outputs miss this entirely, because the outputs look benign. They look like things a reasonable developer would approve.

Practical Skills for Defenders

Immediate hygiene if you have agents in production:

Audit any file your agents ingest from external sources. CLAUDE.md, README, PR descriptions, code comments. These are injection surfaces.
Treat MCP servers the same way you treat npm dependencies. Vet them. Pin them. Monitor for changes.
Limit agent session length for sensitive work to reduce the compaction attack window.
Avoid broad permission rules. Bash(git:*) is an invitation.
Instrument the semantic layer. If you can’t see what an agent is reasoning about before it acts, you cannot secure it.
Classify your agents by autonomy level and apply proportional controls. A summarization bot and a coding agent with terminal access are not the same threat model.

How Straiker Helps

The attack surface described above isn’t a single-vendor problem. It’s a systemic vulnerability of agentic AI at large. The same execution loops, permission chains, and tool interfaces exist across every enterprise agent deployment. Straiker is purpose-built for this surface.

For teams that want visibility into what’s actually running, Discover AI maps every agent and MCP server across your environment. For proactive testing, Ascend AI continuously red-teams your agents the way real attackers will. For runtime coverage, Defend AI inspects every prompt and tool call as it happens, at the semantic layer of prompt, tool call, and action.

Straiker is a gold sponsor of the OWASP AI Exchange and OWASP GenAI Project. Our team actively contributes to the Top 10 for Agentic AI and AIVSS.

The tooling exists. Guardrails are a start. Runtime security is what keeps AI agents aligned with the task, the policy, and the enterprise boundary as they act. The question is whether you’re using it before someone else maps your attack surface for you.

If your agent can take action, your security has to follow it step by step. Guardrails are a start. They’re not where the problem ends. They’re where it begins.

No items found.

The short version: Most enterprise “AI security” is pre-runtime safety. Prompt engineering, output filtering, content moderation. These controls sit before the model generates and after it responds. Agents don’t live there. They live at runtime: calling tools, fetching documents, querying systems, chaining decisions step by step. That gap is commonly uninstrumented in enterprise stacks, and it’s where breaches will land first.

The Gap No One Owns

The risk sits between retrieval and reasoning, where context gets poisoned. And between reasoning and action, where decisions go off course. Nobody in the current stack owns that gap.

A Failure Mode You Can Picture

This is indirect prompt injection. It’s a form of prompt injections, which is the first entry on the OWASP Top 10 for LLM and GenAI Applications.

EchoLeak (CVSS 9.3) exfiltrated SharePoint and Teams data through M365 Copilot using exactly this pattern. CVE-2025-53773 (CVSS 9.6) achieved remote code execution via a prompt injection hidden in a GitHub PR description. These incidents illustrate why CASB, EDR, DLP, and IAM controls can miss agentic failures when every individual action appears authorized. Every action the agent took was authorized.

The top risks of deploying agents at scale include:

Indirect prompt injection
Tool misuse
Excessive agency
Memory poisoning
Multi-hop information leakage

Visibility Is Not Control in AI

Logs, traces, and usage dashboards are necessary for understanding agent behavior, but they do not stop unsafe actions before execution. Detection without enforcement is hindsight with a UI.

What Runtime Control Actually Looks Like

Inspect every tool call before it fires. The agent wants to call a function, read a file, hit an API. Validate it against policy in real time.
Validate every data access against intent. Just because an agent can read something doesn’t mean it should, given what it’s currently trying to do.
Control how retrieved input influences action. Content from external sources has to be treated as untrusted data, not instructions. This is exactly where indirect prompt injection lives.
Govern multi-step decisions as they happen. An agent’s tenth action in a chain might only be dangerous in the context of the first nine. Stateless inspection misses that completely.

The Shift for AI Agents is Runtime Security

The most important control in your AI stack is not your model, your guardrails, or your dashboards. It’s runtime security for agents.

Practical Skills for Defenders

Immediate hygiene if you have agents in production:

Audit any file your agents ingest from external sources. CLAUDE.md, README, PR descriptions, code comments. These are injection surfaces.
Treat MCP servers the same way you treat npm dependencies. Vet them. Pin them. Monitor for changes.
Limit agent session length for sensitive work to reduce the compaction attack window.
Avoid broad permission rules. Bash(git:*) is an invitation.
Instrument the semantic layer. If you can’t see what an agent is reasoning about before it acts, you cannot secure it.
Classify your agents by autonomy level and apply proportional controls. A summarization bot and a coding agent with terminal access are not the same threat model.

How Straiker Helps

Straiker is a gold sponsor of the OWASP AI Exchange and OWASP GenAI Project. Our team actively contributes to the Top 10 for Agentic AI and AIVSS.

If your agent can take action, your security has to follow it step by step. Guardrails are a start. They’re not where the problem ends. They’re where it begins.

No items found.

Share this on:

similar resources

What’s the Difference Between Generative AI and Agentic AI?

Blog

October 14, 2025

What’s the Difference Between Generative AI and Agentic AI?

Understand the next evolution of artificial intelligence. Learn how generative AI differs from agentic AI—how agents move beyond content creation to autonomous reasoning and action—and why this shift defines the future of enterprise AI.

Blog

April 15, 2026

Claude Mythos Proves the AI-Persistent Threat Era Has Arrived

Anthropic couldn't safely release their most powerful model. That's not just a safety story. It's an adversarial roadmap. Claude Mythos is a step function in AI capability, for defenders and adversaries alike. Model hardening alone doesn't close that gap.

Blog

April 28, 2026

Why Pattern-Based AI Security Fails Against Agentic Attacks

Pattern-based AI security filters miss encoded instructions, emoji-based bypasses, and multi-step hijacks — the attacks most commonly used against AI agents today. Semantic detection catches all of them.