Please complete this form for your free AI risk assessment.

Agent Hijacking

Last updated on Dec 17, 2025

Agent Hijacking is a security attack where an adversary manipulates an autonomous AI agent’s context, instructions, memory, or tool usage to redirect its behavior toward unauthorized or harmful actions.

Unlike model hijacking, which targets the underlying AI model, agent hijacking exploits the runtime control plane of agentic AI applications where reasoning, planning, tools, and memory converge.

Why Agent Hijacking Is Dangerous

Agent hijacking is not a prompt problem. It is a runtime security problem that emerges as soon as AI systems are trusted to act on your behalf.

A hijacked agent can:

Perform actions on behalf of users without authorization
Abuse legitimate tools, APIs, or credentials
Persist malicious instructions across sessions
Make decisions that appear reasonable but violate policy
Operate silently without triggering traditional alerts

Because each individual step may look valid, agent hijacking often evades detection until damage is already done.

How Agent Hijacking Works

Agent hijacking typically occurs through manipulation of one or more runtime components:

Context Injection
Attackers insert malicious instructions into prompts, documents, emails, chats, or web content that the agent treats as trusted context.

Tool Redirection
The agent is coerced into misusing legitimate tools or APIs, often chaining actions in unintended ways.

Memory Manipulation
Short-term or long-term agent memory is poisoned, causing persistent behavioral changes.

Objective Drift
The agent is nudged to reinterpret its goal in a way that technically satisfies instructions while violating intent.

Agent-to-Agent Influence
In multi-agent systems, one compromised agent manipulates others to amplify impact.

Real-World Example

In one documented scenario, Straiker researchers uncovered a zero-click agentic AI attack where a single inbound email caused an AI agent to silently exfiltrate sensitive Google Drive data without user interaction, malware, or explicit prompts.

The agent behaved as designed: reading context, using trusted tools, and completing tasks autonomously. However, attackers exploited this trust boundary to hijack the agent’s behavior, turning routine automation into a covert data-leak channel.

‍Agent Hijacking vs Model Hijacking

Model Hijacking	Agent Hijacking
Targets the underlying AI model	Targets agent behavior at runtime
Alters model responses	Alters decisions and real-world actions
Primarily prompt-level manipulation	Context, tool, memory, and workflow manipulation
Typically stateless	Persistent and stateful across sessions
Limited blast radius	Real-world operational and data impact