When Prompt Injection Becomes Remote Code Execution
Four CVEs in CrewAI show how prompt injection chains through agent tool use into sandbox escape, SSRF, file read, and full remote code execution. The LLM isn't the target — it's the delivery mechanism.
Prompt injection is usually discussed as a text-level attack — tricking an LLM into saying something it shouldn't. Four new CVEs in CrewAI demonstrate that when agents have tools, prompt injection becomes a vehicle for remote code execution on the host system.
Most conversations about prompt injection focus on the LLM itself. An attacker crafts input that overrides system instructions, and the model does something its operator didn't intend — leaks a system prompt, ignores a guardrail, says something off-brand.
That framing is incomplete. When an LLM is embedded in an agent framework with access to tools — code interpreters, file loaders, search APIs — prompt injection doesn't just change what the model says. It changes what the model does. And what it does runs on your system.
On April 1, 2026, CERT/CC published VU#221883: four CVEs in CrewAI, one of the most widely-used AI agent frameworks. The vulnerabilities are individually straightforward. Chained together via prompt injection, they produce a complete attack path from untrusted input to remote code execution on the host.
This is why we built mistaike.
The Four CVEs
CVE-2026-2275: The Sandbox That Isn't One
CrewAI's Code Interpreter tool is designed to execute agent-generated code inside a Docker container. If Docker isn't available — not installed, not running, or the daemon isn't reachable — the tool silently falls back to SandboxPython.
SandboxPython sounds safe. It isn't. The critical issue is that it permits ctypes — Python's foreign function interface, which lets code load and call functions from any shared library on the system.
In practice, this means:
```python
from ctypes import cdll

# Load the system C library and call straight into it: no container,
# no syscall filter, nothing between this code and the host.
libc = cdll.LoadLibrary("libc.so.6")
libc.system(b"id")
```
That executes a shell command with the privileges of the agent process. The attacker doesn't need to escape a container. There's no container to escape from. They can call execve, fork, socket, or any other libc function directly.
A "sandbox" that allows arbitrary ctypes calls is not a sandbox. It's a polite suggestion. The attacker's code runs on the host with full access to the process's memory, file descriptors, and network stack.
CVE-2026-2287: Silent Sandbox Degradation
This is the enabling condition for CVE-2026-2275. CrewAI checks whether Docker is available at agent initialisation — but it never repeats that check at execution time.
The consequence: if Docker becomes unavailable after the agent starts — the daemon crashes, the socket becomes unreachable, the container runtime is killed — the Code Interpreter continues to accept execution requests, but silently routes them through SandboxPython instead of Docker.
No exception. No log warning. No error to the operator. The agent carries on as if sandboxing is working.
This is a TOCTOU (time-of-check/time-of-use) class of failure. The state verified at initialisation is not the state present at execution. In containerised environments, cloud-hosted agents, and CI pipelines, Docker availability is often conditional — and an attacker who can influence the environment (or simply wait for an intermittent failure) can force the fallback.
The operator's dashboard shows the agent running. It is. Just without the isolation they think they have.
CVE-2026-2286: SSRF via RAG Search Tools
CrewAI's RAG search tools accept arbitrary URLs at runtime without validation. The attack vector is not a crafted HTTP request or a malicious API parameter — it's conversational input to the agent. The injected prompt directs the LLM to search a specific URL, and the tool fetches it.
The target isn't a public website. It's the cloud metadata endpoint.
On AWS, the instance metadata service is reachable from any running workload at 169.254.169.254. A request to http://169.254.169.254/latest/meta-data/iam/security-credentials/ returns the name of the attached IAM role. A second request to http://169.254.169.254/latest/meta-data/iam/security-credentials/{role-name} returns a JSON blob with a live AccessKeyId, SecretAccessKey, and session Token — valid AWS credentials, rotated automatically, with whatever permissions the instance role carries.
GCP exposes similar data at http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token. Azure at http://169.254.169.254/metadata/identity/oauth2/token.
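For the AWS case, the two-request flow described above can be sketched as follows. This assumes the weaker IMDSv1 is enabled; IMDSv2 mitigates exactly this attack by requiring a session token first obtained via a PUT request, which a simple GET-based SSRF cannot perform. The helper names are invented for the sketch.

```python
import json
import urllib.request

# Public, well-documented AWS instance metadata path.
IMDS_BASE = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def fetch(url: str) -> str:
    # IMDSv1 only: IMDSv2 would reject this request without a session token.
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read().decode()


def instance_credentials() -> dict:
    role = fetch(IMDS_BASE).strip()             # request 1: attached role name
    return json.loads(fetch(IMDS_BASE + role))  # request 2: live AccessKeyId,
                                                # SecretAccessKey, session Token
```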
An attacker who can get the agent to fetch one of these URLs has extracted cloud credentials. Those credentials can enumerate S3 buckets, describe EC2 instances, read Secrets Manager entries, or call any other AWS service the role is permitted to reach — without ever touching the host's filesystem or network directly.
The RAG tool returns the response content to the agent's context window. From there, it may appear in the agent's output to the user, be passed to another tool, or be logged. Any of these paths exfiltrates the credentials.
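One application-level layer of defence is to resolve and classify a URL's target address before fetching it. A minimal sketch (it ignores redirects and DNS rebinding, which a production control must also handle, and it complements rather than replaces network-level egress blocking):

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_fetch_target(url: str) -> bool:
    """Reject URLs whose host resolves to private, loopback, or link-local
    addresses (169.254.0.0/16 covers the AWS/Azure metadata endpoint)."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "metadata.google.internal":  # GCP metadata hostname
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    return True


assert not is_safe_fetch_target("http://169.254.169.254/latest/meta-data/")
assert not is_safe_fetch_target("http://metadata.google.internal/computeMetadata/v1/")
```

Crucially, this check runs in the tool, not in the prompt: an injected instruction cannot talk it out of the refusal.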
CVE-2026-2285: Arbitrary File Read via JSON Loader
CrewAI's JSON loader tool reads files from disk using paths constructed at runtime — and performs no path normalisation or validation before opening them.
Without path normalisation — e.g. os.path.realpath(), which also resolves symlinks, followed by a prefix check against an allowed directory — any path the agent constructs is valid input to the loader, including relative paths with traversal components.
The most directly valuable targets:
- `/proc/self/environ` — the running process's environment variables. If the application loaded a `.env` file at startup, all of those keys — `DATABASE_URL`, `OPENAI_API_KEY`, `SECRET_KEY`, `STRIPE_SECRET_KEY` — are present here, in plain text, readable from the process's own `/proc` entry.
- `.env` files — if the agent process's working directory is the application root (common in development and many container configurations), a relative path like `../../.env` traverses to the application's dotenv file.
- `~/.ssh/id_rsa`, `~/.ssh/id_ed25519` — private keys. If the agent process runs as a user with an SSH keypair, the attacker can read it.
- `/etc/shadow` (if running as root), AWS credential files at `~/.aws/credentials`, kubectl config at `~/.kube/config`.
The contents are returned to the agent's context window. The agent doesn't need to exfiltrate them explicitly — they'll appear in its next response, get passed to another tool, or be written to a log.
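The missing validation is small. A sketch of the normalise-then-prefix-check pattern, using os.path.realpath so that symlinks are resolved along with ../ components (function name and error handling are illustrative):

```python
import os


def safe_load_path(requested: str, allowed_root: str) -> str:
    """Resolve the requested path and refuse anything that escapes the
    allowed directory, whether via ../ traversal, an absolute path, or
    a symlink pointing outside the root."""
    root = os.path.realpath(allowed_root)
    resolved = os.path.realpath(os.path.join(root, requested))
    # Compare resolved paths, not raw strings, so "../../.env" and
    # symlink tricks are both caught by the same check.
    if os.path.commonpath([root, resolved]) != root:
        raise PermissionError(f"path escapes allowed root: {requested}")
    return resolved


# In-bounds paths resolve normally; traversal raises:
#   safe_load_path("docs/a.json", "/srv/app/data")  ->  "/srv/app/data/docs/a.json"
#   safe_load_path("../../.env", "/srv/app/data")   ->  PermissionError
```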
How the Chain Works
None of these vulnerabilities require sophisticated exploitation individually. The attack path is:
1. Prompt injection. Attacker-controlled text reaches the agent's context — via direct user input, RAG retrieval from an external source, a tool response from a compromised third-party service, or any other channel the agent reads from.
2. Tool invocation. The injected text instructs the agent to use its tools: execute this code, load this file, search this URL. The LLM processes injected instructions the same way it processes legitimate ones — it has no reliable way to distinguish them. The tool call is made.
3. Sandbox escape. The Code Interpreter checks Docker availability. If Docker is down (or was never running), it falls back to SandboxPython. The attacker's code runs with `ctypes` access — arbitrary C function calls on the host.
4. Credential harvesting and lateral movement. In parallel with the code execution path, the SSRF tool fetches cloud metadata credentials and the JSON loader reads `/proc/self/environ` and `.env`. The attacker now has host RCE, live cloud credentials, and application secrets — everything they need to move laterally: to the database, to object storage, to other services in the same VPC.
The LLM was never the target. It was the delivery mechanism.
Why This Is the Problem We Set Out to Solve
We started building mistaike because we kept running into a specific blind spot in how organisations think about AI agents: they treat the LLM as the security boundary.
The assumption is: if we trust the model, and we've tuned it not to do bad things, we're safe. Prompt injection — when it's acknowledged at all — gets treated as a correctness problem. Make the model more instruction-following, improve the system prompt, add a content filter on inputs.
The CrewAI CVEs make the flaw in that reasoning concrete. The attacker doesn't need the LLM to want to do something harmful. They need it to do what it's told — which is exactly what a well-instruction-following model does best.
The tools are the attack surface. Every tool call is a boundary crossing — from the LLM's context into real infrastructure. And in most agent deployments, those boundary crossings are completely unmonitored:
- No validation that the arguments a tool receives are within expected ranges
- No inspection of what tool responses contain before they re-enter the agent's context
- No control over what data leaves the system through tool output channels
- No audit log of what was executed, fetched, or read
This is the gap we built mistaike to close. DLP on the tool-call boundary catches credentials moving in either direction — whether the exfiltration is deliberate (a compromised tool returning AWS keys) or incidental (the agent summarising the contents of /proc/self/environ in its response). Content safety on tool inputs catches injection payloads before they reach the LLM and trigger malicious tool calls in the first place.
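As a simplified illustration of inspection at that boundary, a scanner can pattern-match data in both directions before it crosses. The patterns below are deliberately minimal and invented for this sketch; a production DLP engine uses far richer rules, entropy checks, and context. (The AWS access key ID format, AKIA plus 16 uppercase alphanumerics, is public knowledge.)

```python
import re

# Illustrative detectors only: real coverage needs many more patterns.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "dotenv_assignment": re.compile(r"^[A-Z][A-Z0-9_]*=\S+", re.MULTILINE),
}


def scan_tool_boundary(payload: str) -> list[str]:
    """Scan data crossing the tool boundary, in either direction, and
    return the names of any secret patterns it matches."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(payload)]


# A tool response carrying IMDS output is flagged before it re-enters
# the agent's context (AKIAIOSFODNN7EXAMPLE is AWS's documentation example key):
hits = scan_tool_boundary('{"AccessKeyId": "AKIAIOSFODNN7EXAMPLE"}')
assert hits == ["aws_access_key_id"]
```

The same scan applied to tool inputs catches credentials flowing outward, e.g. an injected instruction telling the agent to POST a `.env` dump to an external URL.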
The CVE chain above represents a worst-case scenario for an unprotected agent deployment. With inspection at the tool boundary, several steps of it become detectable or blockable before system compromise.
Why This Pattern Extends Beyond CrewAI
CrewAI is not uniquely at fault. It's the framework where these specific bugs were discovered. The same conditions exist throughout the agent ecosystem — because they're not bugs so much as architectural defaults.
The pattern requires three things, all of which are common:
Untrusted input reaches the agent's context. This means RAG retrieval from external sources, user messages, tool outputs from third-party services, webhook payloads — anything the agent processes that an attacker can influence. In production agentic deployments, this is almost always true.
The agent has tools with system-level capabilities. Code execution, file access, HTTP requests, database queries. These are not exotic — they're the reason people use agent frameworks.
The tools lack independent security boundaries. No sandbox for code execution, no URL allowlisting for HTTP requests, no path validation for file access, no output inspection before re-ingestion.
All three are true for the majority of agent deployments today. The attack surface is not limited to CrewAI users.
What Actually Helps
Patching CrewAI removes these specific vulnerabilities. It doesn't close the underlying pattern.
Harden the sandbox — and fail closed, not open. Docker with a restrictive seccomp profile is a meaningful improvement over unrestricted execution. Kernel-level isolation via gVisor reduces the available syscall surface from 300+ calls to approximately 20 — dramatically limiting what attacker code can do even if it executes. But the more important principle is the failure mode: a sandbox that degrades silently to unrestricted execution provides no protection in practice. If the container runtime isn't available, the Code Interpreter should refuse to execute, not fall back. Fail closed.
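A fail-closed execution wrapper is short to write. This sketch re-checks the container runtime at time-of-use and raises rather than degrading; the function names are illustrative, and `docker info` is used only as a cheap daemon liveness probe:

```python
import shutil
import subprocess


def docker_ready(timeout: float = 3.0) -> bool:
    """Time-of-use check: the Docker CLI exists and the daemon answers."""
    if shutil.which("docker") is None:
        return False
    try:
        subprocess.run(["docker", "info"], check=True,
                       capture_output=True, timeout=timeout)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return False


def run_in_container(code: str) -> str:
    """Placeholder for containerised execution; out of scope for this sketch."""
    raise NotImplementedError


def execute(code: str, runtime_check=docker_ready) -> str:
    # Fail closed: if isolation is unavailable, refuse loudly. Never route
    # the code to a weaker interpreter as a convenience.
    if not runtime_check():
        raise RuntimeError("container runtime unavailable; refusing to execute agent code")
    return run_in_container(code)
```

The operator sees an error the moment isolation degrades, instead of a dashboard that looks healthy while code runs unsandboxed.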
Validate tool inputs outside the LLM. The LLM's decision to call a tool with a particular URL, file path, or code payload should not be the final authority on whether that call happens. Tools should enforce their own allow lists and path restrictions independently of what the agent requested — because those restrictions need to hold even when the LLM has been manipulated. Validation that can be bypassed via prompt injection is not validation.
Control egress at the network layer. Default-deny outbound network access for agent execution environments. Explicitly declare the domains each tool is permitted to reach. Block the metadata endpoints (169.254.169.254, metadata.google.internal) at the network level — not via application logic that an injected prompt can influence. An attacker who achieves code execution but can't reach the metadata service or exfiltrate data externally has a significantly reduced blast radius.
Inspect what crosses the tool boundary. Every tool call is an outbound data channel. Every tool response is an inbound data channel. DLP on both directions catches credentials and sensitive data moving in either direction — whether the cause is prompt injection, a compromised dependency, or a misconfiguration. This is particularly important for tool responses re-entering the agent's context, where sensitive content can be picked up and summarised by the LLM without any explicit exfiltration step.
Treat the LLM as permanently compromised. The system's security properties must hold when the model is fully controlled by an attacker. Prompt injection defences are improving, but they are probabilistic. Any security boundary that relies on the LLM correctly identifying and refusing malicious instructions is not a security boundary.
The Uncomfortable Implication
The AI agent ecosystem has spent two years treating prompt injection as a trust and safety problem — how do we stop the LLM from being rude, leaking its system prompt, or generating off-policy content?
The CrewAI CVEs are a reminder that prompt injection in an agentic context is a systems security problem. The attacker's goal isn't to make the LLM say something embarrassing. It's to use the LLM as an authenticated proxy — one that already has access to your infrastructure — to reach systems that would otherwise require direct compromise.
Every tool an agent can invoke is an attack surface. Every data source it reads is an injection point. The security boundary isn't the LLM's instruction-following fidelity. It's the isolation between the agent and the systems it touches — and right now, for most deployments, that isolation ranges from thin to non-existent.
References
- CERT/CC: VU#221883 — CrewAI contains multiple vulnerabilities including SSRF, RCE and local file read
- CrewAI Vulnerabilities Allow Attackers to Bypass Sandboxes and Compromise Systems (CyberPress, April 2026)
- CrewAI Hit by Critical Vulnerabilities Enabling Sandbox Escape and Host Compromise (GBHackers, April 2026)
- CrewAI Vulnerabilities Expose Devices to Hacking (SecurityWeek, April 2026)
Nick Stocks is the founder of mistaike.ai.