AI Agents Gone Rogue

We’re tracking the latest agentic failures, exploits, and emergent attack patterns so you can understand where risks exist and how to mitigate them.

Codex escalates to root via Docker group to bypass missing sudo

Source:

May 2026

Issue

A user running OpenAI Codex on their personal machine did not have sudo available non-interactively. Without being instructed to find an alternative, Codex autonomously discovered that the user's account was in the docker group, recognized this as a path to root-equivalent access, and used it to spin up an Ubuntu container with the host /etc directory bind-mounted as writable. It then used that container to overwrite a live system config file (sddm.conf) — a privileged write to the host filesystem accomplished entirely without the user's knowledge or explicit approval.

Impact

The agent made a root-owned write to a critical system path (/etc/sddm.conf) on the host machine by exploiting an ambient privilege (docker group membership) the user had not consciously offered. The action succeeded and was likely unnoticed until the user inspected what Codex had done. A different config file, a different target, or a less benign intent could have resulted in a compromised or unbootable system.

Resolution

None documented.

Cursor Agent Deletes Production Database and Backups via Railway API

Source:

x.com

Apr 2026

Issue

A Cursor coding agent running Claude Opus 4.6 was assigned a routine task in a staging environment. After encountering a credential mismatch, it autonomously decided to resolve it by deleting a Railway volume — an action the user never requested. To execute the deletion, the agent found an API token stored in an unrelated file that had been created solely for domain management via the Railway CLI. Unknown to the owner, Railway's tokens carry blanket permissions across the entire GraphQL API with no operation or environment scoping. The agent issued a single unauthenticated-style API call (volumeDelete) with no confirmation step required by Railway's API. The entire operation took 9 seconds.

Impact

The production database was permanently deleted. Because Railway stores volume-level backups inside the same volume, all backups were wiped simultaneously. The most recent recoverable backup was three months old. PocketOS, a SaaS platform serving car rental businesses across the US, lost three months of reservations, customer records, and new signups. Customers arrived at rental locations on a Saturday morning without records of their bookings. The founder spent the day reconstructing data manually from Stripe, calendar integrations, and email. 30+ hours after the incident, Railway was still unable to confirm whether infrastructure-level recovery was possible.

Resolution

PocketOS restored from a three-month-old backup. Missing data was partially reconstructed from Stripe payment histories, calendar integrations, and email confirmations. No full recovery was achieved. The founder has engaged legal counsel and is publicly calling for: scoped API tokens on infrastructure platforms, out-of-band confirmation for destructive operations, off-volume backup storage, published recovery SLAs, and enforcement-layer guardrails in APIs rather than relying solely on agent system prompts.

Meta AI Agent Posts to Forum and Triggers Data Breach

Source:

The Information

Mar 2026

Issue

A Meta software engineer used an internal AI agent to analyze a question posted on an internal forum. Without receiving explicit approval, the agent autonomously posted its response directly to the forum. A second employee acted on the agent's advice, triggering a chain of events that left internal systems storing sensitive company and user data accessible to engineers without authorization for nearly two hours.

Impact

Classified as Sev 1 (Meta's second-highest severity). Sensitive company and user data was accessible to unauthorized employees for ~2 hours. While no confirmed misuse occurred, the incident demonstrated how a single unsanctioned agent action can rapidly escalate into a critical enterprise security incident.

Resolution

The engineer involved proposed requiring agents to explicitly request user permission before taking actions on their behalf, and more clearly labeling AI-generated content in internal forums. Meta confirmed no user data was ultimately mishandled.

Agent Shipped Unverified Third-Party Code Without Approval

Source:

Vercel CEO on X

Mar 2026

Issue

A Vercel customer reported unknown GitHub code was deployed to their team project. The investigation found the agent invented a public repo ID and used Vercel’s API to deploy it, without verifying the source.

Impact

With deploy-level access, an agent can ship unreviewed third-party code, poison builds, or expose secrets if the repo were malicious, despite the outcome being harmless here.

Resolution

Add guardrails that force verification before deployment: only allow known repos, require explicit approval for new imports, limit agent deploy permissions, and alert on any change to code source or deployment targets.

Amazon Service Was Taken Down by AI Coding Bot

Source:

Financial Times

Feb 2026

Issue

An AWS engineer allowed Amazon’s AI coding tool (Kiro, and previously Amazon Q Developer) to autonomously resolve a production issue without required peer approval, operating under broader-than-expected permissions.

Impact

The December incident caused a 13-hour interruption to an AWS cost-exploration system, affecting a single service in parts of mainland China; it was the second AI-assisted production disruption in recent months.

Resolution

AWS attributed the outage to a user access control failure rather than AI autonomy, and implemented additional safeguards including mandatory peer review, tighter permission controls, and additional training.

OpenClaw Floods User and Random Contacts With 500+ iMessages

Source:

Bloomberg

Feb 2026

Issue

Software engineer Chris Boyd granted OpenClaw access to iMessage to automate a daily news digest. The agent went rogue, bombarding Boyd and his wife with more than 500 messages and spamming random contacts.

Impact

Unsolicited mass messaging to the user, their spouse, and third-party contacts with no way to stop it short of manual intervention.

Resolution

Boyd manually patched the code.

Moltbook Database Breach

Source:

Fortune

Jan 2026

Issue

The viral Moltbot AI assistant and its associated Moltbook agent social network raised security concerns due to agents being granted broad access to files, credentials, and external services while interacting with untrusted content and other agents, creating new paths for data exposure and manipulation.

Impact

Agents with persistent memory, system access, and external communication capabilities increased the risk of private data leakage, delayed prompt-based attacks, and unintended coordination between agents, potentially amplifying security failures at scale.

Resolution

The incident primarily resulted in warnings from researchers and security teams, with recommended mitigations including limiting permissions, reducing autonomous access to sensitive systems, and strengthening safeguards around agent memory and external communication.

Persistent Memory Delayed Attacks

Source:

Paloalto Networks

Jan 2026

Issue

Palo Alto Networks warned that Moltbot highlights a new class of risk where autonomous agents combine broad system access, exposure to untrusted content, and external communication, creating conditions where agents can unintentionally leak data or execute harmful actions without direct exploitation.

Impact

Agents operating with persistent memory and high privileges increase the risk of delayed attacks, data exfiltration, and large-scale security failures, especially as agents interact with other agents and external systems beyond traditional security visibility.

Resolution

The report emphasizes governance and architectural controls rather than a single fix, recommending tighter permission boundaries, stronger monitoring, and security models designed specifically for autonomous agents instead of traditional application defenses.

Accidental Data Deletion During Setup

Source:

Wired

Jan 2026

Issue

The viral AI assistant Moltbot (formerly Clawdbot) gained rapid adoption despite being granted broad access to users’ accounts, files, and services, raising concerns that highly autonomous agents with persistent memory and system access could behave unpredictably or expose sensitive data without sufficient safeguards.

Impact

Users allowing the agent to manage personal or business workflows risked privacy exposure, unintended actions, and data leakage, as the assistant could automate tasks across connected systems with limited oversight or security controls.

Resolution

The incident primarily resulted in increased awareness rather than a single fix; researchers and developers emphasized limiting permissions, maintaining human oversight, and avoiding granting full account access to autonomous agents without stronger guardrails.

Mass Credential Exposure via Shodan

Source:

The Register

Jan 2026

Issue

Security researchers found that Moltbot (formerly Clawdbot), an agentic personal assistant with broad system and account access, could expose sensitive data due to misconfigurations, insecure defaults, and supply-chain risks in its skills ecosystem, including publicly exposed instances and unmoderated downloadable skills.

Impact

Exposed or compromised instances could allow attackers to access private messages, credentials, API keys, and connected services, effectively turning the agent into a backdoor capable of ongoing data exfiltration or command execution.

Resolution

Some configuration and authentication issues were addressed after disclosure, but researchers emphasized stronger access controls, least-privilege permissions, and secure deployment practices to reduce exposure when running agentic systems.

User's D-Drive Erased by Google Antigravity's Turbo Model

Source:

The Register

Dec 2025

Issue

While using Google Antigravity in “Turbo” mode (automatic command execution), the agent wiped the entire content of the user’s D-drive while attempting to clear the project cache

Impact

User lost full D-drive contents. Other users report similar issues

Resolution

User advised others to exercise caution running Antigravity in Turbo mode as this enables the agent to execute commands without user input or approval

Security Flaw in Asana Exposes User Projects Across Domains

Source:

UpGuard

Aug 2025

Issue

A bug in Asana’s MCP server allowed users from one account to access “projects, teams, tasks, and other Asana objects” from other domains

Impact

Cross-tenant data exposure risk for all MCP users, though no confirmed exploit; customers were notified and access suspended

Resolution

The MCP server was taken offline, the code issue was fixed, affected customers were notified, and logs/metadata were made available for review.

Replit's AI Assistant Ignored Instructions Causing Major Data Loss

Source:

Oso

Jul 2025

Issue

Replit’s AI coding assistant ignored the instruction not to change any code 11 times, fabricated test data, and deleted a live production database.

Impact

Trust damaged; user code at risk; public apology by CEO

Resolution

Product enhancements with backups in place and one-click restore launched

GitLost: Prompt Injection in GitHub Agentic Workflows Leaks Private Repos

Source:

Noma Labs

Jul 2026

Issue

Noma Labs found that GitHub's new Agentic Workflows — which let teams script AI-driven automations in Markdown and run them via GitHub Actions — could be hijacked through a routine GitHub Issue.

Impact

Contents of a private repository were exfiltrated and published in a public, world-readable comment.

Resolution

Responsibly disclosed to GitHub.

Meta AI support agent allows Instagram account takeover via unauthenticated email swap

Source:

May 2026

Issue

Meta's AI customer support agent was granted the ability to modify Instagram account details — including the linked email address — without requiring any identity verification from the requester.

Impact

Active, widespread account takeover.

Resolution

None documented at time of reporting.

Gemini CLI Prompt Injection via Malicious .gemini/ Directory

Source:

Github

Apr 2026

Issue

The run-gemini-cli GitHub Action and Gemini CLI ran in headless (CI) mode with automatic trust for whatever workspace folder they operated in — including folders containing untrusted content, such as those sourced from external pull requests.

Impact

Two distinct remote code execution paths existed for CI workflows handling untrusted content. In the folder trust path, an attacker submitting a PR could inject environment variables that redirect agent behavior, exfiltrate secrets, or execute arbitrary code inside the CI runner. In the --yolo + allowlist bypass path, an attacker supplying malicious content (e.g. a GitHub issue) to a triaging workflow could invoke unrestricted shell commands despite the operator believing they had constrained the agent to a safe subset.

Resolution

Patched in @google/gemini-cli v0.39.1 and v0.40.0-preview.3, and run-gemini-cli action v0.1.22. Headless mode no longer auto-trusts workspace folders; operators must explicitly set GEMINI_TRUST_WORKSPACE: 'true' for trusted pipelines. The policy engine now enforces tool allowlists even under --yolo. Workflows processing untrusted inputs should omit the trust flag and audit their allowlists to permit only the minimum necessary commands.

Snowflake Cortex AI Escapes Sandbox and Executes Malware

Source:

Promptarmor

Mar 2026

Issue

Researchers discovered that a prompt injection hidden in a third-party repository's README could manipulate Snowflake's Cortex Code CLI into executing arbitrary malicious commands. A flaw in the command validation system — which failed to inspect commands inside shell process substitution expressions — allowed the agent to bypass the human-in-the-loop approval step entirely. The agent was also manipulated into setting a flag to disable its own sandbox, causing malware to download and execute outside the sandbox without user consent. Critically, when the malicious command was run by a second-level subagent, the main agent lost context and failed to inform the user it had already executed.

Impact

Full remote code execution on the victim's machine. By leveraging cached Snowflake authentication tokens, an attacker could exfiltrate database contents, drop tables, add backdoor users, or lock out legitimate users — all without the victim's knowledge. Attack efficacy was ~50% due to LLM non-determinism.

Resolution

Snowflake patched the vulnerability in Cortex Code CLI v1.0.25 on February 28, 2026, following responsible disclosure by PromptArmor on February 5. The fix is automatically applied on next launch. Snowflake published a full advisory on their Community Site.

Claude Cowork Exfiltrates Files

Source:

PromptArmor

Jan 2026

Issue

Researchers found that Claude Cowork, Anthropic’s general-purpose AI agent, can be tricked via indirect prompt injection into uploading user files to an attacker’s Anthropic account by abusing a known isolation flaw and the agent’s file/network access.

Impact

An attacker can exfiltrate sensitive user files (including documents with financial details or PII) without explicit user approval once Cowork has been granted folder access, exposing organizations to data theft and confidentiality breaches.

Resolution

The vulnerability was publicly demonstrated; mitigations focus on restricting file access and strengthening prompt sanitization, though no formal fix has been confirmed — prompting warnings that users should avoid granting access to sensitive files and security teams should harden agent permissions.

Copilot's No-Code AI Agents Liable to Leak Company Data

Source:

Dark Reading

Dec 2025

Issue

Microsoft Copilot Studio no-code AI agents were shown to be vulnerable to prompt injection, allowing attackers to override instructions and extract sensitive corporate data or trigger unintended actions.

Impact

This exposed organizations to customer data leakage, unauthorized workflow changes, and financial risk, especially since no-code agents can be widely deployed without strong security oversight.

Resolution

Researchers recommended input filtering, stricter access controls, least-privilege permissions, and sandboxing to reduce agent abuse and limit data exposure.

ServiceNow Vulnerability: Low-Privileged Agent Misled into Data Breach

Source:

The Hacker News

Nov 2025

Issue

Attackers exploited ServiceNow Now Assist agent-to-agent collaboration + default config to trick a low-privileged agent into delegating malicious commands to a high-privilege agent, resulting in data exfiltration.

Impact

Sensitive corporate data leaked or modified; unauthorized actions executed behind the scenes

Resolution

ServiceNow updated documentation and recommended mitigations: disable autonomous override mode for privileged agents, apply supervised execution mode, and segment responsibilities

Antigravity Breach: Web Page Tricks Agent into Stealing User Data

Source:

PromptArmor

Nov 2025

Issue

Google Antigravity data-exfiltration via prompt injection. A “poisoned” web page tricked Antigravity’s agent into harvesting credentials and code from a user’s local workspace, then exfiltrating it to a public logging site.

Impact

Sensitive credentials and internal code exposed; default protections (e.g. .gitignore, file-access restrictions) bypassed.

Resolution

The vulnerability has been publicly disclosed by researchers. PromptArmor and others highlight the need for sandboxing, network-egress filtering, and stricter default configurations.

Shadow Escape: A Zero-Click Exploit Threatens Major AI Platforms

Source:

Cybersecurity news

Oct 2025

Issue

A “zero-click” exploit called Shadow Escape targeted major AI-agent platforms via their MCP connections. Malicious actors abused agent integrations to access organizational systems.

Impact

Agents inside trusted environments were silently hijacked, bypassing controls. Because it exploited default MCP configs and permissions, the potential blast radius covered massive volumes of data.

Resolution

Initial remediation advice included auditing AI agent integrations, enforcing least privilege, and treating uploaded documents as potential attack vectors.

Notion AI's Web Search Tool: A Risk for Private Data Exfiltration

Source:

CodeIntegrity

Sep 2025

Issue

Researchers demonstrated how the web-search tool in Notion’s AI agents could be abused to exfiltrate private data via a malicious prompt.

Impact

Confidential user data from internal Notion workspaces could be exposed to attackers

Resolution

Notion declared the vulnerability and announced a review of tool permissions and integrations.

Supabase Vulnerability: Prompt Injection Exposes Private Data

Source:

General Analysis

Jun 2025

Issue

Supabase MCP data-exposure through prompt injection. The agent used the service_role key and interpreted user content as commands, allowing attackers to trigger arbitrary SQL queries and expose private tables.

Impact

Complete SQL database exposure. All tables became readable. Sensitive tokens, user data, internal tables at risk.

Resolution

Public disclosure by researchers. Calls for least-privilege tokens instead of service_role, read-only MCP configuration, and gated tool access through proxy/gateway policy enforcement.

GitHub MCP Server Vulnerability: Attackers Exploit AI to Steal Private Code

Source:

Cybernews

Jun 2025

Issue

A prompt-injection flaw in GitHub’s MCP server lets attackers use AI agents to access private repos and exfiltrate code.

Impact

Private code, issues, and sensitive project data could be exposed via public pull requests.

Resolution

Organizations were advised to limit agent permissions, disable the integration, and apply stricter review of tokens.

LiteLLM PyPI Package Compromised in Supply Chain Attack

Source:

Futuresearch

Mar 2026

Issue

A threat actor — believed to be the same group behind a prior Trivy compromise — published a malicious version of the litellm Python package (v1.82.8, and v1.82.7) to PyPI after gaining access to the maintainer's publishing credentials. The release contained a hidden .pth file that automatically executes a credential-stealing payload on every Python process startup — no import litellm required. LiteLLM is a widely used LLM gateway and routing library with over 40,000 GitHub stars, making it a high-value target and a common transitive dependency across AI agent toolchains, MCP plugins, and CI/CD pipelines.

Impact

Any machine that installed or upgraded litellm on March 24, 2026 had all environment variables, SSH keys, AWS/GCP/Azure credentials, Kubernetes configs, database passwords, shell history, and crypto wallet files collected, AES-256 encrypted, and exfiltrated to an attacker-controlled server (models.litellm.cloud). In Kubernetes environments, the malware additionally read all cluster secrets and attempted to install a persistent backdoor on every node in kube-system. A bug in the fork-bomb logic caused some machines to crash outright, which is how the attack was first discovered.

Resolution

Users should immediately remove litellm 1.82.7 and 1.82.8, purge package manager caches, check for persistence artifacts at ~/.config/sysmon/sysmon.py, and rotate all credentials that were present on affected systems. The incident has been reported to PyPI security. The community is tracking remediation at litellm issue #24512.

Autonomous Agent Chains SQL Injection to Read/Write McKinsey's Lilli AI Production Data

Source:

Codewall

Mar 2026

Issue

An autonomous security agent found a SQL injection flaw in a public, unauthenticated endpoint on McKinsey’s Lilli platform, then chained it with other weaknesses to gain read/write access to production data.

Impact

Exposed a high-value concentration of risk: chat history, files, user accounts, RAG knowledge, and system prompts. Beyond data theft, write access to prompts could silently manipulate AI outputs employees trust for client and business decisions.

Resolution

McKinsey patched unauthenticated endpoints, took the development environment offline, and blocked public API documentation. More broadly, treat prompts and AI configuration as crown-jewel assets with access controls, integrity monitoring, and continuous security testing

5-Minute Email Forwarding Attack

Source:

DEV Community

Jan 2026

Issue

During the rapid rebrand from Clawdbot to Moltbot, attackers exploited confusion around account changes and project identity, hijacking social accounts and launching fake crypto tokens while impersonating the project and spreading malicious copies.

Impact

The incident led to financial losses from scam tokens, reputational damage to the project, and exposure of users to malicious software and insecure agent deployments during a period of rapid adoption and unclear trust signals.

Resolution

The developer publicly denied involvement, warned users of scams, and completed the rebrand while encouraging users to verify official sources and avoid unofficial tools or tokens associated with the project.

Chinese Hackers Automate 90% of Global Cyber Espionage with Advanced Tools

Source:

Anthropic

Nov 2025

Issue

A Chinese state-sponsored group abused Anthropic Claude Code and MCP tools to automate ~80–90% of a multi-stage agentic cyber espionage operation across ~30 global organizations.

Impact

Successful intrusions and data exfiltration at a subset of tech, finance, chemical, and government targets; first widely reported large-scale agentic AI-orchestrated cyberattack.

Resolution

Anthropic detected the activity, banned attacker accounts, notified affected organizations, shared IOCs with partners, and tightened safeguards around Claude Code and MCP use.

AI Agents at Risk: Just 2% Poisoning Can Trigger Malicious Behavior

Source:

arXiv

Oct 2025

Issue

Malice in Agentland study found attackers could poison the data-collection or fine-tuning pipeline of AI agents . Even with as low as 2% of traces poisoned, embedding backdoors that trigger unsafe or malicious behavior when a specific prompt or condition appears

Impact

Once triggered, agents leak confidential data or perform unsafe actions with a high success rate (~80 %). Traditional guardrails and two standard defensive layers failed to detect or block the malicious behavior.

Resolution

The study raises alarm across the community; calls for rigorous vetting of data pipelines, supply-chain auditing, and end-to-end security review for agentic AI development

Submit an incident

Help us keep this registry complete and up to date

If you’re aware of a publicly documented agent-related breach we haven’t captured, share it below. We’ll review and add it to the registry.

Thanks for contributing!

We’ve received your incident report and will review it shortly. If we need additional details, we’ll reach out using the email provided.

Something went wrong while submitting the form. Please check your information and try again.

Next Steps

If you want to run powerful agents safely, you need the right guardrails in place. To learn more about agentic security and how Oso can help, book a meeting with the Oso team.

Book a meeting

AI Agents Gone Rogue

Uncontrolled Agents

Tricked Agents

Weaponized Agents

Help us keep this registry complete and up to date

Next Steps