Blueinfy's blog: Agentic AI Security - Threats and Attacks (Paper Review)

Agentic AI systems transform LLMs into autonomous operators that plan, call tools, use memory, and act across web, code, APIs, and even physical environments, which radically enlarges the attack surface beyond simple chatbots. The paper frames security for these systems around concrete threat families: prompt injection and jailbreaks; autonomous cyber‑exploitation with tool abuse; multi‑agent and protocol‑level attacks (including MCP and agent‑to‑agent ecosystems); and environment/interface issues such as unsafe action spaces and brittle web interaction. These systems must therefore be treated as distributed, partially trusted components that can both be attacked and weaponized as attackers themselves.

Prompt‑centric threats are broken down into direct and indirect prompt injection, intentional and unintentional attacks, multi‑modal and hybrid payloads (text, images, audio, code), propagation behaviors, and multilingual/obfuscated or split payloads that evade naive filters. Attackers can poison external content sources (web pages, PDFs, accessibility trees, APIs), craft adversarial code/SQL prompts, or hide instructions in non‑text modalities to hijack the agent’s plan and tool calls. The work also highlights that many proposed PI defenses are brittle, with adaptive IPI attacks able to bypass perplexity‑based and pattern‑based detectors in practice, which reinforces PI as a primary attack vector against agentic workflows.

On the offensive operations side, the paper shows that agents with code execution and network access can autonomously perform vulnerability discovery and exploitation, often outperforming traditional tools like OWASP ZAP or Metasploit on known‑vulnerable targets when given CVE descriptions and appropriate tools. Demonstrated capabilities include chaining XSS, CSRF, SSTI, and SQLi, navigating web apps in realistic sandboxes, and leveraging tools to iteratively refine exploits without human guidance. In multi‑agent and protocol‑driven settings (e.g., MCP or cross‑org agent meshes), they describe additional vectors such as fake or compromised agent registration, denial of service via recursive delegation, transitive prompt‑injection across agents, memory poisoning, and identity or role abuse that propagates through the agent network.

Reference:

AGENTIC AI SECURITY:THREATS, DEFENSES, EVALUATION, AND OPEN CHALLENGES - https://arxiv.org/pdf/2510.23883

Pages

Agentic AI Security - Threats and Attacks (Paper Review)