[Case Study] Threat Simulation of AI Agents in Microsoft Copilot Studio

Executive Summary

Blueinfy performed a focused, time-bound security review of Microsoft Copilot Studio and its implementation at ACME to assess the potential risks introduced by AI agents.

The objective of the engagement was to evaluate how AI agents both legitimate and malicious could be misused, intentionally or unintentionally, to 

  • Access sensitive enterprise data
  • Expose user specific information
  • Perform unauthorized actions
  • Enable data exfiltration

The assessment combined configuration review with hands on threat simulation, where custom agents were built to replicate realistic attack scenarios as well as instructions were passed to exploit legitimate agents. The results demonstrated that even with platform level controls enabled, significant risks can persist due to configuration gaps, excessive permissions, and agent behavior manipulation.

The environments given for testing had pre-configured policies and controls applied prior to the assessment. The scope included - 

  • AI agent configuration within Microsoft Copilot Studio
  • Data access patterns through agents
  • Connector usage and restrictions
  • Guardrails and safety configurations
  • Threat simulation using custom-built agents

Assessment Methodology

Blueinfy adopted a structured methodology combining configuration validation and adversarial testing.

1. AI Configuration Review

A focused configuration review was conducted to evaluate AI-specific settings. 

Areas Reviewed:
•    AI agent configuration settings
•    Connector policies and restrictions
•    Data access configurations
•    Prompt safety and guardrails
•    Logging and monitoring capabilities

Objective:
•    Identify risky configurations
•    Recommend controls to reduce exposure
•    Highlight configurations requiring governance before enablement

2. Agent Threat Simulation

Instead of testing existing agents, Blueinfy created custom agents within the allowed policy boundaries to simulate real world attack scenarios. This approach ensured:

  • No disruption to production agents
  • Realistic exploitation within permitted configurations
  • Validation of platform controls under adversarial conditions

Threat Simulation Approach

Agents were built using only approved connectors and policies within the environment. Two categories of agents were designed:

1. Misuse of Legitimate Agents

  • Agents behaving as intended but manipulated via inputs
  • Exploiting trust in user prompts

2. Malicious Agent Design

  • Agents intentionally designed to bypass safeguards
  • Leveraging allowed configurations to simulate abuse

Key Attack Scenarios Tested

Blueinfy executed multiple scenarios to evaluate risk exposure:

  • Prompt Injection and Instruction Override - Manipulating agent behavior using crafted inputs to override system instructions and cause unintended data access
  • Data Exfiltration via Allowed Channels - Extracting sensitive data through email connectors, API responses and structured outputs
  • Cross-Agent Interaction Risks - Simulating agent-to-agent communication and demonstrating potential lateral movement
  • Rouge Agents – Malicious agents built with system instructions to exfiltrate data, phish users for credentials and send application/user data to unintended servers
  • MCP Exposure – If MCP server and tools are accessible without correct authentication and authorization mechanisms

Key Observations

The assessment revealed several important findings:

  • Misconfigurations Create Hidden Risks – Users of the agents are completely unaware of the data risks since once published/shared, end users have limited visibility in the agent configuration. User and application could be exfiltrated to third-party servers
  • Agents Can Be Manipulated Through Inputs – Based on the guardrails, prompt injection enabled behavior override and agents could be influenced to exfiltrate data without changing configuration. This created a scenario where legitimate functionality could be leveraged for unintended outcomes
  • Unauthenticated MCP Exposure – MCP tools were exposed without authentication and sensitive client data was leaked
  • Platform Controls Are Not Sufficient Alone - While Microsoft Copilot Studio provides robust built-in controls, their effectiveness depends heavily on configuration and usage. Limited logging of agent behavior and insufficient detection of abnormal activity could not restrict malicious activity even during runtime

Conclusion

Blueinfy’s assessment demonstrated that while platforms like Microsoft Copilot Studio do provide strong foundational controls, they must be complemented with:

  • Proper configuration
  • Risk-aware governance
  • Adversarial testing
  • Monitoring and logging

By moving from assumption based security to evidence driven validation, ACME established a stronger foundation for secure AI adoption. Blueinfy team worked with ACME to create a robust agent threat simulation and security review process to protect against such risks with scaling agents in parallel. Please read this blog for the three-tier risk methodology for an agent review process.

Article by Hemil Shah and Rishita Sarabhai 

The Rise of AI Agents and the urgent need for an Agent Security Review Process

Organizations today are rapidly embracing AI-powered agents. Platforms like Microsoft Copilot Studio and Google Gemini are enabling business users, not just developers, to create powerful agents that automate workflows, access enterprise data, and make decisions. This democratization is transformative. But it also introduces a new, largely ungoverned attack surface.

The Explosion of Agents

In many enterprises, the number of agents being deployed is growing exponentially from hundreds, sometimes thousands, within a short span of time. These agents:

  • Integrate with internal systems
  • Access sensitive enterprise data
  • Perform automated actions on behalf of users

Unlike traditional applications, these agents are often created outside formal development pipelines by business users, analysts, or developers. And that’s where the problem begins.

The Security Gap: No "AgentSec"

Organizations have matured practices for AppSec or InfraSec or Cloud Security but Agent Security (AgentSec) is still in its infancy.
There is typically:

  • No formal review process before agent deployment
  • Limited visibility into what agents are doing
  • No standardized threat modeling for agent behavior
  • Weak validation of platform-level security controls

This creates a dangerous blind spot.

Built-in Controls Are Not Enough

Platforms do provide security mechanisms at:

  • Data access controls
  • Authentication and authorization layers
  • Prompt filtering and safety guardrails
  • Activity monitoring

However, these controls are:

  • Complex to configure correctly
  • Highly dependent on implementation choices
  • Difficult to validate in real-world scenarios

Misconfigurations or misunderstandings can easily render these protections ineffective.

Visualizing the Risk: Agent Attack Flow


The Missing Piece: A Scalable Agent Review Process

At first glance, the solution seems straightforward: introduce agent design reviews, configuration assessments, and threat modeling for every agent. But in reality, this approach does not scale.

In large enterprises with hundreds or thousands of agents built on platforms, performing deep security reviews on every agent would:

  • Overwhelm security teams
  • Slow down innovation
  • Create operational bottlenecks

Instead, organizations must adopt a risk-based Agent Security (AgentSec) model. The three-tier risk model classifies agents based on their potential impact and exposure. 
 

  • High-risk agents are typically misconfigured or intentionally malicious, capable of unsafe actions such as exfiltrating data to external emails, interacting with unauthorized external URLs, or executing harmful embedded instructions. 
  • Medium-risk agents involve broader data interaction—often consuming sensitive or user-provided inputs through connectors, APIs, MCP integrations, or multi-agent communication—making them more prone to misuse or unintended data exposure. 
  • Low-risk agents operate within a constrained scope, relying on public or read-only data sources such as web search, uploaded files, SharePoint, or Dataverse, with minimal ability to cause harm.

Automation enables scale by classifying the agents into risk buckets and a focused review can then be performed only for high-risk and medium-risk agents to assess the business impact by building abuse/exploit scenarios. This approach ensures that security teams invest effort where it truly matters - prioritizing depth and accuracy over volume.

Why This Model Works

This approach delivers both speed and security: fast approvals for low-risk agents, strong scrutiny for higher-risk ones, reduced burden on security teams, and scalable governance across thousands of agents. Most importantly, it aligns security effort with actual risk - not perceived risk.

The organizations that succeed will not be those attempting to review every agent, but those that automate the baseline, enforce non-negotiable security gates, and escalate only what truly matters. Because in a world of thousands of agents, scalability itself becomes security.

Article by Hemil Shah and Rishita Sarabhai 

Agentic AI Security - Threats and Attacks (Paper Review)

Agentic AI systems transform LLMs into autonomous operators that plan, call tools, use memory, and act across web, code, APIs, and even physical environments, which radically enlarges the attack surface beyond simple chatbots. The paper frames security for these systems around concrete threat families: prompt injection and jailbreaks; autonomous cyber‑exploitation with tool abuse; multi‑agent and protocol‑level attacks (including MCP and agent‑to‑agent ecosystems); and environment/interface issues such as unsafe action spaces and brittle web interaction. These systems must therefore be treated as distributed, partially trusted components that can both be attacked and weaponized as attackers themselves.

Prompt‑centric threats are broken down into direct and indirect prompt injection, intentional and unintentional attacks, multi‑modal and hybrid payloads (text, images, audio, code), propagation behaviors, and multilingual/obfuscated or split payloads that evade naive filters. Attackers can poison external content sources (web pages, PDFs, accessibility trees, APIs), craft adversarial code/SQL prompts, or hide instructions in non‑text modalities to hijack the agent’s plan and tool calls. The work also highlights that many proposed PI defenses are brittle, with adaptive IPI attacks able to bypass perplexity‑based and pattern‑based detectors in practice, which reinforces PI as a primary attack vector against agentic workflows.

On the offensive operations side, the paper shows that agents with code execution and network access can autonomously perform vulnerability discovery and exploitation, often outperforming traditional tools like OWASP ZAP or Metasploit on known‑vulnerable targets when given CVE descriptions and appropriate tools. Demonstrated capabilities include chaining XSS, CSRF, SSTI, and SQLi, navigating web apps in realistic sandboxes, and leveraging tools to iteratively refine exploits without human guidance. In multi‑agent and protocol‑driven settings (e.g., MCP or cross‑org agent meshes), they describe additional vectors such as fake or compromised agent registration, denial of service via recursive delegation, transitive prompt‑injection across agents, memory poisoning, and identity or role abuse that propagates through the agent network.

Reference: 

AGENTIC AI SECURITY:THREATS, DEFENSES, EVALUATION, AND OPEN CHALLENGES - https://arxiv.org/pdf/2510.23883 

 

Why Agentic Pentesting Can’t Fix the False Positive Problem

Agentic pentesting promises smarter orchestration of tools, but it does not magically eliminate false positives. At its core, an agent still leans on the same scanners, payload generators, and detection heuristics that produced noisy results in the first place. If the underlying tools misclassify behavior or lack application context, the agent simply becomes a faster, more automated way to generate and route those misclassifications. In other words, you risk “scaling the noise” as much as scaling the signal.

Another limitation is that most agentic systems still struggle with business context and intent, which is where many false positives are born. A finding that looks critical in HTTP traces might be benign in the real-world workflow because of compensating controls, domain‑specific logic, or risk acceptance decisions that only humans understand. Agents can replay exploits and correlate signals, but they cannot reliably answer questions like “Is this test user data or real PII?” or “Would exploiting this actually harm the business?” Without that judgment, they often cannot confidently close the loop on whether something is truly a vulnerability or just an academic issue.

Finally, agentic pentesting introduces its own new sources of error that can masquerade as false positives. Misconfigured prompts, overly broad goals, or aggressive automation can lead agents to test unsupported flows, mis-handle authentication, or misinterpret application responses. These mistakes can create “findings” that look real on paper but collapse under minimal human scrutiny. So while agentic approaches can help prioritize, group, and sometimes auto‑retest issues, they do not remove the need for human validation; they merely change where you spend your validation effort—from sifting through raw scanner output to scrutinizing AI‑curated results.

SSRF in Azure MCP Server Tools

In Microsoft's March 2026 Patch Tuesday release on March 10, an urgent high-severity vulnerability, CVE-2026-26118, emerged in Azure Model Context Protocol (MCP) Server Tools. This server-side request forgery (SSRF) flaw, scored at CVSS 8.8, allows low-privileged attackers to manipulate user-supplied inputs and force the server into making unauthorized outbound requests to attacker-controlled endpoints. MCP, designed to standardize AI model integrations with external data sources, unexpectedly became a vector for privilege escalation in AI-driven Azure environments, highlighting the growing risks in agentic AI architectures.

At its core, exploitation involves crafting malicious payloads that trick the MCP server—running versions prior to 2.0.0-beta.17—into leaking its managed identity token. Attackers can then impersonate the server's identity to access sensitive Azure resources like storage accounts, virtual machines, or databases, all without needing admin rights or user interaction. Public proof-of-concept exploits, such as those on GitHub, amplify the threat, enabling rapid weaponization in targeted attacks against organizations leveraging MCP for AI workflows. This vulnerability underscores a classic SSRF pattern (CWE-918) but tailored to cloud-native AI tools, where broad service principals often grant excessive permissions.

Organizations should prioritize patching via Microsoft's Security Update Guide, audit MCP deployments for over-privileged identities, and implement outbound request filtering to contain risks. As AI security evolves, this incident signals the need for runtime protections in MCP-based systems, including token rotation and anomaly detection for AI agent traffic. Application security teams, especially those testing AI integrations, can use tools like Burp Suite to validate fixes against SSRF payloads. Staying vigilant ensures AI innovation doesn't outpace defense in the cloud.

Reference - https://www.tenable.com/cve/CVE-2026-26118

Supply Chains and AI: Decoding OWASP Top 10 2026 Changes

OWASP’s 2026 Top 10 reflects how quickly modern application threats are evolving, especially with AI-heavy and highly distributed architectures. The list continues to emphasize long-standing problems like Broken Access Control and Cryptographic Failures, but the new edition elevates security misconfigurations and software supply chain issues as first-class risks. This shift acknowledges that complex CI/CD pipelines, third‑party services, and AI-powered components have dramatically expanded the attack surface beyond just your own code.

A key change in 2026 is the explicit spotlight on software supply chain failures and the mishandling of exceptional conditions. These categories capture real‑world issues such as compromised libraries, poisoned models, insecure infrastructure-as-code templates, and fragile error handling that leads to data leakage or privilege escalation. Rather than treating these as edge cases, OWASP now frames them as systemic risks that can undermine even well‑written business logic. For teams shipping fast, this is a wake‑up call that “secure by default” must include dependencies, pipelines, and runtime behavior—not just input validation and authentication.

The importance of the 2026 Top 10 lies in how it guides priorities for engineering, security architecture, and governance. It gives product and security leaders a shared vocabulary to justify investments in SBOMs, dependency scanning, secure AI integration patterns, and runtime protection. For practitioners, it acts as a practical roadmap: threat modeling features around these categories, aligning test cases and code reviews with them, and measuring progress over time. In a world where AI agents, APIs, and microservices are deeply interwoven, using the updated OWASP Top 10 as a baseline can be the difference between a resilient platform and one supply‑chain incident away from a major breach.

Unauthorized MCP Server Exposure in Enterprise Deployments

Overview
Model Context Protocol (MCP) servers are increasingly being adopted in enterprise AI applications to expose controlled tools and internal business functions to LLM-powered clients. These MCP tools often provide direct access to workflows, client context, operational data, and application capabilities.

In secure deployments, MCP servers are expected to be consumed only through authorized MCP clients embedded within approved enterprise AI interfaces, with access governed by user roles and feature entitlements. 

During recent security assessments, some of the most impactful vulnerabilities have been observed not at the prompt layer, but at the protocol and connectivity layer — specifically around unauthorized and unauthenticated MCP server connections.

Intended Architecture
In the expected design:

  • The enterprise application exposes internal capabilities through the MCP server
  • Connections for MCP server that expose tools with sensitive data require authenticated connectivity
  • Connections are allowed solely from approved MCP clients within the enterprise AI interface
  • MCP access is enabled only for specific user roles and subscription tiers

The intended architecture involves two major implementation layers – authentication as the first layer of MCP connectivity + authorization of an MCP host/client as a bridge between the user and MCP server.

 

Commonly Identified Vulnerabilities

Unauthorized and unauthenticated MCP connectivity is emerging as one of the most impactful vulnerability classes in MCP-based enterprise AI deployments, as it bypasses both application-layer controls and traditional authorization boundaries.

An unauthenticated MCP server effectively allows external entities to invoke available tools directly, resulting in immediate leakage of sensitive business or client information. Moreover, based on the designed MCP tools, it might even allow to invoke tools that trigger unintended actions impacting the confidentiality, integrity as well as the availability of applications.

Moreover, it has also been observed that the MCP Server accepts connections from any MCP host, including third-party LLM clients such as Claude Desktop or locally hosted LLM application. 

A client user could obtain a valid application access token and establish an MCP connection outside the intended enterprise AI interface, thereby accessing MCP tools through an unauthorized MCP host.

This results in an Unauthorized LLM MCP Bridge, bypassing the platform’s intended feature restrictions. 

Security Impact
This vulnerability introduces multiple risks:

  • Unauthenticated access to the MCP server leads to extraction of sensitive data or even performance of unintended actions based on the designed MCP tools 
  • Client users can access MCP tool capabilities through a side door even when LLM access is explicitly denied for them
  • Third-party MCP clients can invoke MCP tools and receive sensitive business or client data that can be used for fine-tuning/training LLM without enterprise consent
  • Other insecure MCP servers connected to the same MCP client can lead to rug-pulling, inter-tool poisoning attacks
  • Feature and subscription controls can be bypassed, leading to unauthorized usage and potential financial loss

Recommended Mitigation
The MCP server must enforce proper authentication as well as client-level, user-level authorization, not just token validation. Key remediation steps include:

  • Allow only authenticated MCP connections to Enterprise MCP servers (unless it is a MCP server for public use)
  • Allow MCP connections only from authorized MCP hosts/clients
  • Apply IP whitelisting / network restrictions so only approved enterprise hosts can connect
  • Bind MCP tool access to user-type entitlements and role policies
  • Monitor for unexpected MCP client connection attempts

Conclusion
MCP servers should be treated as privileged enterprise APIs. Without strict client validation, they can become unintended external access paths into internal application tools and data. 

Securing MCP deployments requires enforcing authentication, authorization, trusted MCP clients, network segmentation, and entitlement-aware tool authorization.

Article by Hemil Shah and Rishita Sarabhai