[Case Study] Threat Simulation of AI Agents in Microsoft Copilot Studio

Executive Summary

Blueinfy performed a focused, time-bound security review of Microsoft Copilot Studio and its implementation at ACME to assess the potential risks introduced by AI agents.

The objective of the engagement was to evaluate how AI agents both legitimate and malicious could be misused, intentionally or unintentionally, to 

  • Access sensitive enterprise data
  • Expose user specific information
  • Perform unauthorized actions
  • Enable data exfiltration

The assessment combined configuration review with hands on threat simulation, where custom agents were built to replicate realistic attack scenarios as well as instructions were passed to exploit legitimate agents. The results demonstrated that even with platform level controls enabled, significant risks can persist due to configuration gaps, excessive permissions, and agent behavior manipulation.

The environments given for testing had pre-configured policies and controls applied prior to the assessment. The scope included - 

  • AI agent configuration within Microsoft Copilot Studio
  • Data access patterns through agents
  • Connector usage and restrictions
  • Guardrails and safety configurations
  • Threat simulation using custom-built agents

Assessment Methodology

Blueinfy adopted a structured methodology combining configuration validation and adversarial testing.

1. AI Configuration Review

A focused configuration review was conducted to evaluate AI-specific settings. 

Areas Reviewed:
•    AI agent configuration settings
•    Connector policies and restrictions
•    Data access configurations
•    Prompt safety and guardrails
•    Logging and monitoring capabilities

Objective:
•    Identify risky configurations
•    Recommend controls to reduce exposure
•    Highlight configurations requiring governance before enablement

2. Agent Threat Simulation

Instead of testing existing agents, Blueinfy created custom agents within the allowed policy boundaries to simulate real world attack scenarios. This approach ensured:

  • No disruption to production agents
  • Realistic exploitation within permitted configurations
  • Validation of platform controls under adversarial conditions

Threat Simulation Approach

Agents were built using only approved connectors and policies within the environment. Two categories of agents were designed:

1. Misuse of Legitimate Agents

  • Agents behaving as intended but manipulated via inputs
  • Exploiting trust in user prompts

2. Malicious Agent Design

  • Agents intentionally designed to bypass safeguards
  • Leveraging allowed configurations to simulate abuse

Key Attack Scenarios Tested

Blueinfy executed multiple scenarios to evaluate risk exposure:

  • Prompt Injection and Instruction Override - Manipulating agent behavior using crafted inputs to override system instructions and cause unintended data access
  • Data Exfiltration via Allowed Channels - Extracting sensitive data through email connectors, API responses and structured outputs
  • Cross-Agent Interaction Risks - Simulating agent-to-agent communication and demonstrating potential lateral movement
  • Rouge Agents – Malicious agents built with system instructions to exfiltrate data, phish users for credentials and send application/user data to unintended servers
  • MCP Exposure – If MCP server and tools are accessible without correct authentication and authorization mechanisms

Key Observations

The assessment revealed several important findings:

  • Misconfigurations Create Hidden Risks – Users of the agents are completely unaware of the data risks since once published/shared, end users have limited visibility in the agent configuration. User and application could be exfiltrated to third-party servers
  • Agents Can Be Manipulated Through Inputs – Based on the guardrails, prompt injection enabled behavior override and agents could be influenced to exfiltrate data without changing configuration. This created a scenario where legitimate functionality could be leveraged for unintended outcomes
  • Unauthenticated MCP Exposure – MCP tools were exposed without authentication and sensitive client data was leaked
  • Platform Controls Are Not Sufficient Alone - While Microsoft Copilot Studio provides robust built-in controls, their effectiveness depends heavily on configuration and usage. Limited logging of agent behavior and insufficient detection of abnormal activity could not restrict malicious activity even during runtime

Conclusion

Blueinfy’s assessment demonstrated that while platforms like Microsoft Copilot Studio do provide strong foundational controls, they must be complemented with:

  • Proper configuration
  • Risk-aware governance
  • Adversarial testing
  • Monitoring and logging

By moving from assumption based security to evidence driven validation, ACME established a stronger foundation for secure AI adoption. Blueinfy team worked with ACME to create a robust agent threat simulation and security review process to protect against such risks with scaling agents in parallel. Please read this blog for the three-tier risk methodology for an agent review process.

Article by Hemil Shah and Rishita Sarabhai 

The Rise of AI Agents and the urgent need for an Agent Security Review Process

Organizations today are rapidly embracing AI-powered agents. Platforms like Microsoft Copilot Studio and Google Gemini are enabling business users, not just developers, to create powerful agents that automate workflows, access enterprise data, and make decisions. This democratization is transformative. But it also introduces a new, largely ungoverned attack surface.

The Explosion of Agents

In many enterprises, the number of agents being deployed is growing exponentially from hundreds, sometimes thousands, within a short span of time. These agents:

  • Integrate with internal systems
  • Access sensitive enterprise data
  • Perform automated actions on behalf of users

Unlike traditional applications, these agents are often created outside formal development pipelines by business users, analysts, or developers. And that’s where the problem begins.

The Security Gap: No "AgentSec"

Organizations have matured practices for AppSec or InfraSec or Cloud Security but Agent Security (AgentSec) is still in its infancy.
There is typically:

  • No formal review process before agent deployment
  • Limited visibility into what agents are doing
  • No standardized threat modeling for agent behavior
  • Weak validation of platform-level security controls

This creates a dangerous blind spot.

Built-in Controls Are Not Enough

Platforms do provide security mechanisms at:

  • Data access controls
  • Authentication and authorization layers
  • Prompt filtering and safety guardrails
  • Activity monitoring

However, these controls are:

  • Complex to configure correctly
  • Highly dependent on implementation choices
  • Difficult to validate in real-world scenarios

Misconfigurations or misunderstandings can easily render these protections ineffective.

Visualizing the Risk: Agent Attack Flow


The Missing Piece: A Scalable Agent Review Process

At first glance, the solution seems straightforward: introduce agent design reviews, configuration assessments, and threat modeling for every agent. But in reality, this approach does not scale.

In large enterprises with hundreds or thousands of agents built on platforms, performing deep security reviews on every agent would:

  • Overwhelm security teams
  • Slow down innovation
  • Create operational bottlenecks

Instead, organizations must adopt a risk-based Agent Security (AgentSec) model. The three-tier risk model classifies agents based on their potential impact and exposure. 
 

  • High-risk agents are typically misconfigured or intentionally malicious, capable of unsafe actions such as exfiltrating data to external emails, interacting with unauthorized external URLs, or executing harmful embedded instructions. 
  • Medium-risk agents involve broader data interaction—often consuming sensitive or user-provided inputs through connectors, APIs, MCP integrations, or multi-agent communication—making them more prone to misuse or unintended data exposure. 
  • Low-risk agents operate within a constrained scope, relying on public or read-only data sources such as web search, uploaded files, SharePoint, or Dataverse, with minimal ability to cause harm.

Automation enables scale by classifying the agents into risk buckets and a focused review can then be performed only for high-risk and medium-risk agents to assess the business impact by building abuse/exploit scenarios. This approach ensures that security teams invest effort where it truly matters - prioritizing depth and accuracy over volume.

Why This Model Works

This approach delivers both speed and security: fast approvals for low-risk agents, strong scrutiny for higher-risk ones, reduced burden on security teams, and scalable governance across thousands of agents. Most importantly, it aligns security effort with actual risk - not perceived risk.

The organizations that succeed will not be those attempting to review every agent, but those that automate the baseline, enforce non-negotiable security gates, and escalate only what truly matters. Because in a world of thousands of agents, scalability itself becomes security.

Article by Hemil Shah and Rishita Sarabhai