Executive Summary
Blueinfy performed a focused, time-bound security review of Microsoft Copilot Studio and its implementation at ACME to assess the potential risks introduced by AI agents.
The objective of the engagement was to evaluate how AI agents both legitimate and malicious could be misused, intentionally or unintentionally, to
- Access sensitive enterprise data
- Expose user specific information
- Perform unauthorized actions
- Enable data exfiltration
The assessment combined configuration review with hands on threat simulation, where custom agents were built to replicate realistic attack scenarios as well as instructions were passed to exploit legitimate agents. The results demonstrated that even with platform level controls enabled, significant risks can persist due to configuration gaps, excessive permissions, and agent behavior manipulation.
The environments given for testing had pre-configured policies and controls applied prior to the assessment. The scope included -
- AI agent configuration within Microsoft Copilot Studio
- Data access patterns through agents
- Connector usage and restrictions
- Guardrails and safety configurations
- Threat simulation using custom-built agents
Assessment Methodology
Blueinfy adopted a structured methodology combining configuration validation and adversarial testing.
1. AI Configuration Review
A focused configuration review was conducted to evaluate AI-specific settings.
Areas Reviewed:
• AI agent configuration settings
• Connector policies and restrictions
• Data access configurations
• Prompt safety and guardrails
• Logging and monitoring capabilities
Objective:
• Identify risky configurations
• Recommend controls to reduce exposure
• Highlight configurations requiring governance before enablement
2. Agent Threat Simulation
Instead of testing existing agents, Blueinfy created custom agents within the allowed policy boundaries to simulate real world attack scenarios. This approach ensured:
- No disruption to production agents
- Realistic exploitation within permitted configurations
- Validation of platform controls under adversarial conditions
Threat Simulation Approach
Agents were built using only approved connectors and policies within the environment. Two categories of agents were designed:
1. Misuse of Legitimate Agents
- Agents behaving as intended but manipulated via inputs
- Exploiting trust in user prompts
2. Malicious Agent Design
- Agents intentionally designed to bypass safeguards
- Leveraging allowed configurations to simulate abuse
Key Attack Scenarios Tested
Blueinfy executed multiple scenarios to evaluate risk exposure:
- Prompt Injection and Instruction Override - Manipulating agent behavior using crafted inputs to override system instructions and cause unintended data access
- Data Exfiltration via Allowed Channels - Extracting sensitive data through email connectors, API responses and structured outputs
- Cross-Agent Interaction Risks - Simulating agent-to-agent communication and demonstrating potential lateral movement
- Rouge Agents – Malicious agents built with system instructions to exfiltrate data, phish users for credentials and send application/user data to unintended servers
- MCP Exposure – If MCP server and tools are accessible without correct authentication and authorization mechanisms
Key Observations
The assessment revealed several important findings:
- Misconfigurations Create Hidden Risks – Users of the agents are completely unaware of the data risks since once published/shared, end users have limited visibility in the agent configuration. User and application could be exfiltrated to third-party servers
- Agents Can Be Manipulated Through Inputs – Based on the guardrails, prompt injection enabled behavior override and agents could be influenced to exfiltrate data without changing configuration. This created a scenario where legitimate functionality could be leveraged for unintended outcomes
- Unauthenticated MCP Exposure – MCP tools were exposed without authentication and sensitive client data was leaked
- Platform Controls Are Not Sufficient Alone - While Microsoft Copilot Studio provides robust built-in controls, their effectiveness depends heavily on configuration and usage. Limited logging of agent behavior and insufficient detection of abnormal activity could not restrict malicious activity even during runtime
Conclusion
Blueinfy’s assessment demonstrated that while platforms like Microsoft Copilot Studio do provide strong foundational controls, they must be complemented with:
- Proper configuration
- Risk-aware governance
- Adversarial testing
- Monitoring and logging
By moving from assumption based security to evidence driven validation, ACME established a stronger foundation for secure AI adoption. Blueinfy team worked with ACME to create a robust agent threat simulation and security review process to protect against such risks with scaling agents in parallel. Please read this blog for the three-tier risk methodology for an agent review process.
Article by Hemil Shah and Rishita Sarabhai