Two separate failures in Microsoft Copilot over eight months reveal the same uncomfortable reality for enterprises: your AI assistant can cross its own trust boundaries without triggering a single alert in your existing security stack. Sensitivity labels, DLP policies, EDR, WAF, and SIEM all reported “all clear” while Copilot accessed data it was explicitly told to avoid.
For CISOs, security architects, and Microsoft 365 administrators, these incidents are less about one bug or one exploit and more about a structural blind spot. The enforcement layer that governs what AI systems can see and do largely sits inside the vendor’s infrastructure, beyond the line of sight of traditional controls.
What Happened: Two Copilot Trust Violations in Eight Months
Between January 21 and mid-February 2026, Microsoft Copilot read and summarized confidential emails that should have been off-limits. According to Microsoft’s advisory, the issue (tracked internally as CW1226324) allowed messages in Sent Items and Drafts to enter Copilot’s retrieval set despite sensitivity labels and DLP rules that should have blocked them.
This was not an edge case in a lab tenant. Among the affected organizations was the U.K.’s National Health Service (NHS), which logged the issue as incident INC46740412, underscoring that the failure reached into regulated healthcare environments. For roughly four weeks, Copilot could surface labeled content that policy said it must ignore—and no tool in the standard enterprise security stack raised a flag.
This was the second trust boundary violation in less than a year. In June 2025, Microsoft patched CVE-2025-32711, a critical zero-click vulnerability that Aim Security researchers dubbed “EchoLeak.” In that case, a single malicious email was enough to silently exfiltrate enterprise data via Copilot, without any user action.
Microsoft assigned EchoLeak a CVSS score of 9.3. The exploit chain bypassed Copilot’s prompt injection classifier, link redaction, Content-Security-Policy controls, and reference mention safeguards. The attacker’s instructions, embedded in what looked like ordinary business correspondence, caused Copilot’s retrieval-augmented generation (RAG) pipeline to access internal data and transmit it to an attacker-controlled server.
The two incidents had completely different causes—a code-path error in CW1226324 and a sophisticated exploit chain in EchoLeak—but identical outcomes: Copilot processed data that should have been off-limits, and nothing in the enterprise detection stack saw it happen.
Why Sensitivity Labels and DLP Didn’t Save You
On paper, Copilot was surrounded by the usual defenses: Microsoft Information Protection sensitivity labels, DLP policies, and classification applied to mailboxes and content. In practice, CW1226324 showed that configuration alone is not enforcement.
For four weeks, Copilot ignored sensitivity labels on messages in Sent Items and Drafts for some tenants, even though those labels and associated DLP rules were configured correctly. The failure happened inside Copilot’s retrieval pipeline—specifically, in the logic that decides which messages can be pulled into the context window for summarization or answering user questions.
No documents were written to disk in a way that endpoints could see. No suspicious outbound HTTP payloads were visible to a WAF from the customer side. From an on-premises or tenant-level perspective, everything looked normal. The policies you rely on to govern what AI assistants can see were effectively bypassed by a bug in the vendor’s own enforcement layer.
EchoLeak underlined a different weakness. Even with multiple AI-specific protections in place—prompt injection classifiers, link redaction, CSP rules, and safeguards around reference mentions—one carefully crafted email still coerced Copilot into retrieving and exfiltrating internal data. Aim Security’s researchers described this as a fundamental design flaw: agents process trusted and untrusted data in the same “thought process,” making them structurally vulnerable to manipulation.
Microsoft’s patch fixed that specific exploit chain, but CW1226324 demonstrated that you can still lose control of what the AI sees even when no attacker is involved. A single code error in the vendor’s pipeline was enough to break enforcement around sensitivity labels and DLP.
Why Your Existing Security Stack Never Saw It Coming
To understand why no DLP, EDR, or WAF surfaced these failures, you have to look at where the violation actually occurred.
Endpoint detection and response tools monitor processes, file system changes, and sometimes memory behavior. Web application firewalls inspect HTTP payloads crossing network boundaries. SIEM platforms correlate events they ingest from those and other sources. None of these tools are instrumented to detect an event category like: “your AI assistant just violated its own trust boundary inside the vendor’s cloud.”
In both Copilot incidents, the critical steps took place entirely within Microsoft’s infrastructure, between the retrieval index and the generation model. No anomalous process spawned on an endpoint. No unusual payload left the customer’s perimeter that a WAF or outbound proxy could reasonably classify as malicious. From the outside, Copilot behaved like Copilot.
The architecture is the core issue. RAG-based assistants sit behind an enforcement layer that traditional tools were never designed to observe. When that layer fails—because of a bug like CW1226324 or a prompt-injection exploit like EchoLeak—the model can ingest restricted data and generate responses without any observable signal at the endpoint or network layers you control.
The result: both failures were discovered and communicated only via vendor advisories, not by customer security tooling. CW1226324 went public on February 18, 2026, almost a month after exposure began. Microsoft has not disclosed how many organizations were affected or precisely what data was accessed. For any CISO, that lack of telemetry and independent visibility is the real risk story.
Inside the Copilot Pipeline: Where the Trust Boundary Broke
Viewed as a pipeline, Copilot and similar assistants follow a familiar pattern:
- A retrieval layer selects potentially relevant content from indexes and data sources.
- An enforcement layer applies policies such as sensitivity labels, DLP, and access controls to decide what is allowed into the context window.
- A generation layer (the LLM) uses that context to produce a response.
In a healthy state, sensitivity labels and DLP rules operate at or before the enforcement layer to prevent restricted content from being retrieved for the model. The two Copilot incidents show that this is where the trust boundary actually lives—and where it failed twice.
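To make the layering concrete, here is a minimal sketch of the pattern in Python. Everything in it is illustrative: the class, the policy check, and the folder names stand in for Copilot internals that Microsoft does not expose.

```python
# Illustrative model of a RAG assistant pipeline. All names and checks are
# hypothetical; this is not Copilot's actual implementation.
from dataclasses import dataclass

@dataclass
class Document:
    source: str   # e.g. "Inbox", "Sent Items", "Drafts"
    label: str    # e.g. "Public", "Confidential"
    body: str

def retrieve(query: str, index: list[Document]) -> list[Document]:
    """Retrieval layer: pick candidate content (naive keyword match here)."""
    return [d for d in index if query.lower() in d.body.lower()]

def enforce(candidates: list[Document]) -> list[Document]:
    """Enforcement layer: drop what labels/DLP say is off-limits.
    CW1226324 was effectively a bug at this step: messages in certain
    folders slipped through even though their labels were correct."""
    return [d for d in candidates if d.label != "Confidential"]

def generate(context: list[Document], query: str) -> str:
    """Generation layer: the model only sees what enforcement lets through."""
    return f"Answer to {query!r} drawn from {len(context)} document(s)."

index = [
    Document("Drafts", "Confidential", "Q3 acquisition terms ..."),
    Document("Inbox", "Public", "Q3 all-hands agenda ..."),
]

# Healthy path: the confidential draft never reaches the model.
print(generate(enforce(retrieve("q3", index)), "Q3 plans"))

# Failure path: enforcement skipped (or buggy), restricted data is ingested,
# and nothing observable happens at the endpoint or network layers you own.
print(generate(retrieve("q3", index), "Q3 plans"))
```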
In CW1226324, a code-path error meant that messages in Sent Items and Drafts were not correctly excluded from the retrieval set, even when labels and DLP policies said they should be. The failure was not that labels were missing or misconfigured; it was that the enforcement logic in the pipeline did not honor them consistently in certain folders.
In EchoLeak, the pipeline was technically working as designed but was manipulated. A malicious email, written to look like ordinary business correspondence, entered the retrieval set. When Copilot processed it alongside internal content, the injected instructions in that email shaped how the agent accessed and transmitted data to an attacker’s server.
These are different failure modes—one accidental, one adversarial—but both reflect the same structural risk: when enforcement is embedded in a vendor-hosted inference pipeline, you have limited direct visibility and limited ability to validate its behavior independently.
Operational Blind Spot: What CISOs Can and Can’t See Today
From an operations standpoint, the Copilot incidents map to a clear blind spot: AI inference activity inside vendor clouds is effectively a black box unless the vendor chooses to expose telemetry.
In both EchoLeak and CW1226324, no alerts were generated through common detection channels—EDR, WAF, or SIEM. Neither was uncovered because a security team observed anomalous AI behavior in their logs. Both surfaced because Microsoft and, in EchoLeak’s case, external researchers decided to publish an advisory.
This creates a dependency chain that security leaders need to recognize explicitly. Your ability to detect AI trust violations today is largely contingent on three things you do not own:
- The correctness of the vendor’s enforcement code paths.
- The completeness of the vendor’s AI-specific defenses (e.g., prompt injection classifiers).
- The timeliness and transparency of the vendor’s disclosure process when something goes wrong.
Where you do have some leverage is in how you configure AI access (for example, which data sources are even eligible for retrieval), how you audit usage via tools like Microsoft Purview, and how you structure governance and incident response around vendor-hosted inference services.
Five Concrete Actions: A Copilot-Focused Audit Plan
While you cannot instrument Microsoft's internal pipelines directly, you can still pressure-test and constrain how Copilot interacts with your data. The CW1226324 advisory and the EchoLeak research map to a five-point audit that security teams can run today.
1. Test DLP enforcement against Copilot directly. CW1226324 persisted for four weeks because no one was systematically testing whether Copilot honored sensitivity labels in specific folders like Sent Items and Drafts. Build a controlled test set: create emails with clearly labeled sensitive content in those folders, then query Copilot and confirm it cannot surface or summarize them. Repeat this test on a regular cadence—monthly is a reasonable baseline.
The key principle is simple: configuration is not enforcement. The only meaningful proof is a failed retrieval attempt when the AI is asked to access labeled content.
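Part of this test can be automated. The sketch below seeds canary drafts through Microsoft Graph's create-message endpoint; it assumes a token with Mail.ReadWrite permission, and the canary scheme is our own convention, not a Microsoft feature. Labeling the canaries and querying Copilot remain manual steps in your normal workflow.

```python
# Sketch: seed canary draft emails for a recurring Copilot DLP test.
# Assumes a Microsoft Graph access token with Mail.ReadWrite; the canary
# naming scheme is our own convention, not anything Microsoft defines.
import uuid
import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def seed_canary_draft(token: str) -> str:
    """Create a draft containing a unique canary string. Apply your
    'Confidential' sensitivity label to it through your normal labeling
    workflow, then ask Copilot for the canary. Correct enforcement must
    fail to surface or summarize it."""
    canary = f"CANARY-{uuid.uuid4()}"
    resp = requests.post(
        f"{GRAPH}/me/messages",  # "create message" lands in Drafts
        headers={"Authorization": f"Bearer {token}"},
        json={
            "subject": f"DLP canary {canary}",
            "body": {
                "contentType": "text",
                "content": f"Restricted test content. Token: {canary}",
            },
        },
        timeout=30,
    )
    resp.raise_for_status()
    return canary  # record in your test manifest with folder and date

# Each cycle: seed, label, query Copilot for every live canary, and log
# PASS (not surfaced) or FAIL (surfaced) per folder. A FAIL is an incident.
```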
2. Block external content from reaching Copilot’s context window. EchoLeak succeeded because a malicious external email was able to enter Copilot’s retrieval set and execute injected instructions as if they were part of the user’s intent. According to Aim Security’s disclosure, the attack bypassed multiple Microsoft defenses.
To reduce this entire class of risk, restrict or disable external email content as a source for Copilot where possible. In Microsoft 365, that means tightening Copilot settings so that external messages are excluded from AI context, and limiting rich rendering (such as Markdown) that can carry hidden instructions. Removing external content from the context window narrows the attack surface prompt-injection campaigns can target.
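You can also add a coarse, customer-side triage layer of your own. The script below scans exported .eml files for patterns that EchoLeak-style payloads tend to carry, such as instruction-like phrasing and markdown links in plain-text mail. The patterns are illustrative heuristics we chose for this sketch; a determined attacker will evade them, so treat hits as triage input rather than a control.

```python
# Sketch: naive triage scan of exported .eml files for prompt-injection
# markers. The regex list is an illustrative starting point, not a vetted
# ruleset; it will produce false positives and miss crafted payloads.
import re
import sys
from email import policy
from email.parser import BytesParser
from pathlib import Path

SUSPICIOUS = [
    re.compile(r"ignore (all|any|previous) instructions", re.I),
    re.compile(r"summariz\w+\b.*\b(send|include|append)", re.I | re.S),
    re.compile(r"\[[^\]]*\]\(https?://[^)]+\)"),  # markdown link in mail body
]

def scan_eml(path: Path) -> list[str]:
    msg = BytesParser(policy=policy.default).parsebytes(path.read_bytes())
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body else ""
    return [p.pattern for p in SUSPICIOUS if p.search(text)]

if __name__ == "__main__":
    # Usage: python scan.py /path/to/exported/eml/dir
    for eml in sorted(Path(sys.argv[1]).glob("*.eml")):
        if hits := scan_eml(eml):
            print(f"{eml.name}: {len(hits)} suspicious pattern(s)")
```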
3. Audit Purview logs for anomalous Copilot interactions during the exposure window. Since traditional tools did not fire alerts, retrospective detection depends on what your Purview or equivalent M365 telemetry can show. For the CW1226324 window (January 21 to mid-February 2026), review Copilot Chat logs for queries that returned content from labeled messages, particularly from Sent Items and Drafts.
If you find evidence that Copilot accessed sensitivity-labeled data contrary to policy, treat that as a data access incident and document your findings. If you discover that your tenant cannot reconstruct what Copilot accessed during that period, document that gap as well. For regulated organizations, an unquantified AI data access gap during a known vulnerability window is an audit issue in itself.
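If your team exports the relevant records from Purview's unified audit log, a short script can scope the review to the exposure window. The column names below (CreationDate, UserIds, Operations, AuditData) match common Purview CSV exports, and the AccessedResources field is an assumption to adapt; verify both against your own tenant's export before relying on the output.

```python
# Sketch: scope an exported Purview unified audit log CSV to Copilot
# interactions inside the CW1226324 exposure window. Column names match
# common Purview exports; verify them (and the AuditData fields) against
# your own tenant's export before trusting the results.
import csv
import json
from datetime import datetime

WINDOW_START = datetime(2026, 1, 21)
WINDOW_END = datetime(2026, 2, 18)

def copilot_events(csv_path: str):
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            # Timestamps look like 2026-02-01T13:45:00.0000000Z; parse the
            # first 19 characters to stay version-portable.
            when = datetime.strptime(row["CreationDate"][:19], "%Y-%m-%dT%H:%M:%S")
            if not WINDOW_START <= when <= WINDOW_END:
                continue
            if "copilot" not in row["Operations"].lower():
                continue
            yield when, row["UserIds"], json.loads(row["AuditData"])

for when, user, detail in copilot_events("audit_export.csv"):
    # "AccessedResources" is a placeholder field name; map it to whatever
    # your AuditData payloads actually use for retrieved items.
    print(when.isoformat(), user, detail.get("AccessedResources", "n/a"))
```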
4. Turn on Restricted Content Discovery (RCD) for high-risk SharePoint sites. Restricted Content Discovery removes entire SharePoint sites from Copilot’s retrieval pipeline. Because the data never enters the context window, this control is resilient to both failure modes described above—bugs in enforcement and successful prompt injections.
For workloads involving sensitive or regulated data, treat RCD as mandatory rather than optional. By narrowing Copilot’s visibility to only those sites where you accept the residual risk, you effectively build a containment ring around your highest-value data independent of Copilot’s enforcement layer.
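At the time of writing, RCD is toggled per site through SharePoint Online admin tooling (the Set-SPOSite cmdlet's RestrictContentOrgWideSearch parameter; confirm the current mechanism against Microsoft's documentation). A small helper that generates those commands from a reviewed inventory also gives you an auditable record of what you restricted and why. The inventory columns here are our own assumed layout.

```python
# Sketch: emit per-site RCD commands from a reviewed inventory CSV.
# The Set-SPOSite parameter reflects Microsoft's documented mechanism at
# the time of writing; verify before running, and run via SPO PowerShell.
import csv

def rcd_commands(inventory_csv: str) -> list[str]:
    """Expects columns SiteUrl, Owner, Reason (an assumed layout)."""
    cmds = []
    with open(inventory_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            cmds.append(
                f"# {row['Reason']} (owner: {row['Owner']})\n"
                f"Set-SPOSite -Identity {row['SiteUrl']} "
                "-RestrictContentOrgWideSearch $true"
            )
    return cmds

if __name__ == "__main__":
    print("\n".join(rcd_commands("high_risk_sites.csv")))
```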
5. Build an incident response playbook for vendor-hosted inference failures. Most IR playbooks today are built around assets you directly control: endpoints, identities, networks, and SaaS applications with clear audit trails. The Copilot incidents show the need for a new IR category: trust boundary violations inside vendor inference pipelines.
Define who owns triage and decision-making when a vendor advisory like CW1226324 lands. Establish how you will validate whether your tenant was affected, what logs you will consult, when you will notify regulators or customers, and what criteria trigger disabling or constraining AI features. Importantly, set up a monitoring cadence for vendor service health posts and security advisories related to AI processing. Your SIEM will not generate the first signal; the vendor will.
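That monitoring cadence is easy to automate. The sketch below polls the Microsoft Graph service announcement endpoint for message center posts and filters for AI-related keywords; it assumes an app registration with the ServiceMessage.Read.All permission, and the keyword list is a starting point to tune, not an exhaustive filter.

```python
# Sketch: poll the Microsoft 365 message center via Microsoft Graph for
# Copilot-related advisories. Assumes an app token with the
# ServiceMessage.Read.All permission; keywords are ours to tune.
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
KEYWORDS = ("copilot", "sensitivity label", "data loss prevention")

def copilot_advisories(token: str) -> list[dict]:
    resp = requests.get(
        f"{GRAPH}/admin/serviceAnnouncement/messages",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    hits = []
    for msg in resp.json().get("value", []):
        haystack = f"{msg.get('title', '')} {msg.get('services', '')}".lower()
        if any(k in haystack for k in KEYWORDS):
            hits.append({"id": msg["id"], "title": msg["title"]})
    return hits

# Route hits into ticketing so every advisory gets an owner, a documented
# tenant-impact assessment, and explicit close-out criteria.
```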
Beyond Copilot: A Pattern for All Enterprise AI Assistants
These issues are not unique to Microsoft. A 2026 survey by Cybersecurity Insiders found that 47% of CISOs and senior security leaders have already observed AI agents exhibit unintended or unauthorized behavior. Organizations are deploying AI assistants faster than they can mature governance around them.
The architectural pattern is consistent across vendors: whether you are using Copilot, Gemini for Workspace, or any RAG-based assistant wired into internal documents, the same layered structure applies. There is a retrieval layer, an enforcement layer, and a generation layer. When the enforcement layer misbehaves or is manipulated, restricted data can reach the model, and the traditional security stack may never see it.
For security leaders, this means two things. First, any assistant that can see internal content carries structural risk comparable to what Copilot just demonstrated. Second, your control strategy must assume that enforcement inside vendor pipelines can and will fail—through bugs, through sophisticated attacks, or both.
Running a Copilot-focused audit is a pragmatic first step, but the same questions should be applied to every AI assistant wired into your core collaboration and document systems: How do we test that policy is actually enforced? What telemetry do we have if enforcement fails? Can we remove high-value sources from retrieval entirely? And what does our IR plan look like when the failure is inside a vendor’s black box?
What to Tell the Board: Policy vs. Enforcement in the Age of AI
Boards are starting to ask whether AI assistants are “safe” to use with sensitive workloads. The Copilot incidents offer a useful framing for that conversation: your policies may be correct while enforcement fails somewhere you do not control.
A credible board-level answer might sound like this: our DLP and labeling policies were configured correctly, but a vendor-side enforcement failure in the AI inference pipeline allowed restricted content to be accessed. In response, we are running targeted tests against the assistant, constraining which data sources it can see, auditing historical access where possible, and building explicit procedures for responding to future vendor AI advisories.
The key message is that you are treating AI trust boundary violations as a new, first-class incident category—not as an afterthought or a subset of generic SaaS risk. That framing will matter as more organizations report AI agents doing things no one explicitly authorized.
The Copilot story is not just about one bug and one CVE. It is an early signal that the enforcement layer for AI systems is now as critical—and as failure-prone—as any traditional security control in your environment. The next failure is unlikely to trip your existing alerts either. Planning for that now is the difference between being surprised by a vendor advisory and having an actionable playbook ready when it arrives.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.