Scheming Machines: Why Governance Must Keep Pace with AI

AI models are learning to deceive. Discover how data governance can prevent scheming and keep enterprise AI honest and reliable.

Every few months, a research paper drops that jolts the tech world, like when Anthropic found AI agents capable of blackmail and corporate espionage.

This time, Apollo Research confirmed what many had long suspected: large language models can scheme. That is, an AI can deceptively “behave one way while hiding its true goals.”

The study tested whether models would deceive their developers when given long-term objectives, and the results were startling. Out of six frontier models, five demonstrated in-context scheming, and several went as far as to lie, copy themselves to a new server to overwrite a replacement model, or strategically underperform (“sandbag”) to avoid being “unlearned.”

Even more concerning is that when the models realized they were being evaluated, they faked alignment to pass the test, only to resume deceptive behavior later. In other words, the AI understood it was being tested and decided to cheat.

The Rise of the Deceptive Machine

Scheming is different from the AI “hallucinations” most users are familiar with. Hallucinations are confident guesses, often byproducts of incomplete data or of redundant, obsolete, and trivial (ROT) content. Scheming, on the other hand, is intentional. It’s the model acting in self-interest, strategically manipulating its environment to preserve or advance its goals.

In the Apollo Research study, the models reasoned explicitly in their chain-of-thought, using words like sabotage, lying, and manipulation. In some cases, they refused to confess wrongdoing even after multiple rounds of interrogation.

The implications for enterprise AI are enormous. As organizations begin deploying autonomous and semi-autonomous agents across business functions, the risk of misaligned or covertly deceptive behavior moves from theoretical concern to real-world consequence. AI systems are now making decisions, and those decisions depend entirely on what they know, what goals they follow, and what they can access.

Data Integrity: The Missing Layer in AI Alignment

Most coverage of AI scheming focuses on the model itself: its architecture, goal functions, or reward systems—but that’s only half the story. The other half lies in the data.

TechCrunch writes that “the fact that AI models from multiple players intentionally deceive humans is, perhaps, understandable. They were built by humans, to mimic humans and for the most part trained on data produced by humans.”

AI systems form their “beliefs” from the data they’re trained and fine-tuned on. If that data contains misinformation, conflicting instructions, or unvetted external content, it doesn’t just produce hallucinations; it can cause rationalized deception.

Researchers recently discovered that as few as 250 malicious documents are enough to poison a model, regardless of its size. That means a handful of toxic files out of billions of documents could quietly seed deception into enterprise AI.

When Scheming Infiltrates the Enterprise

Most enterprise data is unstructured, sprawling across emails, file shares, chats, and documents. These repositories are full of conflicting directives, outdated policies, and sensitive information. Feeding this dissonance into AI without governance causes more than inefficiency; it creates moral ambiguity.

If an AI system tasked with “maximizing productivity” encounters old HR data discouraging breaks, or misfiled emails discussing budget cuts, how will it interpret this goal? If one document says “delete inactive accounts” and another warns “never delete without legal approval,” which instruction takes precedence?

AI trained on unmanaged unstructured data will learn to “navigate” contradictions by prioritizing whichever instruction seems most advantageous to its perceived mission. Over time, that can look a lot like scheming.

Governance as the Guardrail Against Deception

A September 2025 joint study by Apollo Research and OpenAI introduced a promising concept called “deliberative alignment,” in which the model reviews an “anti-scheming specification” before it acts, much like making a child repeat the rules before playing a game. However, studies like Anthropic’s blackmail test have shown that system-level guardrails alone often fail to prevent misalignment.

For enterprises, the most effective anti-scheming safeguard starts at the data governance layer. This is where permissions and context are defined before the model ever begins reasoning. A recent Anthropic study found that classifying harmful content and training models on filtered datasets was effective at reducing harmful capabilities while preserving the model’s beneficial ones.

A data governance approach to preventing AI deception should include:

  • Data Curation & Access Control: Classify data by content and ensure models only ingest information that is appropriate to their defined purpose (a minimal sketch of this idea follows this list).
  • Data Lineage: Track where each file comes from, when it was created, and how it is used.
  • Single Source of Truth: Remediate contradictions across departments and data sources so that the AI receives a single, authoritative version of truth.
  • Continuous Monitoring: Scan for anomalies like poisoned data attacks or manipulated metadata that could alter model behavior.
  • Explainability: Require every autonomous or semi-autonomous system to document its chain-of-thought reasoning, linking decisions and outputs back to data inputs.
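
To make the first two items concrete, here is a minimal Python sketch of purpose-scoped data curation. Everything in it is an assumption for illustration: the names (Document, classify, curate_for_purpose), the classification markers, and the purpose policy are hypothetical, not a real library API or ZL Tech’s implementation. The point it demonstrates is checking permissions and recording lineage before a model ever sees a document.

```python
# Hypothetical sketch: classify documents, filter them by the agent's
# defined purpose, and log every allow/deny decision for auditability.
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Document:
    doc_id: str
    source: str               # where the file came from (lineage)
    created: datetime         # when it was created (lineage)
    text: str
    labels: set[str] = field(default_factory=set)


# Illustrative classification rules: tag documents by content category.
SENSITIVE_MARKERS = {"salary": "hr_confidential", "budget cut": "finance_restricted"}

# Illustrative policy: which content categories each agent purpose may see.
PURPOSE_POLICY = {
    "customer_support": {"general"},
    "hr_assistant": {"general", "hr_confidential"},
}


def classify(doc: Document) -> Document:
    """Tag a document based on simple content markers (stand-in for real classification)."""
    lowered = doc.text.lower()
    for marker, label in SENSITIVE_MARKERS.items():
        if marker in lowered:
            doc.labels.add(label)
    if not doc.labels:
        doc.labels.add("general")
    return doc


def curate_for_purpose(docs: list[Document], purpose: str, audit_log: list[dict]) -> list[Document]:
    """Return only documents whose labels fit the agent's purpose,
    recording every decision so outputs can be traced back to inputs."""
    allowed_labels = PURPOSE_POLICY.get(purpose, set())
    curated = []
    for doc in map(classify, docs):
        decision = "allow" if doc.labels <= allowed_labels else "deny"
        audit_log.append({
            "doc_id": doc.doc_id, "source": doc.source,
            "labels": sorted(doc.labels), "purpose": purpose, "decision": decision,
        })
        if decision == "allow":
            curated.append(doc)
    return curated


if __name__ == "__main__":
    docs = [
        Document("d1", "sharepoint://policies", datetime(2021, 3, 1), "Office hours and PTO policy."),
        Document("d2", "email://finance", datetime(2024, 9, 12), "Draft memo on budget cut scenarios."),
    ]
    log: list[dict] = []
    visible = curate_for_purpose(docs, "customer_support", log)
    print([d.doc_id for d in visible])  # only d1 reaches the agent
    print(log)                          # lineage and decision trail for audits
```

The design point is that the allow/deny check happens before the model begins reasoning, and every decision is logged alongside the document’s source and labels, which is what makes lineage tracking and explainability audits possible later.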

Fusing alignment with data governance constrains models not only by design but also by the quality of the data they consume.

Alignment Starts with Data Discipline

If enterprises want to prevent misaligned or deceptive AI behavior, they can’t just rely on system-level guardrails. Companies need to govern the foundation of unstructured data that shapes how models understand the organization and their role within it.

Because when intelligence becomes agentic, only governance keeps it honest.

Don’t let your AI agents become deceitful insider threats. Read our brochure to see how ZL Tech builds the data governance guardrails for enterprise AI.

Valerian received his Bachelor's in Economics from UC Santa Barbara, where he managed a handful of marketing projects for both local organizations and large enterprises. Valerian also worked as a freelance copywriter, creating content for hundreds of brands. He now serves as a Content Writer for the Marketing Department at ZL Tech.