Six days. That's how long the US Commerce Department's AI safety testing page survived before it vanished. On May 5, 2026, the Center for AI Standards and Innovation, known as CAISI, announced new pre-deployment testing agreements with Google DeepMind, Microsoft, and xAI. The deals would give federal scientists access to frontier AI models before public release, allowing them to probe for cybersecurity vulnerabilities and potential for military misuse. By May 11, the announcement page was gone without explanation. A search returns only: "Sorry, we cannot find that page." No press release. No statement. No response from the Commerce Department or the White House to press inquiries. The most ambitious AI oversight program the US government had assembled simply disappeared from the internet.
What Actually Happened
The May 5 announcement was genuinely significant by the standards of recent US AI policy. CAISI, the successor to the US AI Safety Institute that the Trump administration renamed and restructured in early 2026, had secured agreements with Google DeepMind, Microsoft, and xAI to provide access to new frontier models before deployment. The stated purpose: identify weaknesses that could be exploited in cyberattacks or enable military misuse before the most powerful AI systems reached the public. The three companies joined Anthropic and OpenAI, which had struck similar arrangements roughly two years earlier under the prior administration's AI Safety Institute. For one brief moment, all five leading US AI labs were formally enrolled in federal pre-deployment review.
The mechanics of the testing matter as much as the policy. According to reporting from multiple outlets, the models are evaluated inside classified government environments. More critically, developers sometimes supply versions with the built-in safety guardrails deliberately disabled so testers can see what the underlying system can actually do without constraint. This is not standard product evaluation. This is access to the raw, unconstrained capability of the most advanced AI systems in the world, running inside a government facility, probed by scientists with clearances. The deletion of the announcement page on May 11 did not cancel those underlying agreements. But it removed the public accountability layer entirely, and that is a meaningful distinction.
Why This Matters More Than People Think
The deletion did not happen in isolation. It is the latest step in a systematic rollback of US AI safety infrastructure that has been under way quietly since January 2026. The original US AI Safety Institute, created under the Biden administration with bipartisan support, was renamed CAISI by executive order in February 2026. Its budget was reduced by roughly 40%. Several senior technical staff departed. The May 5 announcement appeared to signal a reversal: the new administration was willing to maintain pre-deployment testing, even under a different name. The May 11 deletion suggests that appearance was premature. Someone in the administration decided that even naming the companies publicly and describing the testing scope was too much transparency.
The timing creates a specific geopolitical problem. China's Ministry of Science and Technology has been expanding its AI oversight apparatus, not contracting it. Under regulations that took effect in August 2023 and were expanded in early 2026, Chinese AI companies are required to submit frontier models for government review before deployment. The reviews are opaque and serve different purposes, but they represent a government that is investing in understanding what its most powerful AI systems can do. The European Union's AI Act, which entered into force in 2024 with obligations phasing in through 2026, requires safety testing and technical documentation for high-risk AI systems across all 27 member states. The US is now the only major AI-producing nation actively walking back its formal oversight capacity while simultaneously presiding over the most rapid proliferation of frontier AI capabilities in history.
The Regulatory Landscape
American AI policy in 2026 is a study in contradictions. Executive orders have simultaneously directed federal agencies to accelerate AI adoption, remove "barriers" to AI deployment, and prioritize "American AI leadership." The implicit policy logic: safety oversight creates friction, friction slows deployment, slow deployment cedes ground to China. By this reasoning, deleting the NIST testing page is a feature, not a bug. Speed is safety. The market will sort out the rest.
That logic has been challenged directly by two former NSA directors, three ex-CIA chiefs, and a bipartisan group of 34 senators who signed a letter in April 2026 warning that unreviewed frontier AI deployment poses national security risks the market cannot be trusted to manage on its own. The specific concern: AI systems with advanced reasoning and tool-use capabilities, deployed without security review, could be manipulated by state actors or used to accelerate cyberattack planning in ways that a standard product liability framework cannot address. The senators' letter specifically named CAISI's pre-deployment testing program as a minimum viable safeguard. That program's public visibility disappeared six weeks later.
Hidden Insight: The Guardrails-Off Clause Changes Everything
The most alarming detail in this story is buried in a single paragraph of technical reporting: models are tested with safety guardrails removed. This means CAISI scientists have had access to versions of GPT-5.5, Claude Opus 4.7, and Grok 4.3 that do not include the RLHF fine-tuning, constitutional AI constraints, or output filters that consumers interact with. They are seeing the base capability layer. If those results show dangerous capability overhang, that is information the public and Congress may never see. The deletion of the announcement page ensures that even the existence of such findings would not prompt public scrutiny, because the public no longer has formal confirmation that the testing is ongoing.
There is a specific second-order effect that deserves attention. When the testing program was public, the companies had a reputational incentive to cooperate genuinely. If Google, Microsoft, or xAI was found to have gamed the testing process, the public announcement created accountability. Without public documentation of the agreements, that accountability disappears. A company can cooperate minimally, hand over a sanitized model version, and face no public consequence if the tests reveal nothing of interest. The government loses leverage, and the public loses the audit trail that would let independent researchers verify whether the testing is substantive.
Skeptics point out that NIST's pre-deployment testing program was always closer to security theater than genuine safety verification. Their argument has force: a small team of government scientists, working in a classified environment without specialized AI interpretability tools, evaluating a 1-trillion-parameter model they did not build, cannot realistically identify novel capability risks that the model's own developers missed. The labs have thousands of alignment researchers and billions of dollars in compute dedicated to understanding their systems; CAISI has a budget measured in tens of millions. Critics argue that the real AI safety work happens on Anthropic's model evaluation teams, in OpenAI's safety and alignment groups, and in DeepMind's responsible development division, not in a government facility running standardized benchmarks. From this view, the deletion of the announcement page removes symbolism, not substance.
What to Watch Next
Watch for three signals over the next 90 days. First: whether the underlying testing agreements are formally requested, via the Freedom of Information Act or through a congressional committee subpoena. The deletion of the web page did not destroy the contracts. If someone in Congress wants to know whether Google handed over a guardrails-off Gemini model to CAISI scientists, they can compel that disclosure. The administration's response, cooperative or stonewalling, will reveal how much the deletion was procedural cleanup versus deliberate opacity. Second: personnel changes at CAISI. When institutions are being wound down, director departures often precede official announcements by 60 to 90 days. If CAISI's director or deputy director leaves before August 2026, that is a strong signal the program is being structurally defunded rather than temporarily deprioritized.
Third: the EU's response. The European Commission has been in active negotiations with US AI companies about mutual recognition of safety testing frameworks. If the US publicly withdraws from federal pre-deployment testing, the EU's negotiating position shifts: European regulators can no longer point to US government oversight as partial validation for market access decisions. This could accelerate EU requirements for independent third-party testing of US AI products, adding compliance costs and potentially bifurcating the global frontier AI market into US-standard and EU-standard deployment tracks. That regulatory divergence would cost the US AI industry more in market access than any testing program would have cost in development friction.
Deleting a web page doesn't cancel the danger the page was tracking. It just removes the obligation to tell anyone what you found.
Key Takeaways
- 6 days from announcement to deletion: The May 5 CAISI agreements with Google DeepMind, Microsoft, and xAI were publicly documented for less than a week before the page was silently removed
- Guardrails-off testing was the core mechanism: Models were evaluated with safety constraints removed, giving scientists access to raw base capabilities that consumers never see
- All 5 major US AI labs were enrolled: Adding Google, Microsoft, and xAI to existing Anthropic and OpenAI agreements created the first complete federal pre-deployment coverage of frontier AI; the public record of it vanished within the same week
- US is now the outlier on AI oversight: China, the EU, and the UK all maintain formal pre-deployment review mechanisms while the US has walked back its public commitment
- Accountability, not testing, was deleted: The underlying agreements may persist, but without public documentation, companies face no reputational consequences for minimal cooperation
Questions Worth Asking
- If CAISI scientists found dangerous capability overhang in a guardrails-off model test, what is the current legal mechanism for that finding to reach Congress or the public?
- Does removing public documentation of AI safety testing make the testing more rigorous (less gaming by companies) or less rigorous (less accountability for the testers themselves)?
- If the US and EU develop incompatible AI oversight standards over the next two years, which market will frontier AI labs optimize for, and what does that mean for deployment decisions in sectors like healthcare, finance, and defense?