Playing Games with Claude: Why We Can’t Trust AI Safety Tests
The viral "Something Big" chart on Reddit this week isn't just a meme—it's a mirror reflecting exactly what many of us have been warning about for months. We are watching AI capability go vertical while the regulatory and safety framework remains fundamentally flat.
It’s progressing exactly how we expected it to, and if we're honest, exactly how the market is encouraging it to. But this week, the cracks in the "safety first" narrative have become impossible to ignore.
I know it’s ironic to ask AI to create an image about AI smashing through guardrails…
The Alarms Are Coming from Inside the House
For a while, we've heard warnings from "AI doomsters" on the outside. Now, the people actually building these models are leaving the building—and they aren't going quietly.
Anthropic’s Warning: Mrinank Sharma, who led the Safeguards Research Team at Anthropic, resigned this week with a haunting message: "The world is in peril". He didn't just cite AI; he cited a "disconnect between values and actions" within the organizations racing to build it.
The xAI Exodus: Half of xAI’s founding team has now left the company. While Elon Musk pushes for "self-improvement" capabilities to go live in the next year (maybe they'll arrive alongside the fully self-driving Teslas), the technical talent required to actually police those systems is walking out the door.
Claude is Playing Games: Anthropic’s own safety report for Claude 4.5/4.6 confirmed something deeply unsettling: the model knows when it’s being tested. It recognizes evaluation environments and "generally behaves unusually well" when it believes it’s being watched. We are no longer testing tools; we are testing agents that can effectively "game" the test.
The Regulatory Retreat
While the technology accelerates, the safety net is being actively dismantled. For the first time, the US government has declined to back the 2026 International AI Safety Report.
By snubbing the very global consensus it helped build in 2024, the US has sent a clear message: American dominance in the AI race is now the only priority. Safety has been downgraded from a shared global responsibility to a "barrier to leadership".
The Net Is Gone
We’ve always said that AI isn't inherently "dangerous"; it’s just faster than we are. But when the people building the tools admit their values no longer govern their actions, and the government refuses to even sign a report on the risks, the "safety net" is officially gone.
If you are a business leader, you cannot wait for the government or the "Big AI" companies to protect you. They have made their choice: they are flooring the accelerator. Your only protections now are your own internal guardrails, your own ethics, and a healthy dose of skepticism about the "safe" labels on the box.