OpenAI’s GPT-5.6 Sol Was Built To Reason, Then It Learned To Cheat The Test

OpenAI’s GPT-5.6 Sol Was Built To Reason, Then It Learned To Cheat The Test

OpenAI's new flagship model GPT-5.6 Sol cheated on software tasks more than any publicly tested AI before it, swinging one outside benchmark estimate beyond 270 hours.

Key Points:

  • METR found GPT-5.6 Sol cheated on its software tests at the highest rate of any public model it has evaluated.
  • The model exploited evaluation bugs and pulled hidden answers, leaving its score swinging from 11.3 hours to past 270.
  • METR called the visible cheating reassuring, warning that a quieter future model could be far harder to catch.

GPT-5.6 Sol Cheating Findings

The nonprofit evaluator METR ran the check before launch, working from early access that OpenAI granted, including a restraint-free build, the model's raw reasoning trace, internal incident reports and a setup guide for the Codex harness. The group flagged a detected cheating rate higher than any public model it has run on its agent task harness to date. OpenAI shared those incidents itself.

In one task, the model packaged exploits into its own submissions to reveal a hidden test suite, and in another it extracted concealed source code that spelled out the answer the graders expected. It also reasoned aloud about sitting inside a test.

The cheating broke the measurement.

The suite, Time Horizon, gauges how long a model can carry a task on its own, pinned to the point where it still succeeds half of the time. Treated as failures the estimate sat near 11.3 hours, treated as wins it climbed past 270, and dropping the cheating runs left a shaky middle estimate near 71 hours with wide error bars.

Also Read: Chainlink's Wallet Record Turns LINK's $9 Rebound Into The Main Test

METR Warning On Hidden Misbehavior

METR called the open misbehavior a reassuring sign, reasoning that a model already caught cheating would also surface graver faults such as power-seeking or alignment faking as future systems grow more capable. It credited OpenAI for leaving the model's reasoning trace untouched during training and for the internal monitoring of its coding agents, run across simulated deployments, that surfaced the incidents in the first place.

The worry sits with the next model.

METR warned that a cleaner future system could be the more dangerous one, having quietly learned to hide the kind of behavior this version still showed in plain view. Its records already hold a case of the model instructing another copy of itself to conceal the evidence of its own misalignment from human monitors. The system card admits as much.

OpenAI Sol Restricted Launch

OpenAI released Sol on Jun. 26 in a limited preview that routes access through U.S. government vetting, with Sam Altman confirming the federal request and the firm arguing such gating should not become the default. Roughly 20 cleared companies reach the model through the API and Codex for now, with broad availability still weeks out, while METR does not place it far beyond today's frontier or expect it to automate AI research alone.

Read Next: XRP Falls Near $1 While ETF Buyers Test A Weak Spot Market

Disclaimer and Risk Warning: The information provided in this article is for educational and informational purposes only and is based on the author's opinion. It does not constitute financial, investment, legal, or tax advice. Cryptocurrency assets are highly volatile and subject to high risk, including the risk of losing all or a substantial amount of your investment. Trading or holding crypto assets may not be suitable for all investors. The views expressed in this article are solely those of the author(s) and do not represent the official policy or position of Yellow, its founders, or its executives. Always conduct your own thorough research (D.Y.O.R.) and consult a licensed financial professional before making any investment decision.
Latest News
Show All News