OpenAI released GPT-5.5 on Wednesday, but fresh benchmark data shows Anthropic's gated Claude Mythos Preview still leads on six of nine directly comparable tests.
GPT-5.5 Benchmark Scores
GPT-5.5 arrived in ChatGPT and Codex on April 23, priced at $5 per million input tokens and $30 per million output tokens, double its predecessor's rates.
The model scored 82.7% on Terminal-Bench 2.0, edging Mythos by 0.7 points on the only benchmark where it clearly wins.
Mythos, which Anthropic withheld from public release over cybersecurity concerns, leads on SWE-bench Pro at 77.8% versus GPT-5.5's 58.6%.
It also tops GPT-5.5 on Humanity's Last Exam without tools, scoring 56.8% against 41.4%. The gated model further leads on CyberGym, OSWorld-Verified, and long-context GraphWalks tasks.
Analyst Caveats Matter
The comparison remains imprecise because neither lab benchmarked the models against each other directly. OpenAI chose Claude Opus 4.7 as its public comparator, while Anthropic's 245-page system card ran Mythos against GPT-5.4.
Test harnesses also diverge. OpenAI ran Terminal-Bench 2.0 with a Codex CLI setup, while Anthropic's Terminus-2 scaffold pushed Mythos to 92.1% under Terminal-Bench 2.1 timing rules, so the two labs' headline Terminal-Bench figures are not directly comparable.
Anthropic's decision to gate Mythos, announced April 7, reportedly triggered meetings with the European Commission and a warning from the Bank of England governor that the model could sharply escalate cyber risk.