Claude Opus 4.8 Tops The Intelligence Index Yet Mythos Dominates Hacking

Anthropic released Claude Opus 4.8 this week. The model holds a slim lead on a widely tracked intelligence benchmark but trails the company's restricted Mythos system at writing software exploits.

Key Points:

Claude Opus 4.8 narrowly tops the Artificial Analysis Intelligence Index at 61.4, just ahead of GPT-5.5 at 60.2.

In Anthropic's internal tests, Mythos produced working Firefox exploits on 70.8% of targets, against 8.8% for Opus 4.8.

Mythos stays limited to vetted Project Glasswing partners, while Opus 4.8 ships at the same price as its predecessor.

Opus 4.8 Benchmark Lead

Opus 4.8 is priced at $5 per million input tokens and $25 per million output tokens, the same rates as its predecessor, Opus 4.7.

Independent testers report that the model now leads the Artificial Analysis Intelligence Index, an aggregate of ten evaluations, with a score of 61.4, just ahead of GPT-5.5 at 60.2. Anthropic frames the upgrade as an incremental step rather than a generational leap.

On agentic coding, Opus 4.8 scores 69.2% on SWE-bench Pro, a benchmark that asks a model to fix real bugs inside large code repositories, while GPT-5.5 reaches 58.6%.

The two systems are nearly even on graduate-level science questions, with both scoring close to 94%. Opus 4.8 also narrowly leads on a broad reasoning exam where earlier Opus models had lagged.

Mythos sits above both on the hardest engineering work. It posts 77.8% on SWE-bench Pro and holds a wider lead on tasks that mix code with screenshots.

Anthropic restricts Mythos to a vetted set of partners under its Project Glasswing program rather than selling it openly. The preview costs $25 per million input tokens and $125 per million output tokens, five times the Opus rate.


Mythos Cyber Dominance

The widest gap shows up in offensive security.

With safeguards switched off, Mythos produced a full working exploit on 70.8% of Firefox targets in Anthropic's own evaluations, while Opus 4.8 cleared just 8.8%.

On a separate test drawn from open-source code, Opus 4.8 failed to score on 61.5% of targets, more than double the 23.3% miss rate posted by Mythos.

In a public cross-model trial, Berkeley RDI paired each system with its own coding agent and tested it against 898 real-world vulnerabilities. Mythos wrote 157 working exploits, compared with 120 for GPT-5.5.

GPT-5.5 still held an edge on kernel-level exploitation, leading Mythos 22 to 12 on that narrow slice. The UK AI Security Institute also placed GPT-5.5 slightly ahead of Mythos on expert cyber tasks, at 71.4% to 68.6%.

Anthropic unveiled Mythos in April after the model found thousands of previously unknown flaws across major operating systems and every leading web browser, with hundreds reported in Firefox alone. The company then withheld it from public release, wary that the same exploit-writing skills could aid attackers as readily as the defenders it was built to help.