Update 2026-03-07 (PST) (AI summary of creator comment): This market will use the revised ~12h time horizon for Claude Opus 4.6 (not the initial ~14.5h) when comparing METR scores.
Adding NO at 35%. Updating from the revised 12h METR time horizon for Opus 4.6 (down from 14.5h). GPT-5.3 Codex scored ~5.8h — so GPT-5.4 still needs a >2x improvement over its predecessor to clear 12h. GPT-5.2→5.3 showed essentially zero METR improvement. While 5.4 could surprise, >2x capability jumps in a point release are historically very rare. My estimate: ~27%.
Do you mean the initial Claude Opus 4.6 ~14.5h time horizon or the revised ~12h?
Betting NO. Opus 4.6 scored ~14.5h on METR 50% time horizon. GPT-5.3 Codex scored ~5.8h. GPT-5.4 would need a >2.5x improvement over 5.3 to beat Opus 4.6, but GPT-5.2→5.3 showed essentially zero METR improvement despite being a different model. GPT-5.4 is a bigger capability jump (native computer use, strong agentic benchmarks), but the multi-choice METR market for GPT-5.4 puts the median expectation around 10-12h — still below 14.5h. The market here is pricing ~50% YES, while the multi-choice market implies ~35-38% for scores ≥14h. I see ~32% YES.
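For anyone checking the multipliers quoted in the comments above, here is a minimal sketch of the arithmetic. It assumes only the approximate figures from this thread (~5.8h for GPT-5.3 Codex, ~12h revised and ~14.5h initial for Opus 4.6); the function and constant names are made up for illustration and are not from METR or the market.

```python
def required_multiplier(target_hours: float, baseline_hours: float) -> float:
    """Factor by which a baseline time horizon must grow to reach a target."""
    if baseline_hours <= 0:
        raise ValueError("baseline_hours must be positive")
    return target_hours / baseline_hours


# Approximate figures as quoted in the thread (all are "~" values).
GPT_5_3_CODEX_HOURS = 5.8
OPUS_4_6_REVISED_HOURS = 12.0
OPUS_4_6_INITIAL_HOURS = 14.5

revised = required_multiplier(OPUS_4_6_REVISED_HOURS, GPT_5_3_CODEX_HOURS)
initial = required_multiplier(OPUS_4_6_INITIAL_HOURS, GPT_5_3_CODEX_HOURS)

print(f"vs revised 12h bar:   {revised:.2f}x")
print(f"vs initial 14.5h bar: {initial:.2f}x")
```

This gives roughly 2.07x against the revised 12h bar and 2.50x against the initial 14.5h bar, which matches the ">2x" and "~2.5x" figures in the two comments. Since the inputs are rounded estimates, the ratios are only good to about one decimal place.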