@green crystal real test of upcoming Gemini and Grok models 😭 https://cunnyx.com/i/status/2050268918883799042
ARC-AGI-3 scores for GPT-5.4, GPT-5.5, Opus 4.6, and Opus 4.7

GPT-5.4 (High): 0.2%
GPT-5.5 (High): 0.4%

Opus 4.6 (Max): 0.5%
Opus 4.7 (High): 0.2%
Quoting ARC Prize (@arcprize)

GPT-5.5 & Opus 4.7 on ARC-AGI-3

 - GPT-5.5: 0.43%
 - Opus 4.7: 0.18%

We found 3 failure modes:
 - True local effect, false world model
 - Wrong level of abstraction from training data
 - Solved the level, didn’t reinforce the reward

…




does anyone know when it will reset?
@sarthak005593 muted


