Just because it walks like a duck and talks like a duck doesn't mean it is a duck. The same could be true of LLMs.
GPT 5.2 has set a new high score on AI benchmarks including ARC-AGI 2, but it still seems to trip up on problems that ...