The Illusion of Thinking & the Honesty of Apple’s AI Researchers
Researchers at Apple recently published an interesting paper titled "The Illusion of Thinking:
Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity."
Available here:
https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf
The paper digs into the limits of today’s so-called “Large Reasoning Models” (LRMs), fancy versions of LLMs that write out detailed chains of thought before answering. The authors find that while LRMs “demonstrate improved performance on reasoning benchmarks,” they also hit a sharp “accuracy collapse beyond certain complexities” and even start to think less as problems get harder (p. 3). In other words, the more complicated the puzzle, the shorter and less effective the model’s reasoning becomes. That’s an impressively candid admission from Apple’s own AI research team. Upton Sinclair famously observed that it’s hard to get a man to understand something when his salary depends on his not understanding it, yet here the engineers seem to be laying the limitations bare. It’s almost ironically transparent for an industry built on hype.
Maybe the title "The Illusion of Thinking" cuts deeper than intended. After all, human thinking might also be more illusion than logic when you zoom out far enough. The researchers note that models like Claude 3.7 Sonnet and DeepSeek-R1 can “overthink” simple problems, continuing to chase wrong paths even after finding the right one (p. 9). That sounds familiar: humans do it all the time. We rationalize, we spiral, we get fixated. Perhaps our advantage isn’t that we’re truly logical, but that our illusions of coherence are more convincing. As the paper puts it, these models “fail to develop generalizable reasoning capabilities beyond certain complexity thresholds” (p. 10). So do we, arguably; we just tell the story better, and that narrative ability might be the deeper illusion that keeps civilization humming.
From a practical standpoint, the paper nudges AI developers toward humility and maybe toward modular design. If reasoning effort declines once problems pass a certain complexity (p. 8), then the solution might not be to force a single model to do all the thinking. Instead, we can structure AI workflows so that no individual LLM call has to bear the full weight of the logic: break problems up, use smaller reasoning steps, and let orchestration, not brute thought, drive success, as in the sketch below. In a way, that’s what humans do too: we don’t reason perfectly; we collaborate, compartmentalize, and delegate. Maybe building better AI isn’t about teaching machines to think more, but to share the illusion more effectively.
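For concreteness, here’s a minimal sketch of what that kind of orchestration might look like. It assumes nothing more than a generic llm(prompt) -> str callable standing in for whatever model client you use; the function names, prompts, and the fake_llm stub are all illustrative, not anything from the paper.

```python
from typing import Callable

# Assumed interface: prompt in, completion out. Swap in your real client here.
LLM = Callable[[str], str]

def decompose(task: str, llm: LLM) -> list[str]:
    """Ask the model only to split the task; no solving happens here."""
    response = llm(f"Break this task into independent subtasks, one per line:\n{task}")
    return [line.strip() for line in response.splitlines() if line.strip()]

def solve_subtask(subtask: str, llm: LLM) -> str:
    """Each call carries one small reasoning step, not the whole problem."""
    return llm(f"Solve only this subtask, nothing else:\n{subtask}")

def synthesize(task: str, partials: list[str], llm: LLM) -> str:
    """Ordinary code, not the model, stitches the pieces back together."""
    joined = "\n".join(f"- {p}" for p in partials)
    return llm(f"Combine these partial results into one answer for '{task}':\n{joined}")

def orchestrate(task: str, llm: LLM) -> str:
    """The control flow lives out here, so no single call bears the full load."""
    subtasks = decompose(task, llm)
    partials = [solve_subtask(s, llm) for s in subtasks]
    return synthesize(task, partials, llm)

if __name__ == "__main__":
    # Stand-in model so the sketch runs without an API key.
    def fake_llm(prompt: str) -> str:
        if prompt.startswith("Break"):
            return "pick a theme\ndraft a menu"
        return f"[answer to: {prompt.splitlines()[-1]}]"

    print(orchestrate("Plan a three-course dinner", fake_llm))
```

The prompts themselves don’t matter much; the point is that the decomposition and the loop live in plain code, so each model call only has to manage a sliver of the reasoning instead of one long, collapse-prone chain of thought.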
