A groundbreaking Apple research paper has sent shockwaves through the AI community, revealing serious limitations in today's most advanced models.
“The Illusion Of Thinking” shows that the “thinking” reasoning performed by advanced models such as GPT-4, DeepSeek, and Claude Sonnet suffers from “complete accuracy collapse” when tasks become too complex.
Most worryingly, once tasks become complex enough, adding more processing power, tokens, or data rarely helps.
This has obvious implications for the grand ambitions we have become used to hearing about, such as using AI to solve major challenges like climate change, energy shortages, and global poverty.
Large reasoning models, or LRMs, are the problem-solving engines that power agentic AI. Some see them as a step towards AGI, artificial general intelligence, AI that can apply its learning to any task just as humans can. They are considered the most sophisticated and useful AI models available today, and huge sums have therefore been invested in their development.
But does this mean billions of dollars of investment are essentially being poured into a technological dead end?
I don't think so. But I do believe there are important lessons here for businesses and organizations seeking to unlock the true potential of AI, so let's take a closer look.
The research findings
The paper's headline premise is that AI's “thinking” may be merely an illusion, rather than a functional mirror of the genuine reasoning humans use to solve real-world problems.
This is supported by the discovery of “accuracy collapse”: while LRMs excel at managing low-complexity tasks, as complexity increases they eventually reach a point where they fail completely.
Perhaps most unexpectedly, the models appear to throw in the towel, using fewer tokens and putting in less effort, once a task becomes too complicated.
And even when they are explicitly shown how to solve a problem, they often still fail, calling into question whether they can be trained out of this behavior.
These are important findings, because in business AI there is a widespread belief that bigger is better: bigger datasets, bigger algorithms, more tokens. Apple's findings suggest that beyond a certain point, the benefits of scale dissipate and ultimately collapse.
The implication is that AI becomes far less useful when asked to perform overly complex tasks, such as developing broad, high-level strategies for chaotic real-world scenarios or carrying out complex legal reasoning.
What does this mean for businesses today?
I don't see this as an insurmountable obstacle, but as a signal of how generative language AI should be used: it should not be treated as a magic bullet that can solve every problem.
For me, there are three important lessons here:
First, focusing AI on structured, low-to-medium-complexity tasks is far more likely to hit its sweet spot.
For example, a law firm shouldn't simply expect AI to craft a winning case strategy for it. The problem is too complex and open-ended, and once the model reaches the point where it can no longer reason effectively, the inevitable result is generic, useless output.
It can, however, use AI to extract relevant points from contracts, compile overviews of relevant prior case law, and flag risks.
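As a rough sketch of what that narrow scoping can look like in practice, the Python fragment below breaks a contract into clauses and asks one bounded question per clause, rather than posing a single open-ended strategy prompt. The call_model function is a hypothetical placeholder for whichever model API you actually use.

```python
# Minimal sketch: keep each model request narrow and structured.
# call_model is a hypothetical stand-in for your real LLM API call.

def call_model(prompt: str) -> str:
    """Placeholder: swap in a call to your model provider's SDK."""
    return "(model output would appear here)"

def review_contract(clauses: list[str]) -> list[dict]:
    """One bounded, low-complexity question per clause, instead of a
    single sprawling 'build my case strategy' request."""
    findings = []
    for number, clause in enumerate(clauses, start=1):
        prompt = (
            "You are reviewing a single contract clause.\n"
            f"Clause {number}: {clause}\n"
            "List (a) the obligations it creates and (b) any obvious risks. "
            "Answer in bullet points only; do not speculate beyond the text."
        )
        findings.append({"clause": number, "analysis": call_model(prompt)})
    return findings

# Example usage with a dummy clause
print(review_contract(["Either party may terminate with 30 days' notice."]))
```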
Second, it underlines the importance of the human in the loop, the element of human oversight needed to ensure AI is used responsibly and accountably.
Third, where “accuracy collapse” would be dangerous, learning to recognize its warning signs, such as reduced token use as the model abandons its attempt to reason, is important for mitigating its impact.
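One hedged way to watch for that signal in code, assuming your model API reports the number of output tokens per response (the class and field names here are purely illustrative), is to compare each response's token count against a recent baseline and escalate anything that falls sharply below it:

```python
# Illustrative heuristic: flag responses whose output-token usage drops
# sharply below the recent baseline, a possible sign the model "gave up".
from collections import deque

class CollapseMonitor:
    def __init__(self, window: int = 20, drop_ratio: float = 0.5):
        self.history = deque(maxlen=window)  # recent output-token counts
        self.drop_ratio = drop_ratio         # flag if usage < 50% of baseline

    def check(self, output_tokens: int) -> bool:
        """Return True if this response looks suspiciously low-effort."""
        suspicious = False
        if len(self.history) >= 5:
            baseline = sum(self.history) / len(self.history)
            suspicious = output_tokens < baseline * self.drop_ratio
        self.history.append(output_tokens)
        return suspicious

# Usage: if check() returns True, route the task to human review
# instead of trusting the output. Token counts below are made up.
monitor = CollapseMonitor()
for tokens in [900, 950, 1100, 1000, 980, 320]:
    if monitor.check(tokens):
        print(f"Possible collapse: only {tokens} output tokens used.")
```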
The name of the game is to play to AI's strengths while cushioning against the effects of its weaknesses.
So is this a dead end for AI?
In my opinion, Apple's research doesn't point to a “dead end” or end-of-the-road scenario for AI. Instead, it should be used to focus efforts on the areas where businesses are most likely to succeed, and to understand where resilience to AI's failures needs to be built in.
Understanding AI's limitations shouldn't stop us from benefiting from it, but it does help us avoid situations where a collapse in reasoning could cause serious harm or damage.
Agentic AI can help here, deploying a variety of tools to bridge the gaps where reasoning alone falls short. The concept of explainable AI matters too, because designing systems to be transparent makes it far easier to understand what went wrong when a collapse does occur.
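As an illustration of that idea, the sketch below (with hypothetical helper names, not a real framework) routes each task to the model, to a deterministic tool, or to a human reviewer based on a crude complexity estimate, and logs the decision so that any later failure is easier to explain:

```python
# Sketch of a routing layer: keep simple tasks with the model, push anything
# past a complexity threshold to deterministic tools or a human, and log the
# decision so failures can be explained afterwards.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("router")

def estimate_complexity(task: str) -> int:
    """Crude stand-in: real systems might score by steps, entities, or length."""
    return len(task.split())

def route(task: str, model_limit: int = 50, tool_limit: int = 200) -> str:
    score = estimate_complexity(task)
    if score <= model_limit:
        decision = "model"   # within the LRM's sweet spot
    elif score <= tool_limit:
        decision = "tool"    # hand off to a deterministic tool or workflow
    else:
        decision = "human"   # too complex: escalate to a person
    log.info("task complexity=%d -> routed to %s", score, decision)
    return decision

# Example usage with a short, in-scope request
print(route("Summarise the termination clause of this contract."))
```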
Certainly, no one should expect AI to perform perfectly every time or to produce the best solution to every conceivable problem. But the better we understand it, the more likely we are to harness its strengths and create real value.