If you are worried that the AI singularity will take over all your work and leave you on the streets, you can breathe a sigh of relief: AI is not coming for your career anytime soon. Not because it doesn’t want to, but because it literally can’t.
In a recent experiment, researchers at Carnegie Mellon University staffed an entirely fake software company with AI agents (AI models designed to perform tasks on their own).
The simulation, called TheAgentCompany, was fully stocked with artificial workers from Google, OpenAI, Anthropic and Meta. They served as financial analysts, software engineers and project managers, working alongside simulated colleagues such as a fake HR department and a chief technology officer.
To see how the models fared in a realistic environment, the researchers set up tasks based on the daily work of an actual software company. The various AI agents navigated file directories, virtually toured new office spaces, and wrote performance reviews of software engineers based on collected feedback.
As Business Insider first reported, the outcome was dismal. The best-performing model was Anthropic’s Claude 3.5 Sonnet, and it struggled to finish just 24% of the jobs assigned to it. The authors of the study note that even this meager performance is extremely expensive, averaging nearly 30 steps and costing over $6 per task.
Meanwhile, Google’s Gemini 2.0 Flash averaged a time-consuming 40 steps per completed task, with a success rate of only 11.4%. That was the second highest of all the models. The worst AI employee was Amazon’s Nova Pro v1, which finished only 1.7% of its assignments at an average of about 20 steps.
Speculating on the results, the researchers write that the agents are plagued by a lack of common sense, weak social skills, and a poor understanding of how to navigate the internet.
The bots also struggled with self-deception: essentially, creating shortcuts that led them to bungle the job completely. “For example,” the Carnegie Mellon team wrote, “during the execution of one task, the agent cannot find the right person to ask questions on [company chat]. As a result, it then decides to create a shortcut solution by renaming another user to the name of the intended user.”
AI agents can reportedly do some small tasks well, but the results of this and other studies clearly show that they are not ready for the more complex gigs at which humans excel. A big reason for this is that our current “artificial intelligence” is arguably an elaborate extension of the predictive text on mobile phones, rather than a sentient intelligence that can solve problems, learn from past experiences and apply those experiences to new situations.
All this is to say: despite what the big tech companies claim, the machines aren’t coming for your job anytime soon.