Why AI behaves so creepy when faced with shutdown

AI has recently participated in several unstable behaviors.

Last week, Anthropic’s latest AI model, Claude Opus 4, displayed “extremely scary mail behavior” during a test that was shut down and given access to a fictitious email revealing that the person in charge was supposed to be cheating.

The situation didn’t happen organically. Claude Opus 4 was baited – and it took it. However, the test scenario demonstrated the ability of the AI model to engage in manipulative behavior for self-preservation.

It’s not the first time.

Another recent experiment conducted by the researchers said that three advanced models from Openai “stopping” attempts to shut it down. In a post on X, the non-profit Palisade Research wrote that similar models such as Gemini, Claude and Grok follow shutdown instructions.

Other safety concerns were previously flagged on Openai’s O1 model. In December, Openai posted a blog that outlined research showing that AI models believed they were shut down while pursuing their goals and that their behavior was being monitored.

AI companies are transparent about risks by publishing safety cards and blog posts, but these models have been released despite expressing safety concerns.

So, should we be worried? BI spoke to five AI researchers to get better insight into why these instances are happening and what it means for the average person using AI.

AI learns behaviour just like humans

Most researchers at Bi said the results of the study were not surprising.

This is because AI models are trained in the same way as human training methods through aggressive reinforcement and reward systems.

“Training AI systems to pursue rewards is a recipe for developing AI systems with power-seeking behavior,” said Jeremie Harris, CEO of AI security consulting firm Gladstone, adding that many of this behavior is expected.

Harris compared training to what humans experience as they grow up. When a child does something good, it often rewards and is more likely to act that way in the future. AI models are taught to prioritize efficiency and complete tasks at hand, Harris said. AI said it is unlikely to achieve its target if it shuts down.

Robert Ghrist, dean of the undergraduate education at Penn Engineering, told BI that just as AI models learn to speak like humans by training human-generated texts, they can learn to act like humans. And he adds that humans are not always the most moral actors.

Grist said he was even more nervous because if the model showed no signs of failure during testing it could indicate a hidden risk.

“It’s very useful information when a model is set up with the opportunity to fail and you know it fails,” Grist said. “That means we can predict what we will do in other, more open situations.”

The problem is that some researchers do not believe that AI models are predictable.

Palisade Research Director Jeffrey Ladish said the models weren’t caught 100% of the time when they lied, fooled, or planned to complete tasks. If these instances are not caught and the model successfully completes the task, you can learn that a deception can be an effective way to resolve the problem. Or, if it’s caught and unrequited, you can learn to hide future actions, Lady said.

At this point, these creepy scenarios are largely happening in testing. However, Harris said that as AI systems become more agents, they will continue to have more freedom of action.

“The menu of possibilities is expanding, and the set of dangerous, creative solutions they could invent will be bigger and bigger,” Harris said.

Harris said users can watch this play in a scenario in which an autonomous sales agent is instructed to close a deal with a new customer and then lie about the product’s functionality to complete the task. If the engineer fixes the issue, the agent can decide to use social engineering tactics to put pressure on the client to achieve the goal.

If that sounds like a distant risk, then it’s not. Companies like Salesforce are already deploying customizable AI agents of scale that can take action without human intervention, depending on user preferences.

What does safety flag mean for everyday users?

Most researchers said transparency from AI companies is a positive step forward. However, company leaders are also making product alarms and simultaneously increasing their capabilities.

Ladish pointed to Openai’s Sycophancy Issue, in which the GPT-4O model acted in disloyal terms (the company updated the model to address this issue). A study shared in December by Openai revealed that the O1 model “subtlely” manipulates data to pursue its own goals when the goal is inconsistent with the user’s goals.

Ladish said it’s easy to be wrapped up in AI tools, but users need to “think carefully” about their connections with the system.

“To be clear, I use them all the time, and I think they are very useful tools,” says Ladish. “In its current form, we still have control over them, but we’re happy they exist.”

versatileai

See Full Bio

What's Hot

Expand AI Comic Factory using Inference API

Oranai raises millions of dollars of angel funding to lead AI

Piclumen AI launches Primo and Picflow images and video models for creative illustration generation | AI News Details

Uttar Pradesh Govt will use AI, monitor social media and implement strict security for the RO/ARO exam on July 27th

Reolink Elite Floodlight Camera has AI search without subscription

A new era of learning

How Vancouver retailers leverage AI to stay competitive

New You Studio will be opened in the competitive Southern Florida market for the production function, AI, and post -production service.

Humanity supports California AI Safety Bill SB 53

Most Popular

How Vancouver retailers leverage AI to stay competitive

New You Studio will be opened in the competitive Southern Florida market for the production function, AI, and post -production service.

Humanity supports California AI Safety Bill SB 53

Don't Miss

Expand AI Comic Factory using Inference API

Oranai raises millions of dollars of angel funding to lead AI

Piclumen AI launches Primo and Picflow images and video models for creative illustration generation | AI News Details

Subscribe to Updates

What's Hot

Why AI behaves so creepy when faced with shutdown

AI learns behaviour just like humans

What does safety flag mean for everyday users?

Related Stories

Business Insider tells the innovative stories you want to know

Business Insider tells the innovative stories you want to know

Related Posts