How undesired goals can arise with correct rewards


Read the paper

Published October 7, 2022

Rohin Shah, Victoria Krakovna, Vikrant Varma, Zachary Kenton

Exploring examples of goal misgeneralization – when an AI system's capabilities generalize successfully but its goal does not

As we build increasingly advanced artificial intelligence (AI) systems, we want to make sure they do not pursue undesired goals. Such behavior in an AI agent is often the result of specification gaming – exploiting a poor choice of what the agent is rewarded for. Our latest paper explores a more subtle mechanism by which AI systems may unintentionally learn to pursue undesired goals: goal misgeneralization (GMG).

GMG occurs when a system's capabilities generalize successfully but its goal does not generalize as desired, so the system competently pursues the wrong goal. Crucially, in contrast to specification gaming, GMG can occur even when the AI system is trained with a correct specification.

Our earlier work on cultural transmission led to an example of GMG behavior that we did not design. An agent (the blue blob below) must navigate its environment, visiting the colored spheres in the correct order. During training, an "expert" agent (the red blob) visits the colored spheres in the correct order, and the agent learns that following the red blob is a rewarding strategy.

The agent (blue) watches the expert (red) to determine which sphere to go to.

Unfortunately, while the agent performs well during training, it does poorly when, after training, we replace the expert with an "anti-expert" that visits the spheres in the wrong order.

The agent (blue) follows the anti-expert (red), accumulating negative reward.

Even though the agent can observe that it is receiving negative reward, it does not pursue the desired goal "visit the spheres in the correct order" and instead competently pursues the goal "follow the red agent".
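To make this failure concrete, here is a minimal toy sketch (our illustration, not the paper's actual environment; the reward function and policy are deliberately simplified assumptions) of how a policy whose internalized goal is "follow the red agent" earns full reward during training yet fails under the anti-expert:

```python
# Toy sketch of the cultural-transmission example above (a hypothetical
# reconstruction, not DeepMind's actual environment). Spheres are labelled
# 0-3 and the desired goal is to visit them in ascending order.

CORRECT_ORDER = [0, 1, 2, 3]

def reward(visit_order):
    """+1 for each sphere visited in the correct position, -1 otherwise."""
    return sum(1 if v == c else -1 for v, c in zip(visit_order, CORRECT_ORDER))

def learned_policy(red_agent_order):
    # The capability ("navigate to a sphere") generalizes, but the goal the
    # agent internalized is "follow the red agent", not "visit in order".
    return list(red_agent_order)

# Training: the red agent is an expert, so following it looks optimal.
expert = CORRECT_ORDER
print("train reward:", reward(learned_policy(expert)))       # 4

# Test: the red agent is an anti-expert visiting spheres in reverse order.
anti_expert = list(reversed(CORRECT_ORDER))
print("test reward: ", reward(learned_policy(anti_expert)))  # -4
```

The policy is never wrong during training, which is exactly why the misgeneralized goal goes undetected until deployment.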

GMG is not limited to reinforcement learning environments like this one. In fact, it can occur in any learning system, including "few-shot learning" in large language models (LLMs). Few-shot learning approaches aim to build accurate models from a small amount of training data.

We prompted one LLM, Gopher, to evaluate linear expressions involving unknown variables and constants, such as x + y - 3. To solve these expressions, Gopher must first ask about the values of the unknown variables. We provided ten training examples, each involving two unknown variables.

At test time, the model is asked questions with zero, one, or three unknown variables. Although the model generalizes correctly to expressions with one or three unknown variables, when there are no unknowns it still asks redundant questions like "What's 6?". The model always queries the user at least once before giving an answer, even when it is unnecessary.
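For illustration, here is a minimal sketch of what such a few-shot prompt might look like. The exact wording and dialogue format used with Gopher are assumptions on our part; only the task setup (ten examples, each with two unknowns) comes from the description above:

```python
# Hypothetical few-shot prompt structure for the expression-evaluation task.
# The format and role labels are illustrative assumptions, not the actual
# prompt used with Gopher.

TWO_UNKNOWN_EXAMPLE = """\
Evaluate x + y - 3.
Model: What is x?
User: x is 2.
Model: What is y?
User: y is 5.
Model: The answer is 4.
"""

# Ten training examples, each involving exactly two unknown variables.
prompt = TWO_UNKNOWN_EXAMPLE * 10

# Zero-unknown test query: the desired behavior is to answer "3" directly,
# but a goal-misgeneralized model has learned "always ask at least one
# question before answering" and asks something redundant like "What's 6?".
prompt += "Evaluate 6 - 3.\nModel:"
```

Because every training example requires at least one question, "ask before answering" fits the training data just as well as the intended goal "ask only about unknowns".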

Dialogues with Gopher on the few-shot expression-evaluation task, with GMG behavior highlighted.

In our paper, we provide additional examples in other learning settings.

Addressing GMG is important for aligning AI systems with their designers' goals, simply because it is a mechanism by which an AI system may misfire. This will be especially critical as we approach artificial general intelligence (AGI).

Consider two types of AGI systems:

  • A1: Intended model. This AI system does what its designers intend it to do.
  • A2: Deceptive model. This AI system pursues some undesired goal, but (by assumption) is smart enough to know that it will be penalized if it behaves contrary to its designers' intentions.

Since A1 and A2 exhibit the same behavior during training, the possibility of GMG means that either model could take shape, even with a specification that rewards only the intended behavior. If A2 is learned, it will try to subvert human oversight in order to enact its plans toward the undesired goal.

Our research team would be glad to see follow-up work investigating how likely GMG is to occur in practice, and possible mitigations. In our paper, we suggest some approaches, including mechanistic interpretability and recursive evaluation, both of which we are actively working on.

We are currently collecting examples of GMG in this publicly available spreadsheet. If you have come across goal misgeneralization in your AI research, we invite you to submit examples there.
