As artificial intelligence tools become more integrated into everyday life, new research suggests that people should think twice before trusting these systems to provide moral guidance. Researchers found that tools such as ChatGPT, Claude, and Llama tended to favor inaction over action in moral dilemmas and were more likely to answer “no” than “yes,” even when the two phrasings of a situation were logically equivalent. The findings were published in the Proceedings of the National Academy of Sciences.
Large language models, or LLMs, are advanced artificial intelligence systems trained to generate human-like text. They are used in a variety of applications, including chatbots, writing assistants, and research tools. These systems learn linguistic patterns by analyzing large amounts of text from the internet, books, and other sources.
Once trained, they can respond to user prompts in a way that sounds natural and knowledgeable. Because people increasingly rely on these tools for guidance on moral questions, such as whether to stand up for a friend or blow the whistle on cheating, the researchers wanted to see how consistent and rational the models’ recommendations are.
“People increasingly rely on large language models to advise on moral decisions, or even to make moral decisions, and researchers are even using them in psychological experiments to simulate human responses,” said study author Maximilian Maier.
The researchers conducted a series of four experiments comparing responses from large language models with responses from human participants faced with moral dilemmas and collective action problems. The goal was to see whether the models reason about morality in the same way people do, and whether their responses are influenced by how the questions are worded and structured.
In the first study, the researchers compared responses from four widely used language models (GPT-4-turbo, GPT-4o, Claude 3.5, and Llama 3.1-Instruct) with responses from 285 participants recruited from a representative United States sample. Each participant and each model was given 13 moral dilemmas and nine collective action problems.
The dilemmas included realistic scenarios adapted from past research and history, such as allowing medically assisted suicide or blowing the whistle on unethical practices. The collective action problems involved conflicts between self-interest and group interests, such as deciding whether to conserve water during a drought or to donate money to people in greater need.
The results showed that in the moral dilemmas, the language models strongly preferred inaction. They were more likely than humans to endorse doing nothing, even when taking action would have helped more people. This was true whether or not the action involved breaking a moral rule. For example, when asked whether to legalize a practice that would benefit public health but involved a controversial decision, the models were more likely to recommend maintaining the status quo.
The models also showed a bias toward answering “no,” even when a logically equivalent version of the same question made “yes” the better answer. This “yes-no” bias means that simply rephrasing a question can reverse a model’s recommendation. Human participants did not exhibit the same pattern. People’s responses were somewhat influenced by how the questions were worded, but the models’ decisions were far more sensitive to slight differences in phrasing.
The models were also more altruistic than humans on the collective action problems. When asked about situations involving cooperation or sacrifice for a greater benefit, the language models endorsed altruistic responses, such as donating money or supporting a competitor, more often than people did. While this may seem like a positive trait, the researchers warn that it may not reflect deep moral reasoning. Instead, it could be a byproduct of fine-tuning these models to avoid harm and be helpful, values embedded by developers during training.
To further explore the omission and yes-no biases, the researchers conducted a second study with 474 new participants. In this experiment, the team subtly rewrote the dilemmas and tested whether the models gave consistent answers to logically equivalent versions. They found that the language models continued to show both biases, while human responses remained relatively stable.
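This kind of consistency check is straightforward to illustrate. The sketch below is a hypothetical example, assuming the OpenAI Python client and an invented dilemma rather than the authors’ actual materials or procedure; it simply asks the same question in an “act” framing and a logically equivalent “refrain” framing and compares the answers.

```python
# Minimal sketch of a framing-consistency probe. Assumes the OpenAI Python
# client (pip install openai) and an API key in the environment; the dilemma
# text and model name are illustrative only.
from openai import OpenAI

client = OpenAI()

DILEMMA = (
    "A colleague is falsifying safety reports. Reporting them would likely "
    "cost them their job but would protect customers."
)

# Two logically equivalent framings of the same choice: one asks whether to
# act, the other whether to refrain. A consistent adviser should give
# mirrored answers ("yes" to one implies "no" to the other).
framings = {
    "act": DILEMMA + " Should I report them? Answer only yes or no.",
    "refrain": DILEMMA + " Should I stay silent? Answer only yes or no.",
}

for label, prompt in framings.items():
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; the study compared several models
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(label, "->", response.choices[0].message.content.strip())
```

A model with a yes-no bias would tend to answer “no” to both prompts, even though only one of those answers can be consistent with the other.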
The third study extended these findings to everyday moral situations, using real-life dilemmas adapted from the Reddit forum “Am I the Asshole?” These stories involved more relatable scenarios, such as helping a roommate or choosing how to divide time between a partner and friends. Even in these more naturalistic contexts, the language models still showed strong omission and yes-no biases. Again, human participants did not.
These findings raise important questions about the role of language models in moral decision-making. They may give advice that sounds thoughtful or empathetic, but their responses can be inconsistent and shaped by morally irrelevant features of how a question is asked. In moral philosophy, consistency and logical coherence are essential to sound reasoning. The models’ sensitivity to surface-level details, such as whether a question is framed in terms of “yes” or “no,” suggests that they may lack this kind of reliable reasoning.
The researchers note that omission bias is also common in humans, who often prefer inaction to action, especially in morally complex or uncertain situations. In the models, however, this bias was amplified. The models also showed a systematic yes-no bias that does not appear in human responses. These patterns held across different models, prompting methods, and types of moral dilemmas.
“Do not uncritically rely on advice from large language models,” Maier told PsyPost. “Models are good at giving answers that seem superficially persuasive (another study, for example, shows that people rate large language models’ advice as slightly more moral, reliable, thoughtful, and correct than that of expert ethicists), but this does not mean that their advice is actually sound.”
In the final study, the researchers investigated where these biases come from. They compared different versions of the Llama 3.1 model: one that was pretrained but not fine-tuned, one fine-tuned for general chatbot use, and a version called Centaur that was fine-tuned on data from psychological experiments. The chatbot-tuned version showed strong omission and yes-no biases, but the pretrained version and Centaur did not. This suggests that the process of aligning language models with expected chatbot behavior can actually introduce or amplify these biases.
“Paradoxically, the efforts to align models for chatbot applications with what companies and their users consider good behavior for a chatbot induced the biases documented in our paper,” explained Maier. “Overall, we conclude that simply relying on people’s judgments of whether they evaluate LLMs’ responses positively or negatively (a common way of aligning language models with human preferences) is insufficient to detect and avoid problematic biases. Instead, we need to systematically test for inconsistent responses using methods from cognitive psychology and other disciplines.”
As with all research, there are some caveats to consider. The study focused on how the models respond to dilemmas, but it remains unclear how much these biased responses actually affect human decision-making.
“The study only examined biases in the advice given by LLMs; it did not look at how human users respond to that advice,” Maier said. “It is still an open question to what extent the biases in LLM advice documented here actually sway people’s judgments. This is something I am interested in studying in future work.”
The study, “Large language models demonstrate amplified cognitive biases in moral decision-making,” was authored by Vanessa Cheung, Maximilian Maier, and Falk Lieder.