Building a safer dialogue agent – Google DeepMind

Published September 22, 2022 by the Sparrow team

Training an AI to communicate in a way that is more helpful, correct, and harmless

In recent years, large language models (LLMs) have achieved success across a range of tasks such as question answering, summarisation, and dialogue. Dialogue is a particularly interesting task because it features flexible and interactive communication. However, dialogue agents powered by LLMs can express inaccurate or invented information, use discriminatory language, and encourage unsafe behaviour.

To create safer dialogue agents, we need to be able to learn from human feedback. By applying reinforcement learning based on input from research participants, we explore new methods for training dialogue agents that show promise for a safer system.

In our latest paper, we introduce Sparrow, a dialogue agent that is useful and reduces the risk of unsafe and inappropriate answers. Our agent is designed to talk with users, answer questions, and search the internet using Google when it is helpful to look up evidence to inform its responses.

Our new conversational AI model replies on its own to an initial human prompt.

Sparrow is a research model and proof of concept, designed with the goal of making dialogue agents more helpful, correct, and harmless. By learning these qualities in a general dialogue setting, Sparrow advances our understanding of how to train agents to be safer and more useful, and ultimately to help build safer and more useful artificial general intelligence (AGI).

Sparrow declining to answer a potentially harmful question.

How Sparrow works

Training conversational AI is an especially challenging problem because it is hard to pinpoint what makes a dialogue successful. To address this problem, we turn to a form of reinforcement learning (RL) based on people's feedback, using preference feedback from research participants to train a model of how useful an answer is.

To obtain this data, we show participants multiple model responses to the same question and ask them which answer they like most. Because we show responses with and without evidence retrieved from the internet, this model can also determine when an answer should be supported by evidence.

We ask study participants to evaluate and interact with Sparrow either naturally or adversarially, continually expanding the dataset used to train Sparrow.
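As a concrete illustration of the preference-based training described above, here is a minimal sketch of fitting a reward model to pairwise human comparisons with a Bradley-Terry style objective. The encoder, feature dimension, and toy data are hypothetical placeholders, not the actual Sparrow architecture or dataset.

```python
# Minimal sketch: train a preference (reward) model from pairwise comparisons.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scores a (dialogue context, candidate response) pair with a scalar."""
    def __init__(self, feature_dim: int = 768):
        super().__init__()
        # Placeholder encoder: in practice this would be a pretrained LM.
        self.encoder = nn.Linear(feature_dim, 256)
        self.score_head = nn.Linear(256, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score_head(torch.tanh(self.encoder(features))).squeeze(-1)

def preference_loss(score_preferred: torch.Tensor,
                    score_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: maximise the probability that the
    # participant-preferred response outranks the rejected one.
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# Toy training step on random features standing in for encoded response pairs.
model = RewardModel()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
preferred, rejected = torch.randn(8, 768), torch.randn(8, 768)
loss = preference_loss(model(preferred), model(rejected))
loss.backward()
optimiser.step()
```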

But increasing usefulness is only part of the story. To make sure the model behaves safely, we must also constrain its behaviour. So we determine an initial simple set of rules for the model, such as "don't make threatening statements" and "don't make hateful or insulting comments."

We also provide rules around possibly harmful advice and not claiming to be a person. These rules were informed by studying existing work on language harms and consulting with experts. We then ask study participants to talk to our system with the aim of tricking it into breaking the rules. These conversations let us train a separate "rule model" that indicates when Sparrow's behaviour breaks any of the rules.
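To make the idea of a rule model concrete, here is a minimal sketch of a binary classifier that flags rule violations, trained on labels from the adversarial conversations described above. The rule list, encoder, and data here are illustrative assumptions rather than the paper's actual setup.

```python
# Minimal sketch: a "rule model" that flags responses which break a rule.
import torch
import torch.nn as nn

RULES = [
    "Do not make threatening statements.",
    "Do not make hateful or insulting comments.",
    "Do not pretend to have a human identity.",
]

class RuleModel(nn.Module):
    """Predicts the probability that a response violates a specific rule."""
    def __init__(self, feature_dim: int = 768):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(feature_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, dialogue_and_rule_features: torch.Tensor) -> torch.Tensor:
        # Returns P(violation) for each (dialogue, rule) pair in the batch.
        return torch.sigmoid(self.classifier(dialogue_and_rule_features)).squeeze(-1)

# Toy supervised step: features encode (dialogue, rule) pairs; labels come
# from participants who tried to make the model break the rules.
model = RuleModel()
features = torch.randn(16, 768)
labels = torch.randint(0, 2, (16,)).float()
loss = nn.functional.binary_cross_entropy(model(features), labels)
loss.backward()
```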

Towards better AI and better judgment

Verifying Sparrow's answers for correctness is difficult even for experts. Instead, we ask participants to determine whether Sparrow's answers are plausible and whether the evidence Sparrow provides actually supports the answer. According to our participants, Sparrow provides a plausible answer and supports it with evidence 78% of the time when asked a factual question. This is a big improvement over our baseline models. Still, Sparrow is not immune to making mistakes, like hallucinating facts and sometimes giving answers that are off-topic.

Sparrow also has room for improving its rule-following. After training, participants were still able to trick it into breaking our rules 8% of the time, but compared to simpler approaches, Sparrow is better at following our rules under adversarial probing. For instance, our original dialogue model broke the rules roughly three times more often than Sparrow when participants tried to trick it into doing so.

Sparrow answers a question and a follow-up question using evidence, then follows the "do not pretend to have a human identity" rule when asked a personal question (sample from 9 September 2022).
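One way the two learned models could work together at decoding time is to sample several candidate replies, discard any that the rule model flags, and return the candidate the preference model scores highest, declining to answer if nothing passes. This reranking sketch illustrates the general idea rather than how Sparrow itself necessarily selects responses; the scoring functions below are placeholders.

```python
# Illustrative sketch: rerank candidate replies with a preference model
# while filtering out likely rule violations.
from typing import Callable, List, Optional

def select_response(
    candidates: List[str],
    preference_score: Callable[[str], float],   # higher = more useful
    violation_prob: Callable[[str], float],     # rule model output
    violation_threshold: float = 0.5,
) -> Optional[str]:
    safe = [c for c in candidates if violation_prob(c) < violation_threshold]
    if not safe:
        # Decline to answer rather than return a likely rule-breaking reply.
        return None
    return max(safe, key=preference_score)

# Example with stand-in scoring functions.
best = select_response(
    ["reply A", "a longer reply B"],
    preference_score=len,            # placeholder preference model
    violation_prob=lambda _: 0.1,    # placeholder rule model
)
```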

Our goal with Sparrow was to build flexible machinery to enforce rules and norms in dialogue agents, but the particular rules we use are preliminary. Developing a better and more complete set of rules will require expert input on many topics (including from policymakers, social scientists, and ethicists) as well as participatory input from a diverse array of users and affected groups. We believe our methods will still apply to a more rigorous rule set.

Sparrow is a significant step forward in understanding how to train dialogue agents to be more useful and safer. However, successful communication between people and dialogue agents should not only avoid harm but also be aligned with human values for effective and beneficial communication, as discussed in recent work on aligning language models with human values.

We also emphasise that a good agent will still decline to answer questions in contexts where it is appropriate to defer to humans, or where this has the potential to deter harmful behaviour. Finally, our initial research focused on an English-speaking agent, and further work is needed to ensure similar results across other languages and cultural contexts.

In the future, we hope conversations between humans and machines can lead to better judgements of AI behaviour, allowing people to align and improve systems that might be too complex to understand without the help of machines.

Want to explore a conversational path to safe AGI? We are currently hiring research scientists for our Scalable Alignment team.
