Best Practices for Data Enrichment

February 13, 2025

Building a responsible approach to data collection in AI partnerships

At DeepMind, our goal is to make sure everything we do meets the highest standards of safety and ethics, in line with our Operating Principles. One of the most important places this begins is how we collect our data. Over the past 12 months, we have collaborated with Partnership on AI (PAI) to carefully consider these challenges, and have co-developed standardized best practices and processes for responsible human data collection.

Human data collection

Over three years ago, we created the Human Behavioural Research Ethics Committee (HuBREC), a governance group modeled on the academic institutional review boards (IRBs) found in hospitals and universities, and tasked with protecting the dignity, rights, and welfare of the human participants involved in our research. The committee oversees behavioral research involving experiments with humans as the subject of study, such as investigating how humans interact with artificial intelligence (AI) systems in a decision-making process.

Alongside projects involving behavioral research, the AI community is increasingly engaged in efforts involving “data enrichment”: tasks carried out by humans to train and validate machine learning models, such as data labeling and model evaluation. While behavioral research often relies on voluntary participants who are the subject of study, data enrichment involves people being paid to complete tasks that improve AI models.

These types of tasks are usually carried out on crowdsourcing platforms and raise ethical considerations around the pay, welfare, and equity of workers, which can lack the guidance or governance systems needed to ensure sufficient standards are met. As labs accelerate the development of more sophisticated models, reliance on data enrichment practices is likely to grow, and with it the need for stronger guidance.

As part of our Operating Principles, we commit to upholding and contributing to best practices in the fields of AI safety and ethics, including fairness and privacy, to avoid unintended outcomes that create risks of harm.

Best Practices

Following PAI’s recent white paper on responsible sourcing of data enrichment services, we collaborated to develop our practices and processes for data enrichment. This included the creation of five steps AI practitioners can follow to improve the working conditions for people involved in data enrichment tasks (see PAI’s Data Enrichment Guidelines for more information):

1. Select an appropriate payment model and ensure all workers are paid above the local living wage.
2. Design and run a pilot before launching a data enrichment project.
3. Identify appropriate workers for the desired task.
4. Provide verified instructions and/or training materials for workers to follow.
5. Establish clear and regular communication mechanisms with workers.
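As a purely illustrative sketch of how a team might operationalize this checklist, the snippet below encodes the five practices as a pre-launch validation. The field names, thresholds, and helper function are assumptions for illustration only and do not reflect any actual DeepMind or PAI tooling; in practice, judgments such as the living-wage comparison depend on local context rather than a single number.

```python
"""Hypothetical sketch: the five sourcing practices as a pre-launch checklist."""

from dataclasses import dataclass, field


@dataclass
class EnrichmentProjectSpec:
    """Minimal description of a data enrichment project before launch (illustrative only)."""
    payment_model: str                  # e.g. "per-task" or "hourly"
    hourly_rate_usd: float              # effective pay rate offered to workers
    local_living_wage_usd: float        # benchmark for the workers' location
    pilot_completed: bool               # was a small pilot run first?
    worker_qualifications: list[str] = field(default_factory=list)
    instructions_reviewed: bool = False  # were task instructions / training materials verified?
    communication_channel: str = ""      # e.g. a support forum or contact address


def pre_launch_issues(spec: EnrichmentProjectSpec) -> list[str]:
    """Return a list of unmet checklist items; an empty list means ready for review."""
    issues = []
    if spec.payment_model not in ("per-task", "hourly"):
        issues.append("choose an explicit payment model")
    if spec.hourly_rate_usd < spec.local_living_wage_usd:
        issues.append("effective pay is below the local living wage benchmark")
    if not spec.pilot_completed:
        issues.append("run a pilot before launching the full project")
    if not spec.worker_qualifications:
        issues.append("specify the qualifications needed for the task")
    if not spec.instructions_reviewed:
        issues.append("verify task instructions and training materials")
    if not spec.communication_channel:
        issues.append("set up a regular communication channel with workers")
    return issues


if __name__ == "__main__":
    spec = EnrichmentProjectSpec(
        payment_model="hourly",
        hourly_rate_usd=18.0,
        local_living_wage_usd=16.5,
        pilot_completed=True,
        worker_qualifications=["fluent English", "prior labeling experience"],
        instructions_reviewed=True,
        communication_channel="project support forum",
    )
    print(pre_launch_issues(spec) or "checklist satisfied")
```

Running pre_launch_issues on a project spec returns the unmet items, so a review process could block launch until the list is empty.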

Together, we created the necessary policies and resources, gathering multiple rounds of feedback from our internal legal, data, security, ethics, and research teams in the process, before piloting them on a small number of data collection projects and later rolling them out to the wider organization.

These documents provide more clarity around how best to set up data enrichment tasks at DeepMind and give researchers greater confidence in designing and carrying out their studies. This has not only increased the efficiency of our approval and launch processes but, importantly, has enhanced the experience of the people involved in data enrichment tasks.

More information on responsible data enrichment practices, and how we have embedded them into our existing processes, can be found in a recent case study from PAI on implementing responsible data enrichment practices at an AI developer. PAI also provides resources and supporting materials to help AI practitioners and organizations looking to develop similar processes.

Looking ahead

While these best practices underpin our work, we should not rely on them alone to ensure our projects meet the highest standards of participant or worker welfare and safety. Because each project at DeepMind is different, we have a dedicated human data review process that allows us to continuously engage with research teams to identify and mitigate risks on a case-by-case basis.

This work aims to serve as a resource for other organizations interested in improving their data enrichment sourcing practices, and we hope it leads to cross-sector conversations that further develop these guidelines and resources for teams and partners. Through this collaboration, we also hope the AI community will continue to develop norms of responsible data collection and spark broader discussion about how to establish better industry standards.

Read more about our Operating Principles for further details.
