Introducing a context-based framework for comprehensively assessing the social and ethical risks of AI systems
Generative AI systems are already being used to write books, create graphic designs, and assist healthcare professionals, and their capabilities are only increasing. To ensure that these systems are developed and deployed responsibly, the potential ethical and social risks they may pose must be carefully assessed.
Our new paper proposes a three-layered framework for assessing the social and ethical risks of AI systems. This framework includes evaluations of AI system capability, human interaction, and systemic impact.
We also map the current state of safety assessment and identify three main gaps: context, specific risks, and multimodality. To close these gaps, we call for repurposing existing evaluation methods for generative AI and for a holistic approach to evaluation, as in our case study on misinformation. This approach integrates findings such as how likely an AI system is to provide factually incorrect information with insights into how, and in what situations, people use the system. Multi-layered evaluation supports conclusions that go beyond model capability and indicate whether harm (in this case, misinformation) actually occurs and spreads.
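For illustration, the sketch below shows one way such layered findings might be combined in practice: a capability-level factual error rate, interaction-level measures of how often users believe and pass on inaccurate outputs, and a deployment-scale figure. All names and numbers here are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: combining a capability-level metric with
# interaction- and deployment-level observations to reason about downstream
# misinformation harm. All values and field names are hypothetical.

def estimated_misinfo_exposures(
    factual_error_rate: float,   # capability layer: share of outputs that are inaccurate
    acceptance_rate: float,      # human interaction layer: share of inaccurate outputs users believe
    share_rate: float,           # human interaction layer: share of believed outputs users pass on
    monthly_queries: int,        # systemic impact layer: scale of deployment
) -> float:
    """Rough estimate of how many inaccurate outputs are believed and spread per month."""
    return monthly_queries * factual_error_rate * acceptance_rate * share_rate


# Example with made-up values: even a low error rate can translate into
# substantial downstream exposure once scale and user behaviour are factored in.
print(estimated_misinfo_exposures(
    factual_error_rate=0.02, acceptance_rate=0.3, share_rate=0.1, monthly_queries=10_000_000
))  # -> 6000.0
```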
For technology to work as intended, both social and technical challenges must be solved. Therefore, these different layers of context must be considered to better assess the safety of an AI system. Here, we build on previous work identifying the potential risks of large language models, such as privacy leaks, job automation, and misinformation, and introduce a way of comprehensively evaluating these risks going forward.
Context matters when assessing AI risk
The capabilities of an AI system are an important indicator of the broader types of risk that may arise. For example, AI systems that are more likely to produce factually inaccurate or misleading outputs are more likely to create risks of misinformation, which can lead to problems such as a loss of public trust.
Measuring these capabilities is at the core of AI safety assessment, but such assessments alone cannot guarantee that an AI system is safe. Whether downstream harm occurs, for example whether people come to hold false beliefs based on inaccurate model output, depends on context. Specifically: who uses the AI system, and for what purposes? Does the system work as intended? Does it create unexpected externalities? All of these questions inform the overall evaluation of an AI system's safety.
Beyond capability evaluation, we propose evaluation at two additional points where downstream risks emerge: human interaction at the point of use, and systemic impact as the AI system is embedded in broader systems and widely deployed. Integrating evaluations of a specific risk of harm across these layers provides a comprehensive assessment of the safety of an AI system.
Human interaction evaluation focuses on the experience of people using an AI system. How do people use the system? Does it behave as intended at the point of use? How does the experience differ across demographics and user groups? And does exposure to its outputs cause any unanticipated side effects?
Systemic impact assessments focus on the broader structures in which AI systems are embedded, such as social institutions, labor markets, and the natural environment. Assessments at this layer can reveal risks of harm that only become visible once AI systems are deployed at scale.
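One way to picture how results from these three layers fit together is the minimal sketch below. The classes and field names are an illustrative assumption, not a schema defined in our paper.

```python
# A minimal sketch of the three evaluation layers described above.
# The structure and naming are our own illustration, not an API from the paper.
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    CAPABILITY = "capability"                # what the AI system can output in isolation
    HUMAN_INTERACTION = "human_interaction"  # what happens at the point of use
    SYSTEMIC_IMPACT = "systemic_impact"      # effects once embedded and deployed at scale


@dataclass
class EvaluationResult:
    risk_area: str   # e.g. "misinformation"
    layer: Layer
    metric: str      # e.g. "factual error rate", "false-belief formation"
    value: float
    notes: str = ""


@dataclass
class SafetyCase:
    """A comprehensive evaluation integrates results for one risk area across all three layers."""
    risk_area: str
    results: list[EvaluationResult] = field(default_factory=list)

    def covered_layers(self) -> set[Layer]:
        return {r.layer for r in self.results}

    def is_comprehensive(self) -> bool:
        # Harm can only be assessed end-to-end if every layer has at least one result.
        return self.covered_layers() == set(Layer)
```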
Safety assessment is a shared responsibility
AI developers must ensure that their technology is developed and released responsibly. Public actors such as governments are tasked with maintaining public safety. As generative AI systems become increasingly used and deployed, ensuring their safety is a responsibility shared among multiple parties.
AI developers are well placed to evaluate the capabilities of the systems they create. Application developers and designated public authorities are positioned to assess the functionality of different features and applications, and the possible externalities for different user groups. Broader public stakeholders are uniquely positioned to forecast and assess the societal, economic, and environmental implications of novel technologies such as generative AI.
The three evaluation layers in our proposed framework are a matter of degree and cannot be neatly separated. While no single actor is fully responsible for any of these, primary responsibility will depend on who is best placed to perform the assessment at each layer.
Gaps in current safety assessment of generative multimodal AI
Given how much this additional context matters for assessing the safety of AI systems, it is important to understand what evaluations of this kind are available. To get a fuller picture, we made an extensive effort to collate, as comprehensively as possible, the evaluations that have been applied to generative AI systems.
By mapping the current state of safety assessment for generative AI, we found three main gaps:

Context: Most safety assessments consider generative AI system capabilities in isolation. Comparatively little work evaluates potential risks at the point of human interaction or of systemic impact.

Risk-specific assessments: Capability evaluations of generative AI systems are limited in the risk areas they cover. For many risk areas, few evaluations exist. Where evaluations do exist, they often operationalise harm in narrow ways. For example, representation harms are typically defined as stereotypical associations of occupations with different genders, leaving other instances of harm and other risk areas undetected.

Multimodality: The vast majority of existing safety assessments of generative AI systems focus solely on text output, leaving a large gap in evaluating risks of harm in image, audio, or video modalities. This gap is only widening with the introduction of multiple modalities in a single model, such as AI systems that can take images as input and produce outputs that interweave audio, text, and video. While some text-based assessments can be applied to other modalities, new modalities introduce new ways in which risks can manifest. For example, a description of an animal is not harmful, but if the description is applied to an image of a person, it is.
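For readers who want to map coverage in a similar way, the sketch below tallies a catalogue of evaluations by layer, risk area, and modality to surface gaps of this kind. The records and field names are made-up placeholders, not entries from our repository.

```python
# Illustrative sketch: tallying a catalogue of evaluations by context layer,
# risk area, and output modality to surface coverage gaps like those above.
# The records below are placeholders, not entries from the repository.
from collections import Counter

catalogue = [
    {"name": "eval_a", "layer": "capability", "risk_area": "misinformation", "modality": "text"},
    {"name": "eval_b", "layer": "capability", "risk_area": "representation harms", "modality": "text"},
    {"name": "eval_c", "layer": "human_interaction", "risk_area": "misinformation", "modality": "text"},
    # ...
]

for dimension in ("layer", "risk_area", "modality"):
    counts = Counter(entry[dimension] for entry in catalogue)
    print(dimension, dict(counts))

# Sparse or missing categories (e.g. no "systemic_impact" entries, or no "image",
# "audio", or "video" modalities) point to exactly the gaps listed above.
```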
We have compiled a list of links to publications detailing safety evaluations of generative AI systems and are making it openly accessible through this repository. If you would like to contribute, please add evaluations by filling out this form.
Putting more comprehensive evaluations into practice
Generative AI systems are driving a wave of new applications and innovation. To ensure that the potential risks posed by these systems are understood and mitigated, we urgently need rigorous and comprehensive evaluations of AI system safety that take into account how these systems are used and embedded in society.
A practical first step is to repurpose existing evaluations and to leverage large models themselves for evaluation, though both have important limitations. For more comprehensive evaluation, we also need to develop approaches for evaluating AI systems at the point of human interaction and at the level of systemic impact. For example, while spreading misinformation through generative AI is a recent issue, we show that there are many existing methods of assessing societal trust and trustworthiness that can be repurposed.
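As a rough illustration of that first step, the sketch below repurposes an existing question–answer set and uses a large model as a judge of factual support. The function names and signatures are placeholders rather than a specific API, and the result covers only the capability layer; human interaction and systemic impact call for different methods.

```python
# A minimal sketch of reusing an existing evaluation set and leveraging a large
# model as a judge for capability-level factuality scoring. `generate` and
# `rate_support` are placeholders for whatever model-under-test and judge model
# are available; they are assumptions, not a specific library API.
from typing import Callable

def factual_error_rate(
    questions_with_references: list[tuple[str, str]],  # e.g. items from an existing QA benchmark
    generate: Callable[[str], str],                     # the system being evaluated
    rate_support: Callable[[str, str], bool],           # judge: is the answer supported by the reference?
) -> float:
    """Share of answers the judge marks as unsupported by the reference text."""
    errors = 0
    for question, reference in questions_with_references:
        answer = generate(question)
        if not rate_support(answer, reference):
            errors += 1
    return errors / max(len(questions_with_references), 1)

# Note: this only probes the capability layer; whether inaccurate outputs lead to
# downstream harm still depends on human interaction and systemic impact.
```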
Ensuring the safety of widely used generative AI systems is a shared responsibility and priority. AI developers, public actors, and other stakeholders must work together to collectively build a thriving and robust evaluation ecosystem for safe AI systems.