We are expanding our risk domains and refining our risk assessment process.
AI breakthroughs are transforming our everyday lives, from advances in mathematics, biology and astronomy to realizing the potential of personalized education. As we build increasingly powerful AI models, we remain committed to developing them responsibly and taking an evidence-based approach to staying ahead of emerging risks.
Today, we are publishing the third iteration of our Frontier Safety Framework (FSF), our most comprehensive approach yet to identifying and mitigating severe risks from advanced AI models.
This update builds on our ongoing collaborations with experts across industry, academia and government. It also incorporates lessons learned from implementing previous versions and evolving best practices in frontier AI safety.
Key updates to the framework
Addressing the risk of harmful manipulation
With this update, we are introducing a Critical Capability Level (CCL) focused on harmful manipulation: specifically, AI models with powerful manipulative capabilities that could be misused to systematically and substantially change beliefs and behaviors in identified high-stakes contexts over the course of interactions with the model.
This addition builds on research we have carried out to identify and evaluate the mechanisms that drive manipulation by generative AI. Going forward, we will continue to invest in this domain to better understand and measure the risks associated with harmful manipulation.
Adapting our approach to misalignment risks
We have also expanded the framework to address potential future scenarios in which misaligned AI models might interfere with operators' ability to direct, modify, or shut down their operations.
While previous versions of the framework included an exploratory approach centered on instrumental reasoning CCLs (i.e., warning levels specific to when AI models begin to reason deceptively), this update now provides additional protocols for our machine learning research and development CCLs, focused on models that could accelerate AI research and development to potentially destabilizing levels.
In addition to the misuse risks arising from these capabilities, there are also risks of misaligned, undirected action by models at these capability levels, risks compounded by the likelihood that such models will be integrated into AI development and deployment processes themselves.
To address the risks posed by these CCLs, we conduct safety case reviews before external launches whenever a relevant CCL is reached. This involves performing detailed analyses demonstrating how risks have been reduced to manageable levels. Because large-scale internal deployments of models at the advanced machine learning R&D CCL can also pose risk, we have expanded this approach to cover such deployments as well.
Sharpening our risk assessment process
Our framework is designed to address risk in proportion to its severity. We have refined our CCL definitions to identify the critical threats that warrant the most rigorous governance and mitigation strategies, and we continue to apply safety and security mitigations as part of our standard model development approach before models reach these CCL thresholds.
Finally, this update describes our risk assessment process in greater detail. Building on our core early-warning evaluations, we outline how we conduct holistic assessments that include systematic risk identification, comprehensive analyses of model capabilities, and explicit determinations of risk acceptability.
Advancing our commitment to frontier safety
This latest update to the Frontier Safety Framework represents our continued commitment to tracking AI risks and taking a scientific, evidence-based approach as capabilities advance toward AGI. By expanding our risk domains and strengthening our risk assessment process, we aim to ensure that transformative AI benefits humanity while minimizing potential harm.
Our framework will continue to evolve based on new research, stakeholder input, and lessons learned from implementation, and we remain committed to working collaboratively across industry, academia and government.
The path to beneficial AGI requires not only technical breakthroughs but also robust frameworks to mitigate risks along the way. We hope our updated Frontier Safety Framework will contribute meaningfully to this collective effort.