Our next iteration of the FSF sets out stronger security protocols on the path to AGI
AI is a powerful tool that is helping to unlock new breakthroughs and make significant progress on some of the biggest challenges of our time, from climate change to drug discovery. But as its development progresses, advanced capabilities may present new risks.
That’s why last year we introduced the first iteration of our Frontier Safety Framework: a set of protocols to help us stay ahead of possible severe risks from powerful frontier AI models. Since then, we have collaborated with experts in industry, academia, and government to deepen our understanding of these risks, the empirical evaluations that test for them, and the mitigations we can apply. We have also implemented the Framework in our safety and governance processes for evaluating frontier models such as Gemini 2.0. As a result of this work, today we are publishing an updated Frontier Safety Framework.
Key updates to the Framework include recommendations for heightened security levels for our Critical Capability Levels (CCLs), helping to identify where the strongest efforts to curb exfiltration risk are needed; a more consistent procedure for applying deployment mitigations; and an outline of our approach to deceptive alignment risk.
Recommendations for Heightened Security
Security mitigations help prevent unauthorized actors from exfiltrating model weights. This is especially important because access to model weights allows the removal of most safeguards. Given the stakes involved as we look ahead to increasingly powerful AI, getting this wrong could have serious implications for safety and security. Our initial Framework recognized the need for a tiered approach to security, allowing mitigations of varying strength to be tailored to the risk. This proportionate approach also ensures we get the balance right between mitigating risks and fostering access and innovation.
Since then, we have drawn on broader research to evolve these security mitigation levels and to recommend a level for each CCL.* These recommendations reflect our assessment of the minimum appropriate level of security for models that reach a CCL. This mapping process helps us isolate where the strongest mitigations are needed to curtail the greatest risk. In practice, some aspects of our security practices may exceed the baseline levels recommended here because of our strong overall security posture.
This second version of the Framework recommends particularly heightened security levels for CCLs within the domain of machine learning research and development (R&D). We believe it will be important for frontier AI developers to have strong security for future scenarios in which their models can significantly accelerate and/or automate AI development itself. This is because the uncontrolled proliferation of such capabilities could significantly challenge society’s ability to carefully manage and adapt to the rapid pace of AI development.
Ensuring the continued security of cutting-edge AI systems is a shared global challenge, and a shared responsibility of all leading developers. Importantly, getting this right is a collective-action problem: the social value of any single actor’s security mitigations is significantly reduced if they are not broadly applied across the field. Building the kind of security capabilities we believe may be needed will take time, so it is vital that all frontier AI developers work collectively towards heightened security measures and accelerate efforts towards common industry standards.
Deployment mitigation procedure
The Framework also outlines deployment mitigations, which focus on preventing the misuse of critical capabilities in the systems we deploy. We have updated our deployment mitigation approach to apply a more rigorous safety mitigation process to models that reach a CCL in a misuse risk domain.
The updated approach involves the following steps. First, we prepare a set of mitigations by iterating on a suite of safeguards. As we do so, we also develop a safety case: an assessable argument showing that the severe risks associated with a model’s CCLs have been minimized to an acceptable level. The appropriate corporate governance body then reviews the safety case, and general availability deployment takes place only once it is approved. Finally, we continue to review and update the safeguards and safety case after deployment. We have made this change because we believe that all critical capabilities warrant this thorough mitigation process.
Approach to deceptive alignment risk
The first iteration of the Framework focused primarily on misuse risk (i.e., the risk of threat actors using the critical capabilities of deployed or exfiltrated models to cause harm). Building on this, we have taken an industry-leading approach to proactively addressing deceptive alignment risk, i.e., the risk of an autonomous system deliberately undermining human control.
Our initial approach to this question focuses on detecting when a model might develop a baseline level of instrumental reasoning ability. To mitigate this, we explore automated monitoring to detect the illicit use of instrumental reasoning capabilities.
Because we do not expect automated monitoring to remain sufficient in the long term once models reach stronger levels of instrumental reasoning, we are actively undertaking, and strongly encouraging, further research to develop mitigation approaches for these scenarios. We do not yet know how likely such capabilities are to arise, but we think it is important that the field prepares for the possibility.
Conclusion
We will continue to review and develop the Framework over time, guided by our AI Principles, which further outline our commitment to responsible development.
As part of our efforts, we will continue to work collaboratively with partners across society. For instance, if we assess that a model has reached a CCL that poses an unmitigated and material risk to overall public safety, we aim to share information with the appropriate government authorities where this will facilitate the development of safe AI. Additionally, the latest Framework outlines a number of potential areas for further research, where we look forward to collaborating with the research community, other companies, and governments.
We believe that an open, iterative, and collaborative approach will help to establish common standards and best practices for evaluating the safety of future AI models, while securing their benefits for humanity. The Seoul Frontier AI Safety Commitments marked an important step towards this collective effort, and we hope our updated Frontier Safety Framework contributes further to that progress. As we look ahead to AGI, getting this right will mean tackling very consequential questions, such as what the right capability thresholds and mitigations are, questions that will require input from broader society, including governments.