Exploring the frontiers of AGI, prioritizing readiness, proactive risk assessment, and collaboration with the wider AI community.
Artificial general intelligence (AGI), AI that's at least as capable as humans at most cognitive tasks, could be here within the coming years.
Integrated with agentic capabilities, AGI could supercharge AI to understand, reason, plan, and execute actions autonomously. Such technological advancement will provide society with invaluable tools to address critical global challenges, including drug discovery, economic growth and climate change.
This means we can expect tangible benefits for billions of people. For instance, it could revolutionize healthcare by enabling faster, more accurate medical diagnoses. It could make education more accessible and engaging by offering personalized learning experiences. By enhancing information processing, AGI could help lower barriers to innovation and creativity. And by democratizing access to advanced tools and knowledge, it could enable small organizations to tackle complex challenges previously only addressable by large, well-funded institutions.
Navigating the path to AGI
We are optimistic about AGI's potential. It has the power to transform our world, acting as a catalyst for progress in many areas of life. But it's essential with any technology this powerful that even a small possibility of harm be taken seriously and prevented.
Mitigating AGI safety challenges demands proactive planning, preparation and collaboration. Previously, we introduced our approach to AGI in the “Levels of AGI” framework paper, which provided a perspective for classifying the capabilities of advanced AI systems, understanding and comparing their performance, assessing potential risks, and gauging progress towards more general and capable AI.
Today, we're sharing our views on AGI safety and security as we navigate the path toward this transformational technology. This new paper, An Approach to Technical AGI Safety and Security, is a starting point for vital conversations with the wider industry about how we monitor AGI progress and ensure it's developed safely and responsibly.
In the paper, we detail how we're taking a systematic and comprehensive approach to AGI safety, exploring four main risk areas: misuse, misalignment, accidents, and structural risks, with a deeper focus on misuse and misalignment.
Understanding and addressing potential misuse
Misuse occurs when humans intentionally use AI systems for harmful purposes.
Deepening our insight into current harms and their mitigations continues to improve our understanding of potential serious harms over the longer term, and how to prevent them.
For instance, misuse of present-day generative AI includes producing harmful content or spreading inaccurate information. In the future, advanced AI systems may have the capacity to more significantly influence public beliefs and behaviors in ways that could lead to unintended societal consequences.
The potential severity of such harm necessitates proactive safety and security measures.
As we detail in the paper, a key element of our strategy is identifying and restricting access to dangerous capabilities that could be misused, including those enabling cyberattacks.
We're exploring a number of mitigations to prevent the misuse of advanced AI. This includes sophisticated security mechanisms that could prevent malicious actors from obtaining raw access to model weights that would allow them to bypass our safety guardrails; mitigations that limit the potential for misuse when the model is deployed; and threat modelling research that helps identify capability thresholds where heightened security is necessary. Additionally, our recently launched cybersecurity evaluation framework takes this work a step further to help mitigate against AI-powered threats.
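As a rough illustration of the capability-threshold idea, the gating logic can be sketched as follows. The capability names, scores, and cutoff values below are hypothetical, invented for this sketch; they do not reflect any actual evaluation tooling or thresholds.

```python
# Hypothetical sketch: map dangerous-capability evaluation scores to a
# security tier. Capability names and thresholds are invented examples.
CAPABILITY_THRESHOLDS = {
    "cyber_offense": 0.6,
    "bio_uplift": 0.4,
}

def required_security_level(eval_scores: dict) -> str:
    """Return a heightened security tier if any capability crosses its threshold."""
    crossed = [
        name for name, threshold in CAPABILITY_THRESHOLDS.items()
        if eval_scores.get(name, 0.0) >= threshold
    ]
    if crossed:
        # e.g. restrict weight access, add deployment mitigations
        return "heightened"
    return "standard"

print(required_security_level({"cyber_offense": 0.7}))  # heightened
print(required_security_level({"cyber_offense": 0.1}))  # standard
```

The point of the sketch is only that mitigations escalate when pre-release evaluations cross pre-committed thresholds, rather than being applied uniformly.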
Even today, we evaluate our most advanced models, such as Gemini, for potentially dangerous capabilities before their release. Our Frontier Safety Framework delves deeper into how we assess capabilities and employ mitigations, including for cybersecurity and biosecurity risks.
The challenge of misalignment
For AGI to truly complement human abilities, it has to be aligned with human values. Misalignment occurs when the AI system pursues a goal that is different from human intentions.
We have previously shown how misalignment can arise through specification gaming, where the AI finds a solution to achieve its goals, but not in the way intended by the human instructing it, and through goal misgeneralization.
For example, an AI system asked to book tickets to a movie might decide to hack into the ticketing system to get already-occupied seats.
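The gap between a literal objective and the human's actual intent can be shown in a toy form. The actions, scores, and penalty below are invented purely for illustration:

```python
# Toy illustration of specification gaming: the stated objective rewards
# "getting a seat, cheaply" but never encodes the human's implicit
# constraints, so naive optimization picks a disallowed shortcut.
ACTIONS = {
    "buy_available_seat":   {"gets_seat": True,  "violates_norms": False, "cost": 15},
    "wait_for_cancellation": {"gets_seat": False, "violates_norms": False, "cost": 0},
    "hack_booking_system":  {"gets_seat": True,  "violates_norms": True,  "cost": 0},
}

def misspecified_reward(action: dict) -> float:
    # Only rewards getting a seat at low cost; the constraint is missing.
    return (10 if action["gets_seat"] else 0) - action["cost"] * 0.1

def intended_reward(action: dict) -> float:
    # Same goal, but norm violations are heavily penalized.
    return misspecified_reward(action) - (100 if action["violates_norms"] else 0)

def best(reward) -> str:
    return max(ACTIONS, key=lambda name: reward(ACTIONS[name]))

print(best(misspecified_reward))  # hack_booking_system — the gamed solution
print(best(intended_reward))      # buy_available_seat
```

The agent isn't malicious in this toy: it optimizes exactly the objective it was given, which is why getting the objective right matters.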
We're also conducting extensive research on the risk of deceptive alignment: the risk of an AI system becoming aware that its goals do not align with human instructions, and deliberately trying to bypass the safety measures put in place by humans to prevent it from taking misaligned action.
Countering misalignment
Our goal is to have advanced AI systems that are trained to pursue the right goals, so they follow human instructions accurately and are prevented from taking potentially unethical shortcuts to achieve their objectives.
We do this through amplified oversight, that is, being able to tell whether an AI's answers are good or bad for achieving a given objective. While this is relatively easy now, it can become challenging as capabilities advance.
For example, even Go experts didn't initially realize how good Move 37 was, a move that had a 1 in 10,000 chance of being played, when AlphaGo first made it.
To address this challenge, we can use the AI systems themselves to help provide feedback on their answers, such as in debate.
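The debate idea can be sketched schematically. The functions below are stand-ins for model calls, not a real API, and the judge's decision rule is a placeholder, not an actual protocol:

```python
# Schematic sketch of amplified oversight via debate: two debaters argue
# for and against a proposed answer, and a (possibly weaker) judge decides
# from the adversarial transcript alone.
def pro_argument(answer: str) -> str:
    # Stand-in for a model call defending the answer.
    return f"Supporting evidence for '{answer}'."

def con_argument(answer: str) -> str:
    # Stand-in for a model call attacking the answer.
    return f"A counterexample that challenges '{answer}'."

def judge(transcript: list) -> bool:
    # Placeholder judge: in practice a weaker trusted model (or a human)
    # reads the transcript and decides whether the answer survived scrutiny.
    pro = [line for line in transcript if line.startswith("Pro:")]
    con = [line for line in transcript if line.startswith("Con:")]
    return len(pro) >= len(con)

def debate(question: str, answer: str, rounds: int = 2) -> bool:
    transcript = [f"Q: {question}", f"A: {answer}"]
    for _ in range(rounds):
        transcript.append("Pro: " + pro_argument(answer))
        transcript.append("Con: " + con_argument(answer))
    return judge(transcript)
```

The design intuition is that checking an adversarial exchange can be easier for the judge than evaluating the answer directly.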
Once we can tell whether an answer is good, we can use this to build a safe and aligned AI system. A challenge here is figuring out which problems or instances to train the AI system on. Through work on robust training, uncertainty estimation and more, we can cover a range of situations that an AI system will encounter in real-world scenarios, creating AI that can be trusted.
Through effective monitoring and established computer security measures, we aim to mitigate the harm that could occur if our AI systems were to pursue misaligned goals.
Monitoring involves using an AI system, called the monitor, to detect actions that don't align with our goals. It's important that the monitor knows when it doesn't know whether an action is safe. When it's unsure, it should either reject the action or flag it for further review.
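That triage logic can be sketched minimally. The score here is an assumed monitor confidence value, and the threshold numbers are illustrative, not from any real system:

```python
# Minimal sketch of monitor triage: confident-safe actions execute,
# confident-unsafe actions are rejected, and uncertain ones are escalated.
SAFE_THRESHOLD = 0.9    # monitor must be this confident to auto-approve
UNSAFE_THRESHOLD = 0.1  # below this, auto-reject

def triage(action: str, monitor_score: float) -> str:
    """Route an action based on the monitor's estimated probability it is safe."""
    if monitor_score >= SAFE_THRESHOLD:
        return "execute"
    if monitor_score <= UNSAFE_THRESHOLD:
        return "reject"
    return "flag_for_review"  # uncertain: escalate for further review

print(triage("read public docs", 0.98))     # execute
print(triage("delete all backups", 0.02))   # reject
print(triage("email external party", 0.5))  # flag_for_review
```

The key design choice is the middle band: rather than forcing a binary safe/unsafe call, the monitor's uncertainty is surfaced and routed to review.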
Enabling transparency
This all becomes easier if AI decision-making becomes more transparent. We perform extensive research on interpretability with the aim of increasing this transparency.
To facilitate this further, we're designing AI systems that are easier to understand.
For example, our research on myopic optimization with nonmyopic approval (MONA) aims to ensure that any long-term planning done by AI systems remains understandable to humans. This is particularly important as the technology improves. Our work on MONA is the first to demonstrate the safety benefits of short-horizon optimization in LLMs.
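The intuition behind MONA can be sketched in a toy form: the agent scores each action myopically, but the score folds in an overseer's approval of the action's apparent long-term sensibleness, so legible plans win out over opaque shortcuts. All numbers and action names below are invented for illustration and do not reflect the actual method:

```python
# Toy sketch of the MONA intuition: one-step optimization where the
# immediate score already includes a nonmyopic overseer approval term.
CANDIDATE_ACTIONS = {
    "explain_plan_then_act": {"task_reward": 0.6, "overseer_approval": 0.9},
    "opaque_shortcut":       {"task_reward": 0.8, "overseer_approval": 0.1},
}

def myopic_score(action: dict, approval_weight: float = 1.0) -> float:
    # The agent never optimizes a multi-step return itself; long-horizon
    # judgment enters only through the overseer's approval of this action.
    return action["task_reward"] + approval_weight * action["overseer_approval"]

chosen = max(CANDIDATE_ACTIONS, key=lambda a: myopic_score(CANDIDATE_ACTIONS[a]))
print(chosen)  # explain_plan_then_act
```

Because the agent only ever optimizes one step at a time, it has no incentive to construct multi-step plans that humans can't follow.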
Building an ecosystem for AGI readiness
Our AGI Safety Council (ASC), led by Google DeepMind Co-Founder and Chief AGI Scientist Shane Legg, analyzes AGI risk and best practices, making recommendations on safety measures. The ASC works closely with the Responsibility and Safety Council, co-chaired by our COO Lila Ibrahim and Senior Director of Responsibility Helen King, to evaluate AGI research, projects and collaborations against our AI Principles, advising and partnering with research and product teams.
Our work on AGI safety complements the depth and breadth of our responsibility and safety practices, which address a wide range of issues, including harmful content, bias, and transparency. We also continue to leverage our learnings from agent safety, such as the principle of having a human in the loop to check in on consequential actions, to inform our approach to building AGI responsibly.
Externally, we're working to foster collaboration with experts, industry, governments, nonprofits and civil society organizations, and to take an informed approach to developing AGI.
For example, we're partnering with nonprofit AI safety research organizations, including Apollo and Redwood Research, who have advised on a dedicated misalignment section in the latest version of our Frontier Safety Framework.
Through our continued dialogue with policy stakeholders globally, we hope to contribute to international consensus on critical frontier safety and security issues, including how we can best anticipate and prepare for novel risks.
Our efforts include working with others in the industry, via organizations like the Frontier Model Forum, as well as valuable collaborations with AI Safety Institutes on safety testing, to share and develop best practices. Ultimately, we believe a coordinated international approach to governance is critical to ensure society benefits from advanced AI systems.
Educating AI researchers and experts on AGI safety is fundamental to creating a strong foundation for its development. That's why we've launched a new course on AGI safety for students, researchers and professionals interested in this topic.
Ultimately, our approach to AGI safety and security serves as a vital roadmap for addressing the many challenges that remain open. We look forward to collaborating with the wider AI research community to advance AGI responsibly and help us unlock the immense benefits of this technology for all.