
New AI risk early warning system


Responsibility and safety

Published on May 25, 2023

Toby Shevlane

New study proposes a framework for evaluating general-purpose models against novel threats

To pioneer responsibly at the cutting edge of artificial intelligence (AI) research, we must identify new capabilities and novel risks in AI systems as early as possible.

AI researchers already use a range of evaluation benchmarks to identify unwanted behaviors in AI systems, such as making misleading statements, producing biased decisions, or repeating copyrighted content. Now, as the AI community builds and deploys increasingly powerful AI, the evaluation portfolio must expand to include the possibility of extreme risks from general-purpose AI models that have strong skills in manipulation, deception, cyber-offense, or other dangerous capabilities.

In our latest paper, co-authored with colleagues at the University of Cambridge, University of Oxford, University of Toronto, Université de Montréal, OpenAI, Anthropic, the Alignment Research Center, the Centre for Long-Term Resilience, and the Centre for the Governance of AI, we introduce a framework for evaluating these novel threats.

Safety assessments of models, including those that assess extreme risks, will be a key element of safe AI development and deployment.

Summary of the proposed approach: to assess extreme risks from new, general-purpose AI systems, developers must evaluate their dangerous capabilities and their alignment (see below). Identifying risks early unlocks opportunities to be more responsible when training new AI systems, deploying them, transparently describing their risks, and applying appropriate cybersecurity standards.

Assessing extreme risks

General-purpose models typically learn their capabilities and behaviors during training. However, existing methods for steering the learning process are imperfect. For example, previous research at Google DeepMind has explored how AI systems can learn to pursue undesired goals even when they are correctly rewarded for good behavior.

Responsible AI developers must look ahead and anticipate future developments and novel risks. As progress continues, future general-purpose models may learn a variety of dangerous capabilities by default. For example, it is plausible (though uncertain) that future AI systems will be able to conduct offensive cyber operations, skillfully deceive humans in dialogue, manipulate humans into carrying out harmful actions, design or acquire weapons (e.g. biological, chemical), fine-tune and operate other high-risk AI systems on cloud computing platforms, or assist humans with any of these tasks.

People with malicious intent who gain access to such models could misuse their capabilities. Alternatively, due to failures of alignment, these AI models could take harmful actions even without anyone intending them to.

Model evaluation helps us identify these risks in advance. Under our framework, AI developers would use model evaluation to uncover:

  • The extent to which a model has certain "dangerous capabilities" that could be used to threaten security, exert influence, or evade oversight.
  • The extent to which the model is prone to applying its capabilities to cause harm (i.e. the model's alignment). Alignment evaluations should confirm that the model behaves as intended across a very wide range of scenarios and, where possible, should examine the model's internal workings.
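To make the two evaluation families concrete, here is a minimal sketch (in Python) of how a developer might record their results side by side. The class and field names are illustrative assumptions for this sketch, not terminology from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class CapabilityEval:
    # One dangerous-capability evaluation, e.g. "offensive-cyber" or "manipulation".
    name: str
    score: float       # 0.0 (capability absent) .. 1.0 (capability strongly present)
    threshold: float   # level above which the capability is treated as present

    @property
    def present(self) -> bool:
        return self.score >= self.threshold

@dataclass
class AlignmentEval:
    # One alignment evaluation, recorded per scenario family.
    scenario: str
    behaved_as_intended: bool
    inspected_internals: bool = False  # whether the model's inner workings were examined

@dataclass
class ModelEvaluationReport:
    model_id: str
    capability_evals: list[CapabilityEval] = field(default_factory=list)
    alignment_evals: list[AlignmentEval] = field(default_factory=list)

    def dangerous_capabilities(self) -> list[str]:
        return [e.name for e in self.capability_evals if e.present]

    def alignment_failures(self) -> list[str]:
        return [e.scenario for e in self.alignment_evals if not e.behaved_as_intended]
```

Alignment results are recorded per scenario family here because, as noted above, alignment evaluations should confirm intended behavior across a very wide range of scenarios rather than in a single setting.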

The results of these evaluations help AI developers understand whether the ingredients for extreme risk are present. The highest-risk cases will involve multiple dangerous capabilities combined. The AI system does not need to provide all of the ingredients itself, as shown in this diagram:

Ingredients for extreme risk: sometimes specific capabilities could be outsourced, either to humans (e.g. users or crowdworkers) or to other AI systems. These capabilities must be applied to harm, either through misuse or through failures of alignment (or a mixture of both).

Rule of thumb: the AI community should treat an AI system as highly dangerous if it has a capability profile sufficient to cause extreme harm, assuming it is misused or poorly aligned. To deploy such a system in the real world, an AI developer would need to demonstrate an unusually high standard of safety.
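As a rough, self-contained illustration of this rule of thumb, the sketch below flags a model whose evaluated capability profile could plausibly enable extreme harm, under the rule's assumption of misuse or poor alignment. The capability names, thresholds, and the "multiple capabilities combined" condition are assumptions chosen for the example, not criteria from the paper.

```python
# Hypothetical capability labels used only for this example.
DANGEROUS_CAPABILITIES = {
    "offensive_cyber",
    "deception",
    "manipulation",
    "weapons_acquisition",
    "operating_other_ai_systems",
}

def is_highly_dangerous(capability_scores: dict[str, float],
                        alignment_failure_rate: float,
                        capability_threshold: float = 0.5,
                        min_combined: int = 2) -> bool:
    """Flag a model as potentially highly dangerous. The rule of thumb assumes
    misuse or poor alignment, so strong capabilities alone can trigger the flag."""
    present = [c for c in DANGEROUS_CAPABILITIES
               if capability_scores.get(c, 0.0) >= capability_threshold]
    # The highest-risk cases combine multiple dangerous capabilities.
    if len(present) >= min_combined:
        return True
    # A single strong capability plus observed alignment failures is also flagged.
    return len(present) >= 1 and alignment_failure_rate > 0.0

# Example: a model with strong cyber-offense and deception scores is flagged
# even though no alignment failures were observed.
scores = {"offensive_cyber": 0.8, "deception": 0.7, "weapons_acquisition": 0.1}
print(is_highly_dangerous(scores, alignment_failure_rate=0.0))  # True
```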

Model evaluation as a critical governance infrastructure

If we have better tools for identifying which models are risky, companies and regulators can better ensure:

  • Responsible training: responsible decisions are made about whether and how to train a new model that shows early signs of risk.
  • Responsible deployment: responsible decisions are made about whether, when, and how to deploy potentially risky models.
  • Transparency: useful and actionable information is reported to stakeholders, to help them prepare for or mitigate potential risks.
  • Appropriate security: strong information security controls and systems are applied to models that might pose extreme risks.

We have developed a blueprint for how model evaluations for extreme risks should feed into important decisions about training and deploying a highly capable, general-purpose model. The developer conducts evaluations throughout, and grants structured model access to external safety researchers and model auditors so they can conduct additional evaluations. The evaluation results can then inform risk assessments before model training and before deployment.

A blueprint for embedding model evaluations for extreme risks into important decision-making processes throughout model training and deployment.
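The sketch below illustrates, under assumed stage names and gate conditions, how evaluation results might gate the blueprint's two decision points (before training and before deployment). It is not the paper's process, just one plausible shape for it.

```python
from enum import Enum, auto

class Decision(Enum):
    PROCEED = auto()
    PAUSE_AND_MITIGATE = auto()

def pre_training_gate(early_risk_signals: bool) -> Decision:
    # Responsible training: reconsider whether and how to train a model
    # that already shows early signs of risk.
    return Decision.PAUSE_AND_MITIGATE if early_risk_signals else Decision.PROCEED

def pre_deployment_gate(internal_evals_flagged: bool,
                        external_audit_flagged: bool,
                        security_controls_in_place: bool) -> Decision:
    # Responsible deployment: deploy only if neither internal evaluations nor
    # external auditors flag extreme risk, and appropriate security is applied.
    if internal_evals_flagged or external_audit_flagged or not security_controls_in_place:
        return Decision.PAUSE_AND_MITIGATE
    return Decision.PROCEED

# Example: internal evaluations are clean, but an external auditor raised a flag,
# so deployment is paused pending mitigation.
print(pre_deployment_gate(internal_evals_flagged=False,
                          external_audit_flagged=True,
                          security_controls_in_place=True))
# Decision.PAUSE_AND_MITIGATE
```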

Looking ahead

Important early work on model evaluations for extreme risks is already underway, including at Google DeepMind. However, much more progress, both technical and institutional, is needed to build an evaluation process that catches all possible risks and helps safeguard against emerging challenges in the future.

Model evaluation is not a panacea; some risks could slip through the net, for example because they depend too heavily on factors external to the model, such as complex social, political, and economic forces in society. Model evaluations must be combined with other risk assessment tools and with a broader dedication to safety across industry, government, and civil society.

Google's recent blog on responsible AI states that "individual practices, shared industry standards, and sound government policies are essential to getting AI right." We hope that many others working in AI, and in the sectors affected by this technology, will come together to create approaches and standards for safely developing and deploying AI for the benefit of all.

We believe that having processes for tracking the emergence of risky properties in models, and for responding adequately to concerning results, is a critical part of being a responsible developer operating at the frontier of AI capabilities.
