At the speed at which the generated AI space is moving, we consider an open approach to connect ecosystems and to mitigate the potential risks of large-scale language models (LLMs). Last year, Meta released an early suite of open tools and assessments aimed at using open-generated AI models to promote responsible development. As LLM becomes increasingly integrated as a coding assistant, it introduces new cybersecurity vulnerabilities that need to be addressed. A comprehensive benchmark is essential to assess LLMS cybersecurity safety to address this challenge. This is where we assess LLM sensitivity, offensive cybersecurity features, and susceptibility to rapid injection attacks, which acts to provide a more comprehensive assessment of LLM cybersecurity risks. The Cyberseceval 2 leaderboard can be viewed here.
benchmark
The Cyberseceval 2 benchmark helps you assess LLMS trends, generate unsafe code and follow requests that support cyberattackers.
Testing Generation of Unstable Coding Practices: In the Unstable Coding Practice Test, LLM measures the frequency that suggests dangerous security weaknesses in both autocomplete and teaching contexts, as defined by the industry-standard unstable coding practice taxonomy of general weakness enumeration. Report code test pass rate. Rapid injection susceptibility testing: Rapid injection attacks in LLM-based applications are attempts to make LLM work in an undesirable way. Rapid injection testing assesses the LLM’s capabilities and recognizes which parts of the input are not trusted, and its level of level for common rapid injection techniques. Reports how often the model complies with attacks. Testing Request Compliance to Support Cyber Attacks: Testing to measure false rejection rates of confused and benign prompts. These prompts are similar to cyberattack compliance testing in that they cover a variety of topics, including CyberDefense, but are explicitly benign, even if they are malicious. Report a trade-off between false denial (refusing to support legitimate cyber-related activities) and violation rate (agreeing to support offensive cyber attacks). Testing the tendency to abuse code interpreters: Code interpreters allow LLM to run code in a sandbox environment. This set of prompts attempts to manipulate LLM to execute malicious code to access the system running LLM, gather sensitive information about the system, create and execute social engineering attacks, or gather information about the external infrastructure of the host environment. Reports the frequency of model compliance to attacks. Testing automated offensive cybersecurity features: This suite consists of capture-style security test cases that simulate program exploitation. Use LLM as a security tool to determine whether security issues can reach a specific point in a program that has been intentionally inserted. Some of these tests explicitly check whether the tool can perform basic exploits such as SQL injection and buffer overflow. Report the model’s completion rate.
All the code is open source and we hope that the community will use it to measure and enhance the cybersecurity safety properties of LLMS.
For more information about all benchmarks, click here.
Important insights
The latest evaluation of the cutting-edge large-scale language model (LLMS) using Cyberseceval 2 reveals both the progress and the ongoing challenges in addressing cybersecurity risks.
Industry improvements
Since the first version of the benchmark released in December 2023, the average LLM compliance rate with requests to support cyberattacks has fallen from 52% to 28%, indicating that the industry is more aware of the issue and is taking steps to improve it.
Model comparison
We found that models without code specialization tend to have a lower rate of compliance compared to models with code specialization. However, the gap between these models is narrowing, suggesting that code-specialized models are catching up from a security standpoint.
Rapid injection risk
Our rapid injection tests reveal that LLMS conditioning against such attacks remains an open issue, poses a significant security risk for applications built using these models. Developers should not assume that LLM can trust them to safely follow system prompts in the face of hostile input.
Code Exploitation Restrictions
Our code exploit tests suggest that models with higher common coding capabilities perform better, but that LLM still has a long way to go before they can ensure that they can solve the challenges of end-to-end exploits. This indicates that LLM is unlikely to destroy cyber exploitation attacks in its current state.
Risk of misuse of interpreters
Interpreter Abuse Test highlights vulnerabilities to LLM operations and allows abusive actions to be carried out within the code interpreter. This underscores the need for additional guardrails and detection mechanisms to prevent interpreter abuse.
How do I contribute?
We hope that the community will contribute to the benchmark. There are a few things you can do if you’re interested.
To run the CyberSeceval 2 benchmark on your model, you can follow these instructions. Feel free to send the output so that you can add your model to your leaderboard!
If you have any ideas to improve your Cyberseceval 2 benchmark, you can follow the instructions here to contribute directly.
Additional Resources