Responsibility and safety
Published October 6, 2025
Authors: Raluca Ada Popa and Four Flynn
Using advanced AI to fix critical software vulnerabilities
Today we are sharing early results from our research on CodeMender, a new AI-powered agent that automatically improves code security.
Software vulnerabilities are difficult and time-consuming for developers to find and fix, even with traditional automated methods such as fuzzing. AI-based efforts such as Big Sleep and OSS-Fuzz have demonstrated that AI can find new zero-day vulnerabilities in well-tested software. As AI-powered vulnerability discovery achieves more breakthroughs, it will become increasingly difficult for humans alone to keep up.
CodeMender helps solve this problem by taking a comprehensive approach to code security that is both reactive, instantly patching new vulnerabilities, and proactive, rewriting existing code and eliminating entire classes of vulnerabilities in the process. Over the past six months of building CodeMender, we have already upstreamed 72 security fixes to open source projects.
By automatically creating and applying high-quality security patches, CodeMender's AI-powered agent helps developers and maintainers focus on what they do best: building great software.
How CodeMender works
CodeMender leverages the reasoning capabilities of recent Gemini Deep Think models to produce an autonomous agent that can debug and fix complex vulnerabilities.
To do this, the CodeMender agent is equipped with robust tools that let it reason about code before making any changes, and automatically validate those changes to ensure that they are correct and do not cause regressions.
An animation showing the CodeMender process for fixing vulnerabilities.
Large language models are improving rapidly, but mistakes in code security can be costly. CodeMender's automatic validation process checks that code changes are correct across many dimensions, and surfaces for human review only those patches that, for example, fix the root cause of the problem.
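The idea of surfacing only validated patches can be sketched as a simple gate. This is a minimal illustration, not CodeMender's actual pipeline: the `Patch` fields, the three checks, and the function names are all hypothetical, chosen only to show a candidate being held back when it fails any one dimension.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Patch:
    """A candidate patch plus hypothetical validation results."""
    description: str
    fixes_root_cause: bool
    tests_pass: bool
    introduces_regression: bool


# Each check returns True when the patch is acceptable on that dimension.
# The set of checks here is an assumption made for illustration.
Check = Tuple[str, Callable[[Patch], bool]]

CHECKS: List[Check] = [
    ("fixes root cause", lambda p: p.fixes_root_cause),
    ("tests pass", lambda p: p.tests_pass),
    ("no regression", lambda p: not p.introduces_regression),
]


def failed_checks(patch: Patch) -> List[str]:
    """Return the names of every check the patch fails."""
    return [name for name, check in CHECKS if not check(patch)]


def surface_for_review(candidates: List[Patch]) -> List[Patch]:
    """Keep only the patches that pass every check."""
    return [p for p in candidates if not failed_checks(p)]


if __name__ == "__main__":
    good = Patch("fix element stack handling", True, True, False)
    symptom_only = Patch("clamp index at crash site", False, True, False)
    for patch in surface_for_review([good, symptom_only]):
        print("ready for human review:", patch.description)
```

A patch that only masks the symptom is filtered out here even though its tests pass, which is the behavior the paragraph above describes.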
As part of our research, we also developed new techniques and tools that allow CodeMender to reason about code and validate changes more effectively. These include:
Advanced program analysis: We have developed tools based on advanced program analysis, including static analysis, dynamic analysis, differential testing, fuzzing, and SMT solvers. Using these tools to systematically scrutinize code patterns, control flow, and data flow, CodeMender can better identify the root causes of security flaws and architectural weaknesses.
Multi-agent systems: We have developed special-purpose agents that enable CodeMender to tackle specific aspects of an underlying problem. For example, CodeMender uses a large language model-based critique tool that highlights the differences between the original and modified code, in order to verify that the proposed changes do not introduce regressions, and self-corrects as needed.
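Of the techniques listed above, differential testing is the easiest to show in a few lines. The sketch below is a generic illustration of the technique, not CodeMender's tooling: `original_clamp`, `patched_clamp`, and `differential_test` are hypothetical names, and the input ranges are arbitrary. It runs the pre-patch and post-patch versions of a function on the same random inputs and reports the first input on which they disagree.

```python
import random
from typing import Callable, Optional, Tuple

ClampFn = Callable[[int, int, int], int]


def original_clamp(value: int, low: int, high: int) -> int:
    """Hypothetical pre-patch function: clamp value into [low, high]."""
    return max(low, min(value, high))


def patched_clamp(value: int, low: int, high: int) -> int:
    """Hypothetical rewritten version that should behave identically."""
    if value < low:
        return low
    if value > high:
        return high
    return value


def differential_test(
    reference: ClampFn,
    candidate: ClampFn,
    trials: int = 1000,
    seed: int = 0,
) -> Optional[Tuple[int, int, int]]:
    """Return the first (value, low, high) where outputs differ, else None."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        low = rng.randint(-100, 100)
        high = rng.randint(low, 200)  # guarantees low <= high
        value = rng.randint(-500, 500)
        if reference(value, low, high) != candidate(value, low, high):
            return (value, low, high)
    return None


if __name__ == "__main__":
    mismatch = differential_test(original_clamp, patched_clamp)
    print("no behavioral difference found" if mismatch is None else mismatch)
```

For a security patch, inputs that trigger the vulnerability are expected to behave differently after the fix, so in practice a check like this would be restricted to inputs on which the original code is well defined.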
Vulnerability fixes
To effectively patch vulnerabilities and prevent them from reappearing, CodeMender uses debuggers, source code browsers, and other tools to pinpoint root causes and devise patches. Two examples of CodeMender patching vulnerabilities are shown in the video carousel below.
Example #1: Identify the root cause of a vulnerability
The carousel shows a snippet of the agent's reasoning about the root cause for a patch generated by CodeMender, after it analyzed the debugger output and the results of a code search tool.
Although the final patch in this example changed only a few lines of code, the root cause of the vulnerability was not immediately clear. The crash report showed a heap buffer overflow, but the actual problem lay elsewhere: incorrect stack management of Extensible Markup Language (XML) elements during parsing.
Example #2: The agent can create non-trivial patches
In this example, the CodeMender agent was able to come up with a non-trivial patch that deals with a complex object lifetime issue.
Not only did the agent work out the root cause of the vulnerability, it was also able to modify a completely custom system for generating C code within the project.
Proactively rewriting existing code for better security
We also designed CodeMender to proactively rewrite existing code to use more secure data structures and APIs.
For example, we deployed CodeMender to apply -fbounds-safety annotations to parts of a widely used image compression library called libwebp. When -fbounds-safety annotations are applied, the compiler adds bounds checks to the code to prevent an attacker from exploiting a buffer overflow or underflow to execute arbitrary code.
A few years ago, a heap buffer overflow vulnerability in libwebp (CVE-2023-4863) was used by a threat actor as part of a zero-click iOS exploit. With -fbounds-safety annotations, this vulnerability, along with most other buffer overflows in the annotated code, would have been rendered unexploitable forever.
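The effect of a compiler-inserted bounds check can be modeled in a few lines. The sketch below is only a Python model of the idea, not C and not the compiler's real output: the flat `bytearray` "memory", the layout constants, and the `BoundsTrap` exception are all assumptions made for illustration. In real C code, Clang's -fbounds-safety documentation describes annotations such as `__counted_by` that tie a pointer to its element count so the compiler can insert the check automatically.

```python
class BoundsTrap(Exception):
    """Stands in for the trap a bounds-checked build raises on a bad access."""


# Hypothetical flat memory layout: an 8-byte buffer followed by a 1-byte flag.
BUF_START = 0
BUF_COUNT = 8
FLAG_ADDR = BUF_START + BUF_COUNT


def new_memory() -> bytearray:
    """Allocate zeroed memory holding the buffer and the adjacent flag."""
    return bytearray(BUF_COUNT + 1)


def write_unchecked(memory: bytearray, index: int, value: int) -> None:
    """Model of unannotated C: the index is trusted and never validated."""
    memory[BUF_START + index] = value


def write_checked(memory: bytearray, index: int, value: int) -> None:
    """Model of an annotated build: validate the index against the count."""
    if not 0 <= index < BUF_COUNT:
        raise BoundsTrap(f"index {index} outside [0, {BUF_COUNT})")
    memory[BUF_START + index] = value


if __name__ == "__main__":
    mem = new_memory()
    write_unchecked(mem, 8, 0xFF)  # one past the end: corrupts the flag
    print("flag after unchecked write:", mem[FLAG_ADDR])

    mem = new_memory()
    try:
        write_checked(mem, 8, 0xFF)
    except BoundsTrap as trap:
        print("trapped:", trap)
    print("flag after checked write:", mem[FLAG_ADDR])
```

The checked version turns silent corruption of neighboring data into an immediate, controlled failure, which is why an overflow of this kind stops being exploitable for code execution.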
The video carousel below provides an example of the agent’s decision-making process, including validation steps.
Example #1: Agent's reasoning steps
In this example, the CodeMender agent is asked to address a -fbounds-safety error on the bit_depths pointer.
Example #2: Agent automatically corrects errors and test failures
Another important feature of CodeMender is its ability to automatically correct new errors and test failures that arise from its own annotations. This example shows the agent recovering from a compilation error.
Example #3: Agent validates changes
In this example, the CodeMender agent modifies a function and then uses the LLM judge tool, configured for functional equivalence, to verify that the functionality remains intact. When the tool detects a failure, the agent self-corrects based on the LLM judge's feedback.
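The judge-and-self-correct loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: `equivalence_judge` is a stand-in that compares outputs on sample inputs rather than calling an LLM, and `propose` is a hypothetical callback playing the role of the agent, which receives the previous feedback and returns a new candidate.

```python
from typing import Callable, Iterable, Optional, Tuple

Func = Callable[[int], int]
Verdict = Tuple[bool, str]


def equivalence_judge(
    original: Func, candidate: Func, inputs: Iterable[int]
) -> Verdict:
    """Stand-in for the LLM judge: compare outputs on sample inputs."""
    for x in inputs:
        expected, actual = original(x), candidate(x)
        if expected != actual:
            return False, f"input {x}: expected {expected}, got {actual}"
    return True, "equivalent on all sampled inputs"


def self_correct(
    original: Func,
    propose: Callable[[Optional[str]], Func],
    inputs: Iterable[int],
    max_attempts: int = 3,
) -> Optional[Func]:
    """Ask for revisions until the judge accepts one or attempts run out."""
    feedback: Optional[str] = None  # no feedback exists before the first draft
    for _ in range(max_attempts):
        candidate = propose(feedback)
        accepted, feedback = equivalence_judge(original, candidate, inputs)
        if accepted:
            return candidate
    return None


if __name__ == "__main__":
    def original(x: int) -> int:
        return abs(x) * 2

    def propose(feedback: Optional[str]) -> Func:
        if feedback is None:
            return lambda x: x * 2  # first draft: wrong for negative inputs
        return lambda x: (x if x >= 0 else -x) * 2  # revised after feedback

    fixed = self_correct(original, propose, range(-3, 4))
    print("accepted" if fixed is not None else "gave up")
```

Capping the number of attempts keeps the loop from running indefinitely when no acceptable revision is found; the limit of three is arbitrary.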
Making software secure for everyone
Although our early results with CodeMender are promising, we are taking a cautious approach with a focus on reliability. Currently, all patches generated by CodeMender are reviewed by human researchers before being submitted upstream.
Using CodeMender, we have already begun submitting patches to various critical open source libraries, and many of them have already been accepted and upstreamed. We are gradually ramping up this process to ensure quality and to systematically address feedback from the open source community.
We will also be gradually reaching out to interested maintainers of critical open source projects with CodeMender-generated patches. By iterating on feedback from this process, we hope to release CodeMender as a tool that all software developers can use to keep their codebases secure.
We have many techniques and results to share, which we intend to publish as technical papers and reports in the coming months. With CodeMender, we have only just begun to explore the incredible potential of AI to enhance software security for everyone.
Acknowledgments
Credits (listed alphabetically):
Alex Rebert, Arman Hasanzadeh, Carlos Lemos, Charles Sutton, Dongge Liu, Gogul Balakrishnan, Hiep Chu, James Zern, Koushik Sen, Lihao Liang, Max Shavrick, Oliver Chang, and Petros Maniatis.