Large language models (LLMs) have become essential tools in software development, providing features such as code snippet generation, unit test automation, and debugging assistance. However, these models often fall short of generating code that is not only functionally correct but also efficient at runtime. Overlooking runtime efficiency can degrade software performance, increase operational costs, and hurt the user experience. The problem is especially acute for inexperienced developers, who may adopt AI-suggested code without fully understanding its implications. Salesforce Research addresses these challenges with PerfCodeGen, a framework aimed at improving both the correctness and the runtime performance of LLM-generated code.
Salesforce AI’s PerfCodeGen is a training-free framework designed to make LLM-generated code more runtime-efficient. It achieves this by using execution feedback in an iterative self-refinement process. Unlike approaches that require fine-tuning on extensive training data, PerfCodeGen employs a feedback loop that evaluates and refines candidate code based on runtime metrics gathered during test execution. The framework operates in two phases: correctness refinement and performance optimization. First, it ensures that the generated code meets functional requirements by addressing issues surfaced by unit tests. Once correctness is established, it optimizes the code for runtime efficiency by targeting and refining the most resource-intensive test cases. This iterative process yields solutions that are both accurate and efficient.
Technical insights and benefits
PerfCodeGen integrates with existing LLM workflows and begins by generating multiple candidate solutions using nucleus sampling. In the first phase, these candidates are evaluated for correctness against unit tests, and feedback from failed tests is used to refine them. Once functional correctness is verified, the framework moves to the second phase, where it analyzes runtime metrics to identify bottlenecks. This information drives further optimization, focused on the most time-consuming test cases.
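The two-phase select-then-optimize loop described above can be sketched roughly as follows. This is a minimal illustration, not PerfCodeGen's actual implementation: the helper names (`run_unit_tests`, `time_on_tests`, `refine`) are hypothetical stand-ins, and the "candidates" here are plain Python functions rather than LLM-generated code strings.

```python
import time

# Toy stand-ins for the test harness; in PerfCodeGen these steps would involve
# real LLM calls and a sandboxed unit-test runner (names here are illustrative).
def run_unit_tests(candidate, tests):
    """Return the test cases the candidate fails (Phase 1: correctness)."""
    return [(x, want) for x, want in tests if candidate(x) != want]

def time_on_tests(candidate, tests):
    """Measure total runtime of the candidate across all test inputs (Phase 2)."""
    start = time.perf_counter()
    for x, _ in tests:
        candidate(x)
    return time.perf_counter() - start

def refine(candidates, tests):
    # Phase 1: keep only functionally correct candidates. In the real framework,
    # failed-test feedback would be sent back to the LLM for repair.
    correct = [c for c in candidates if not run_unit_tests(c, tests)]
    # Phase 2: rank correct candidates by measured runtime and keep the fastest,
    # standing in for the LLM refining the most expensive test cases.
    return min(correct, key=lambda c: time_on_tests(c, tests))

# Two correct Fibonacci implementations with very different runtimes.
def fib_slow(n):
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)

def fib_fast(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

tests = [(10, 55), (20, 6765)]
best = refine([fib_slow, fib_fast], tests)
print(best.__name__)
```

Both candidates pass the unit tests, so Phase 1 keeps both; Phase 2's runtime measurement then prefers the iterative version, mirroring how execution feedback steers the framework toward efficient solutions.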
This two-phase process increases the likelihood of producing an optimally efficient program. PerfCodeGen’s methodology mirrors human debugging and optimization practice, making it both effective and intuitive. Because the framework relies on execution feedback rather than retraining, it scales across different LLMs and application domains. The researchers report consistent improvements in runtime efficiency and correctness across models such as Phi-3-mini, Llama 3, and GPT-4.
PerfCodeGen has been tested and proven effective on benchmarks such as HumanEval, MBPP, and APPS.
Runtime efficiency: On HumanEval, GPT-4’s optimization rate (%Opt) increased from 24.54% to 28.83% with PerfCodeGen, and similar improvements were observed for other models.
Improved correctness: On MBPP, PerfCodeGen increased GPT-3.5’s %Correct from 66.38% to 73.36% with a single sample (Best@1).
Outperforming ground truth: With PerfCodeGen, LLMs generated solutions more efficient than the ground truth for roughly 55% of HumanEval tasks and 67% of MBPP tasks.
Scalability: Open models such as Phi-3-mini and Mixtral achieved performance comparable to closed models such as GPT-3.5 and GPT-4.
These results highlight that PerfCodeGen effectively balances correctness and runtime efficiency, making it a valuable addition to LLM-driven code generation workflows.

Conclusion:
PerfCodeGen provides a practical solution to a major limitation of current LLMs: their emphasis on functional correctness at the expense of runtime efficiency. By incorporating execution feedback into an iterative refinement process, it enables code generation that is both accurate and efficient. This approach improves the practicality of LLMs in software development, giving developers the tools to produce high-quality code without extensive retraining. The framework’s success across benchmarks marks a step forward in creating efficient, reliable, and accessible AI-driven programming solutions.
Check out the Paper and GitHub pages. All credit for this research goes to the researchers of this project.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His latest endeavor is Marktechpost, an artificial intelligence media platform that stands out for thorough coverage of machine learning and deep learning news that is technically sound yet accessible to a wide audience. The platform draws over 2 million views per month, reflecting its popularity among readers.