Versa AI hub
PerfCodeGen from Salesforce AI Research: A training-free framework that improves the performance of LLM-generated code with execution feedback

January 18, 2025

Large language models (LLMs) have become essential tools in software development, providing features such as code snippet generation, unit test automation, and debugging. However, these models often fall short of generating code that is not only functionally correct but also efficient at runtime. Overlooking runtime efficiency can degrade software performance, increase operational costs, and hurt the user experience. The problem is especially noticeable for inexperienced developers, who may rely on AI-suggested code without fully understanding its implications. Salesforce AI Research addresses these challenges with PerfCodeGen, a framework aimed at improving both the correctness and the performance of LLM-generated code.

Salesforce AI’s PerfCodeGen is a training-free framework designed to make LLM-generated code run more efficiently. It does so by using execution feedback in an iterative self-refinement process. Unlike approaches that require fine-tuning on extensive training data, PerfCodeGen employs a feedback loop that evaluates and refines code based on runtime metrics collected during test execution. The framework operates in two phases: correctness refinement and performance optimization. First, it ensures that the generated code meets the functional requirements by addressing issues identified by unit tests. Once correctness is established, it optimizes the code for runtime efficiency by targeting the most resource-intensive test cases. This iterative process yields solutions that are both correct and efficient.
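The two phases can be pictured as a small control loop. The following is a minimal sketch under stated assumptions, not PerfCodeGen's actual implementation: the names `perf_refine`, `run_tests`, `load`, and `stub_llm` are hypothetical, the "LLM" is a stub that returns canned revisions, and the prompts are placeholders rather than the ones used by the researchers.

```python
import time
from typing import Callable, List, Optional, Tuple

Test = Tuple[tuple, object]       # (positional args, expected result)
LLM = Callable[[str, str], str]   # hypothetical: (current source, feedback) -> revised source


def load(src: str, name: str = "solve") -> Callable:
    """Compile candidate source and return the function under test."""
    namespace: dict = {}
    exec(src, namespace)  # sketch only: real candidates should run in a sandbox
    return namespace[name]


def run_tests(src: str, tests: List[Test]) -> Tuple[List[str], List[float]]:
    """Run every test; return failure messages and per-test runtimes in seconds."""
    fn = load(src)
    failures, runtimes = [], []
    for args, expected in tests:
        start = time.perf_counter()
        try:
            got = fn(*args)
        except Exception as exc:
            got = repr(exc)
        runtimes.append(time.perf_counter() - start)
        if got != expected:
            failures.append(f"solve{args} returned {got!r}, expected {expected!r}")
    return failures, runtimes


def perf_refine(src: str, tests: List[Test], llm: LLM, max_rounds: int = 3) -> Optional[str]:
    # Phase 1 (correctness): feed failing unit tests back until everything passes.
    failures, runtimes = run_tests(src, tests)
    for _ in range(max_rounds):
        if not failures:
            break
        src = llm(src, "Fix these failing tests:\n" + "\n".join(failures))
        failures, runtimes = run_tests(src, tests)
    if failures:
        return None  # never became correct, so there is nothing to optimize

    # Phase 2 (performance): point the model at the slowest test case and keep
    # a revision only if it is still correct and faster overall.
    best_src, best_total = src, sum(runtimes)
    for _ in range(max_rounds):
        slowest = max(range(len(tests)), key=lambda i: runtimes[i])
        feedback = f"Optimize runtime; the slowest test calls solve{tests[slowest][0]}"
        candidate = llm(best_src, feedback)
        cand_failures, cand_runtimes = run_tests(candidate, tests)
        if not cand_failures and sum(cand_runtimes) < best_total:
            best_src, best_total, runtimes = candidate, sum(cand_runtimes), cand_runtimes
    return best_src


# Stub "LLM" with canned answers, standing in for a real model call.
BUGGY = "def solve(n):\n    return sum(range(n))\n"
SLOW = "def solve(n):\n    return sum(range(n + 1))\n"
FAST = "def solve(n):\n    return n * (n + 1) // 2\n"


def stub_llm(src: str, feedback: str) -> str:
    return SLOW if feedback.startswith("Fix") else FAST


tests = [((10,), 55), ((200_000,), 20_000_100_000)]
result = perf_refine(BUGGY, tests, stub_llm)
```

Keeping the last known-correct solution as a fallback is a design choice of this sketch: an optimization attempt that breaks a test is simply discarded, so the performance phase cannot make correctness worse.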

Technical insights and benefits

PerfCodeGen integrates with existing LLM workflows and starts by generating multiple candidate solutions using nucleus sampling. In the first phase, these candidates are checked for correctness with unit tests, and feedback from failed tests is used to repair them. Once a solution is verified as functionally correct, the framework moves on to the second phase, where it analyzes runtime metrics to identify bottlenecks. This information is used to further optimize the code, focusing on the most time-consuming test cases.
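Nucleus (top-p) sampling, the technique named for candidate generation, keeps only the smallest set of most likely tokens whose combined probability reaches a threshold and then samples from that set. The article does not state the sampling settings, so the sketch below is only an illustration of the general technique: the toy next-token distribution, the `top_p` value of 0.75, and the function names are all assumptions.

```python
import random
from typing import Dict, List, Tuple


def nucleus(probs: Dict[str, float], top_p: float) -> List[Tuple[str, float]]:
    """Return the smallest set of highest-probability tokens whose mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


def nucleus_sample(probs: Dict[str, float], top_p: float, rng: random.Random) -> str:
    """Draw one token from the nucleus, with weights proportional to the kept probabilities."""
    tokens, weights = zip(*nucleus(probs, top_p))
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-token distribution (made up for illustration).
next_token_probs = {"return": 0.5, "for": 0.3, "while": 0.15, "pass": 0.05}
rng = random.Random(0)
candidates = [nucleus_sample(next_token_probs, 0.75, rng) for _ in range(5)]
```

With this threshold only `return` and `for` can ever be drawn, which shows how the method trims unlikely continuations while still letting repeated draws differ, so several distinct candidate solutions can be produced.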

This two-phase process increases the likelihood of producing an optimally efficient program. PerfCodeGen’s methodology mirrors how human developers debug and then optimize, which makes it both effective and intuitive. Additionally, because the framework relies on feedback rather than retraining, it scales across different LLMs and application domains. The researchers report consistent improvements in runtime efficiency and correctness across models such as Phi-3-mini, Llama 3, and GPT-4.

PerfCodeGen was evaluated on the HumanEval, MBPP, and APPS benchmarks, with the following reported results.

Runtime efficiency: On HumanEval, the optimization rate (%Opt) of GPT-4 increased from 24.54% to 28.83% with PerfCodeGen, and similar improvements were observed for other models.

Improved correctness: On MBPP, GPT-3.5’s %Correct increased from 66.38% to 73.36% with a single sample (Best@1).

Outperforming ground truth: With PerfCodeGen, LLMs were able to generate more efficient solutions than the ground truth for approximately 55% of HumanEval tasks and 67% of MBPP tasks.

Scalability: Open models such as Phi-3-mini and Mixtral achieved performance comparable to closed models such as GPT-3.5 and GPT-4.
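To make the reported metrics concrete, here is a hedged sketch of how they could be computed from per-problem measurements. The definitions are assumptions, since the article does not spell them out: %Correct is taken as the share of problems where at least one of the first k samples passes all tests, and %Opt as the share where a passing sample runs faster than the ground-truth solution. The `ProblemResult` type and the runtimes are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProblemResult:
    # Runtime in seconds of each sampled solution; None marks a sample that failed its tests.
    sample_runtimes: List[Optional[float]]
    # Runtime in seconds of the benchmark's ground-truth solution.
    reference_runtime: float


def percent_correct(results: List[ProblemResult], k: int) -> float:
    """Assumed %Correct (Best@k): problems where any of the first k samples passes."""
    solved = sum(
        any(t is not None for t in r.sample_runtimes[:k]) for r in results
    )
    return 100.0 * solved / len(results)


def percent_opt(results: List[ProblemResult], k: int) -> float:
    """Assumed %Opt (Best@k): problems where the fastest passing sample beats the reference."""
    faster = 0
    for r in results:
        passing = [t for t in r.sample_runtimes[:k] if t is not None]
        if passing and min(passing) < r.reference_runtime:
            faster += 1
    return 100.0 * faster / len(results)


# Four made-up problems with two samples each.
results = [
    ProblemResult([0.8, None], reference_runtime=1.0),   # correct and faster at k=1
    ProblemResult([None, 0.5], reference_runtime=1.0),   # only the second sample passes
    ProblemResult([1.2, 1.5], reference_runtime=1.0),    # correct but slower than reference
    ProblemResult([None, None], reference_runtime=1.0),  # never correct
]
```

Under these assumed definitions, %Opt can never exceed %Correct for the same k, because a sample has to pass its tests before its runtime counts.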

These results highlight that PerfCodeGen can effectively balance correctness and runtime efficiency, making it a valuable addition to LLM-driven code generation workflows.

Conclusion

PerfCodeGen offers a practical solution to a major limitation of current LLMs: their emphasis on functional correctness at the expense of runtime efficiency. By incorporating execution feedback into an iterative refinement process, PerfCodeGen enables code generation that is both correct and efficient. This approach makes LLMs more useful in software development and gives developers a way to produce high-quality code without extensive retraining. The framework’s results across several benchmarks show its potential as a step toward efficient, reliable, and accessible AI-driven programming solutions.

Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His latest endeavor is the launch of Marktechpost, an artificial intelligence media platform. It stands out for its thorough coverage of machine learning and deep learning news, which is technically sound and easily understood by a wide audience. The platform boasts over 2 million views per month, which shows its popularity among viewers.

© 2025 Versa AI Hub. All Rights Reserved.
