Versa AI hub
Research

Salesforce Research lays the foundation for a more reliable enterprise AI agent

By versatileai | May 1, 2025 | 4 min read

(Image credit: Getty Images / Picture Alliance / Contributor)

The value of systems that can perform human tasks, particularly AI agents with their potential for productivity gains, is clear. However, the inconsistent performance of large language models (LLMs) can hinder effective agent deployment. Salesforce's AI research arm is trying to address that issue.

Also: 60% of AI agents work in IT departments – this is what they do every day

On Thursday, Salesforce published its first Salesforce AI Research in Review report, highlighting the company's innovations from the past quarter, including new foundational developments and research papers. Salesforce hopes this work will support the development of trusted, capable AI agents that perform well in business environments.

"At Salesforce, we call these 'boring breakthroughs.' Not because they are unremarkable, but because they are quietly capable, reliably scalable, and built to last," says Silvio Savarese, Salesforce's chief scientist. "They're so seamless that some people may take them for granted."

Also: 4 types of people interested in AI agents – and what businesses can learn from them

Here's a dive into some of the biggest breakthroughs and takeaways from the report.

The problem: jagged intelligence

If you've ever used AI models for simple everyday tasks, you may have been surprised by how rudimentary their mistakes can be. What's even more inexplicable is that the same model that gets a basic question wrong can excel on benchmarks testing highly complex topics such as mathematics, STEM, and coding. This paradox is what Salesforce calls "jagged intelligence."

Salesforce points out that this "jaggedness," the contradiction between an LLM's raw intelligence and its inconsistent real-world performance, is particularly challenging for businesses that require reliable operational performance, especially in unpredictable environments. Addressing the problem, however, means quantifying it first, which raises another problem.

"AI today is jagged, so we need to work on that. But how can we work on something without measuring it first?" says Shelby Heinecke, senior AI research manager at Salesforce.

Also: Why ignoring AI ethics is such risky business, and how to do AI right

That's exactly the problem Salesforce's new SIMPLE benchmark addresses.

The SIMPLE benchmark

SIMPLE is a public dataset of questions that are easy for humans to answer but that trip up AI models, which makes it possible to benchmark, and thus quantify, LLM jaggedness. To give you an idea of how basic the questions are, the Hugging Face dataset card describes them as problems that can be solved by "at least 10% of high school students who are given pens, unlimited paper and an hour of time."

Although the benchmark doesn't test super-complex tasks, it should help people understand how well models can reason in their real environments and applications, which is especially relevant when developing enterprise general intelligence (EGI): capable, consistent AI systems built to handle business applications.
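To make the measurement idea concrete, here is a minimal hypothetical sketch of how one might quantify jaggedness: score a model on easy questions and also check whether it gives the same answer on repeated runs. The `ask_model` function and the sample questions are illustrative stand-ins, not part of Salesforce's benchmark.

```python
# Hypothetical sketch: quantifying "jaggedness" as two numbers, accuracy on
# easy questions and run-to-run consistency. `ask_model` is a placeholder
# for a real LLM call; all names here are illustrative assumptions.

from collections import Counter

def ask_model(question: str, run: int) -> str:
    # Stand-in for an LLM call; deterministic canned answers for illustration.
    canned = {"What is 7 + 5?": "12", "How many days are in a week?": "7"}
    return canned.get(question, "unknown")

def evaluate(questions: dict, runs: int = 3):
    correct, consistent = 0, 0
    for q, gold in questions.items():
        answers = [ask_model(q, r) for r in range(runs)]
        majority, count = Counter(answers).most_common(1)[0]
        correct += majority == gold       # majority answer matches the key
        consistent += count == runs       # same answer on every run
    n = len(questions)
    return correct / n, consistent / n

easy = {"What is 7 + 5?": "12", "How many days are in a week?": "7"}
accuracy, consistency = evaluate(easy)
print(accuracy, consistency)
```

A large gap between benchmark accuracy on hard tasks and this kind of easy-question consistency score is, roughly, what "jagged" means here.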


Another advantage of the benchmark is that it gives a better picture of a model's performance consistency, which should build trust among business leaders considering bringing AI systems, such as AI agents, into their companies.

Another benchmark Salesforce developed, ContextualJudgeBench, takes a different approach: it evaluates AI judges rather than the models themselves. AI model benchmarks often rely on evaluations from other AI models, so ContextualJudgeBench focuses on the LLMs doing the evaluating, on the idea that if the rater is reliable, its evaluations will be too. The benchmark tests judges on more than 2,000 response pairs.
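The pairwise-judging setup described above can be sketched as follows. This is not Salesforce's code; the `judge` stub and the sample pairs are invented for illustration, and a real evaluation would call an LLM judge and use human preference labels.

```python
# Hypothetical sketch of pairwise judge evaluation in the spirit of
# ContextualJudgeBench: a judge picks the better of two responses, and we
# measure how often it agrees with a gold (human-preferred) label.

def judge(prompt: str, response_a: str, response_b: str) -> str:
    # Stand-in for an LLM judge. Toy rule: prefer the more substantive
    # (here, simply longer) response. A real judge would assess grounding.
    return "A" if len(response_a) >= len(response_b) else "B"

pairs = [
    # (prompt, response A, response B, gold label)
    ("Summarize the refund policy.",
     "Refunds are issued within 30 days with a receipt.", "Maybe.", "A"),
    ("Quote the warranty length.",
     "No idea.", "The warranty lasts 12 months.", "B"),
]

hits = sum(judge(p, a, b) == gold for p, a, b, gold in pairs)
print(f"judge agreement: {hits}/{len(pairs)}")
```

Agreement with human labels, computed this way over thousands of pairs, is the kind of number that tells you whether the judge itself can be trusted.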

CRMArena

In the past quarter, Salesforce also launched CRMArena, an agent benchmarking framework. The framework evaluates how well AI agents perform customer relationship management (CRM) tasks, such as generating commerce recommendations and summarizing sales emails and transcripts.

"These agents don't need to solve theorems; they don't need to turn my prose into Shakespearean poetry."

Also: How the "agent internet" helps AIs work together

CRMArena aims to address the problem of organizations not knowing how well a model works on real business tasks. By enabling comprehensive testing, the framework should help improve AI agent development and performance.
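Evaluating an agent on business tasks, as CRMArena does, boils down to running the agent against tasks with checkable outcomes and reporting a success rate. The sketch below is a hypothetical toy harness, not CRMArena itself; `run_agent`, the task format, and the check are all invented for illustration.

```python
# Hypothetical mini-harness in the spirit of CRMArena: run an agent over
# CRM-style tasks and report how many it completes successfully.

def run_agent(task: str, record: dict) -> str:
    # Stand-in for an AI agent; a toy one-sentence "summary" here.
    if task == "summarize_email":
        return record["email"].split(".")[0] + "."
    return ""

tasks = [
    {"task": "summarize_email",
     "record": {"email": "Customer wants a renewal quote. Pricing was sent."},
     # Success check: the summary must mention the key topic.
     "check": lambda out: "renewal" in out},
]

passed = sum(t["check"](run_agent(t["task"], t["record"])) for t in tasks)
print(f"{passed}/{len(tasks)} tasks passed")
```

The substance of a real framework lies in realistic tasks and checks; the harness shape, though, stays this simple.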

Other notable mentions

The complete report includes further research aimed at improving the efficiency and reliability of AI models. Here is a brief summary of some highlights:

  • SFR-Embedding: Salesforce has enhanced its SFR-Embedding models, which convert text-based information into structured data for AI systems such as agents. The company also added SFR-Embedding-Code, a specialized family of code-embedding models.
  • SFR-Guard: A family of models trained on business data to safeguard AI agents in key areas, including toxicity detection and prompt injection.
  • xLAM: Salesforce has updated its xLAM (Large Action Model) family with multi-turn conversation support and a wider range of smaller models for improved accessibility.
  • TACO: A multimodal model family that generates chains of thought and action (CoTA) to tackle complex multi-step problems.
