Versa AI hub
Tools

End-to-end example using Vectara hallucination leaderboard

By versatileai · August 10, 2025

Hugging Face's Open LLM Leaderboard (originally created by Ed Beeching and Lewis Tunstall, and maintained by Nathan Habib and Clémentine Fourrier) is well known for tracking the performance of open source LLMs and comparing them on a variety of tasks, such as TruthfulQA and HellaSwag.

This is extremely valuable to the open source community, as it gives practitioners a way to track the best open source models.

In late 2023, Vectara introduced the Hughes Hallucination Evaluation Model (HHEM), an open source model for measuring the extent to which LLMs hallucinate (that is, produce text that is unsupported by, or inconsistent with, the source content provided). Covering both open source models such as Llama 2 and Mistral 7B, and commercial models such as OpenAI's GPT-4, Anthropic's Claude, and Google's Gemini, it highlighted the stark differences that currently exist between models in their propensity to hallucinate.
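As an illustration of how such a judge model is typically applied, the sketch below scores a (source, summary) pair and thresholds the score into a hallucination verdict. The model id and the sentence-transformers CrossEncoder interface are our assumptions about the published HHEM checkpoint, not details stated in this post, and the 0.5 threshold is a placeholder:

```python
def consistency_score(source: str, summary: str,
                      model_name: str = "vectara/hallucination_evaluation_model") -> float:
    """Return a factual-consistency score in [0, 1]; higher means the summary
    is more consistent with the source. Model id/API are assumptions."""
    # Heavyweight dependency, imported lazily; downloads the checkpoint on first use.
    from sentence_transformers import CrossEncoder
    model = CrossEncoder(model_name)
    return float(model.predict([[source, summary]])[0])

def is_hallucinated(score: float, threshold: float = 0.5) -> bool:
    """Label a summary as hallucinated when its consistency score falls below the threshold."""
    return score < threshold
```

The judge model is a classifier over (premise, hypothesis) pairs, so a single forward pass per summary is enough to produce the score that the leaderboard aggregates.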

As we continued to add new models to HHEM, we looked for an open source solution to manage and update the HHEM leaderboard.

Most recently, the Hugging Face leaderboard team released a leaderboard template (here and here). These are lightweight versions of the Open LLM Leaderboard itself that are open source and easier to reuse than the original code.

Today we are pleased to announce the release of the new HHEM leaderboard, built with the HF leaderboard template.

Vectara’s Hughes Hallucination Evaluation Model (HHEM)

The Hughes Hallucination Evaluation Model (HHEM) leaderboard is dedicated to assessing the frequency of hallucinations in document summaries generated by large language models (LLMs) such as GPT-4, Google Gemini, and Meta's Llama 2.

By releasing this model as open source, Vectara aims to democratize the assessment of LLM hallucinations and to highlight the differences that exist between LLMs in this respect.

The first release of HHEM was a Hugging Face model alongside a GitHub repository, but we quickly realized we needed a mechanism for evaluating newly released models. The HF leaderboard code template let us quickly stand up a new leaderboard that supports dynamic updates, so the LLM community can now submit relevant new models for HHEM evaluation.

On a personal note for us here at Vectara, HHEM is named after Simon Hughes, who passed away of natural causes in November 2023. We decided to name the model in his honor, given his lasting legacy in this field.

Setting up HHEM with the LLM leaderboard template

To set up the Vectara HHEM leaderboard, we customized the HF leaderboard template code to our needs, as follows:

After cloning the Space repository into our own organization, we created two related datasets: "requests" and "results". These datasets hold, respectively, the requests submitted by users for new LLMs to evaluate, and the results of those evaluations. We seeded the results dataset with the existing results from the first launch, and updated the About and Citations sections.

For a simple leaderboard whose evaluation results are pushed by the backend to the results dataset, that's all you need!
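That push step can be sketched as follows. The record schema, the file layout, and the `your-org/results` repo id are placeholders for illustration, not the leaderboard's actual schema:

```python
import json
from pathlib import Path

def build_result_record(model_id: str, hallucination_rate: float,
                        answer_rate: float, avg_summary_len: float) -> dict:
    # The record shape is illustrative; the real leaderboard defines its own schema.
    return {
        "model": model_id,
        "results": {
            "hallucination_rate": hallucination_rate,
            "answer_rate": answer_rate,
            "average_summary_length": avg_summary_len,
        },
    }

def write_result_file(record: dict, out_dir: str = "eval_results") -> Path:
    """Serialize one evaluation result to JSON, ready to upload to the results dataset."""
    path = Path(out_dir) / f"{record['model'].replace('/', '_')}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2))
    return path

# Uploading the file to the results dataset is then a single call
# (requires a Hugging Face write token; repo id is a placeholder):
# from huggingface_hub import HfApi
# HfApi().upload_file(path_or_fileobj=str(path), path_in_repo=path.name,
#                     repo_id="your-org/results", repo_type="dataset")
```

Once the JSON lands in the results dataset, the leaderboard frontend simply reads that dataset to render the table.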

Because our evaluation is more involved, we customized the source code to the needs of the HHEM leaderboard. The details are as follows:

  • leaderboard/src/backend/model_operations.py: contains the two main classes, SummaryGenerator and EvaluationModel. SummaryGenerator produces summaries over HHEM's private evaluation dataset and computes metrics such as answer rate and average summary length. EvaluationModel loads our Hughes Hallucination Evaluation Model (HHEM) to score those summaries, producing metrics such as factual consistency rate and hallucination rate.
  • leaderboard/src/backend/evaluate_model.py: defines the Evaluator class, which uses both SummaryGenerator and EvaluationModel to compute the results and return them in JSON format.
  • leaderboard/src/backend/run_eval_suite.py: contains the run_evaluation function, which uses the Evaluator to obtain the evaluation results and upload them to the results dataset described above, from which they are displayed on the leaderboard.
  • leaderboard/main_backend.py: manages pending evaluation requests and runs the automatic evaluations using the classes above. It also includes an option for users to reproduce our evaluation results.
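The metric computations described for these backend files can be sketched in plain Python. The class and field names below mirror the description rather than the actual repository code, and the 0.5 consistency threshold is our assumption:

```python
from dataclasses import dataclass

@dataclass
class ScoredSummary:
    source: str
    summary: str            # empty string when the model refused or failed to answer
    consistency: float      # HHEM score in [0, 1]; higher = more factually consistent

def leaderboard_metrics(items: list[ScoredSummary], threshold: float = 0.5) -> dict:
    """Compute the headline metrics: answer rate, average summary length,
    factual consistency rate, and hallucination rate."""
    answered = [s for s in items if s.summary.strip()]
    answer_rate = len(answered) / len(items)
    avg_len = sum(len(s.summary.split()) for s in answered) / len(answered)
    consistent = sum(1 for s in answered if s.consistency >= threshold)
    factual_consistency_rate = consistent / len(answered)
    return {
        "answer_rate": answer_rate,
        "average_summary_length": avg_len,
        "factual_consistency_rate": factual_consistency_rate,
        "hallucination_rate": 1.0 - factual_consistency_rate,
    }
```

Note that hallucination rate is simply the complement of factual consistency rate over the answered subset, which is why refusals are excluded before scoring.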

The final source code is available in the Files tab of the HHEM leaderboard repository. With all these changes, the evaluation pipeline is ready and easily deployed as a Hugging Face Space.

Summary

HHEM is a new classification model that can be used to assess the extent to which LLMs hallucinate. Using the Hugging Face leaderboard template gave us ready-made support for the common needs of a leaderboard: managing submissions of new model evaluation requests, and updating the leaderboard with the resulting scores.

Kudos to the Hugging Face team for making this valuable framework open source, and for supporting the Vectara team in its implementation. We expect this code to be reused by other community members who aim to publish other types of LLM leaderboards.

If you want to contribute a new model to HHEM, please submit it via the leaderboard. We welcome suggestions of new models to evaluate.

Also, if you have any questions about the HHEM leaderboard or about Vectara, feel free to contact us via the Vectara or Hugging Face forums.
