Tools

Accelerating LLM inference with TGI in Intel Gaudi

By versatileai · March 28, 2025 · 3 min read

We are pleased to announce that Intel Gaudi hardware support is now integrated natively into Text Generation Inference (TGI), a production-ready serving solution for large language models (LLMs). This integration brings the power of Intel’s specialized AI accelerators to the high-performance inference stack, expanding deployment options for the open-source AI community.

✨ What’s new?

Gaudi support is now fully integrated into TGI’s main codebase in PR #3091. Previously, Gaudi devices were supported through a separate fork, tgi-gaudi. This was cumbersome for users and meant the latest TGI features could not be supported at launch. Now, TGI’s new multi-backend architecture lets us support Gaudi directly in TGI.

This integration supports Intel’s full line of Gaudi hardware.

You can find more details about Gaudi hardware on Intel’s Gaudi product page.

🌟 Why this matters

TGI’s Gaudi backend offers several important benefits:

  • Hardware versatility 🔄: more options for deploying LLMs in production beyond traditional GPUs
  • Cost efficiency 💰: Gaudi hardware often offers attractive price/performance for specific workloads
  • Feature completeness ⚙️: all of TGI’s robust features (dynamic batching, streaming responses, etc.)
  • Advanced capabilities 🔥: support for multi-card inference (sharding), vision-language models, and FP8 precision
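Streaming responses arrive from TGI as server-sent events, one JSON payload per `data:` line. As a minimal sketch, here is how such a line could be parsed client-side; the event shape shown follows TGI's documented stream format, but treat the exact field names as assumptions to verify against your TGI version.

```python
import json


def parse_stream_line(line: str):
    """Parse one server-sent-event line from TGI's /generate_stream endpoint.

    Returns the token text for token events, or None for non-data lines
    (comments, keep-alives, blank lines).
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    # The payload after "data:" is a JSON event; token events carry the
    # generated text under event["token"]["text"] (assumed field names).
    event = json.loads(line[len("data:"):])
    return event["token"]["text"]


# Example: a line shaped like what TGI emits over the wire.
sample = 'data:{"token": {"id": 5, "text": " learning", "logprob": -0.1, "special": false}}'
```

Looping `parse_stream_line` over an HTTP response's lines yields tokens as they are generated, which is what enables the streamed chat-style output mentioned above.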

Getting started with TGI on Gaudi

The easiest way to run TGI on Gaudi is to use the official Docker image. You need to run the image on a machine with Gaudi hardware. Here is a basic example to get you started:

model=meta-llama/Meta-Llama-3.1-8B-Instruct
volume=$PWD/data
hf_token=your_hf_access_token

docker run --runtime=habana --cap-add=sys_nice --ipc=host \
  -p 8080:80 \
  -v $volume:/data \
  -e HF_TOKEN=$hf_token \
  -e HABANA_VISIBLE_DEVICES=all \
  ghcr.io/huggingface/text-generation-inference:3.2.1-gaudi \
  --model-id $model


Once the server is running, you can send an inference request.

curl 127.0.0.1:8080/generate \
  -X POST \
  -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 32}}' \
  -H 'Content-Type: application/json'
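The same request can be made from Python. The sketch below uses only the standard library's urllib and mirrors the curl command above; the helper names and the localhost URL are illustrative, not part of TGI itself.

```python
import json
from urllib import request


def build_generate_payload(prompt: str, max_new_tokens: int = 32) -> dict:
    """Build the JSON body expected by TGI's /generate route."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def generate(base_url: str, prompt: str, max_new_tokens: int = 32) -> str:
    """POST a generation request to a running TGI server and return the text."""
    body = json.dumps(build_generate_payload(prompt, max_new_tokens)).encode()
    req = request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=60) as resp:
        # TGI returns {"generated_text": "..."} for non-streaming requests.
        return json.loads(resp.read())["generated_text"]


if __name__ == "__main__":
    # Assumes the docker container above is serving on port 8080.
    print(generate("http://127.0.0.1:8080", "What is deep learning?"))
```

For production use, the huggingface_hub InferenceClient offers a higher-level interface over the same endpoint.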

For comprehensive documentation on using TGI with Gaudi, including how-to guides and advanced configuration, see the new dedicated Gaudi backend documentation.

🎉 Top features

We have optimized the following models for both single-card and multi-card configurations, meaning they run as fast as possible on Intel Gaudi: their modeling code has been specifically tuned for Intel Gaudi hardware to deliver the best performance and take full advantage of Gaudi’s capabilities.

  • Llama 3.1 (8B and 70B)
  • Llama 3.3 (70B)
  • Llama 3.2 Vision (11B)
  • Mistral (7B)
  • Mixtral (8x7B)
  • CodeLlama (13B)
  • Falcon (180B)
  • Qwen2 (72B)
  • Starcoder and Starcoder2
  • Gemma (7B)
  • Llava-v1.6-Mistral-7B
  • Phi-2

We also offer many advanced features on Gaudi hardware, such as FP8 quantization via Intel Neural Compressor (INC), enabling even greater performance optimization.

✨ Coming soon! We look forward to expanding our model lineup with cutting-edge additions, including DeepSeek-R1/V3, Qwen-VL, and more powerful models to power your AI applications. 🚀

Getting involved

We invite the community to try out TGI on Gaudi hardware and provide feedback. The complete documentation is available in the TGI Gaudi backend documentation. If you’re interested in contributing, check out the contribution guidelines or open an issue with your feedback on GitHub. By bringing Intel Gaudi support directly into TGI, we continue our mission of providing flexible, efficient, production-ready tools for deploying LLMs. We’re looking forward to seeing what you’ll build with this new capability! 🎉
