Versa AI hub
Tools

Accelerating LLM inference with TGI in Intel Gaudi

By versatileai | March 28, 2025 | 3 min read

We are pleased to announce that support for Intel Gaudi hardware is now natively integrated into Text Generation Inference (TGI), a production-ready serving solution for large language models (LLMs). This integration brings the power of Intel’s specialized AI accelerators to a high-performance inference stack, expanding deployment options for the open-source AI community.

✨ What’s new?

Gaudi support is now fully integrated into TGI’s main codebase in PR #3091. Previously, we maintained a separate fork, tgi-gaudi, for Gaudi devices. This was cumbersome for users and meant the latest TGI features were not supported at launch. We now support Gaudi directly in TGI through its new multi-backend architecture.

This integration supports the full line of Intel Gaudi hardware.

You can find more details about Gaudi hardware on Intel’s Gaudi product page.

🌟 Why is this important?

TGI’s Gaudi backend offers several important benefits:

  • Hardware versatility 🔄: more options for deploying LLMs in production beyond traditional GPUs
  • Cost performance 💰: Gaudi hardware often offers attractive price-performance for specific workloads
  • Full feature set ⚙️: all of TGI’s robust features, such as dynamic batching and streamed responses
  • Advanced capabilities 🔥: support for multi-card inference (sharding), vision-language models, and FP8 precision

Getting started with TGI on Gaudi

The easiest way to run TGI on Gaudi is to use the official Docker image. The image must run on a machine with Gaudi hardware. Here is a basic example to get you started:

model=meta-llama/Meta-Llama-3.1-8B-Instruct
volume=$PWD/data
hf_token=YOUR_HF_ACCESS_TOKEN

docker run --runtime=habana --cap-add=sys_nice --ipc=host \
  -p 8080:80 \
  -v $volume:/data \
  -e HF_TOKEN=$hf_token \
  -e HABANA_VISIBLE_DEVICES=all \
  ghcr.io/huggingface/text-generation-inference:3.2.1-gaudi \
  --model-id $model

Once the server is running, you can send an inference request:

curl 127.0.0.1:8080/generate \
  -X POST \
  -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 32}}' \
  -H 'Content-Type: application/json'
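The same request can be sent from Python. A minimal sketch using only the standard library, assuming the local host/port and JSON schema from the curl example above:

```python
import json
from urllib import request

def build_payload(prompt, max_new_tokens=32):
    # JSON body matching the /generate schema used in the curl example.
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def generate(prompt, max_new_tokens=32, url="http://127.0.0.1:8080/generate"):
    # POST to a running TGI server and return the generated text.
    req = request.Request(
        url,
        data=json.dumps(build_payload(prompt, max_new_tokens)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

The `build_payload` helper keeps the request body in one place, so the same dict can be reused for batch or streaming requests.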

For comprehensive documentation on using TGI with Gaudi, including how-to guides and advanced configuration, see the new dedicated Gaudi backend documentation.
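TGI can also stream tokens as they are produced. A hedged Python sketch, assuming TGI’s standard `/generate_stream` endpoint, which emits server-sent events as `data:`-prefixed JSON lines (one per token):

```python
import json
from urllib import request

def parse_sse_event(line):
    # Parse one "data:{...}" line from an SSE stream; return None for
    # blank or non-data lines.
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):])

def stream_generate(prompt, max_new_tokens=32,
                    url="http://127.0.0.1:8080/generate_stream"):
    # Yield token texts as a running TGI server produces them.
    body = json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    ).encode("utf-8")
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        for raw in resp:  # HTTPResponse iterates line by line
            event = parse_sse_event(raw.decode("utf-8"))
            if event is not None:
                yield event["token"]["text"]
```

Printing the yielded tokens as they arrive gives the familiar incremental-generation experience without waiting for the full completion.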

🎉 Top features

We have optimized the following models for both single-card and multi-card configurations, so they run as fast as possible on Intel Gaudi: their modeling code has been specifically optimized for Gaudi hardware to deliver the best performance and take full advantage of Gaudi’s capabilities.

  • Llama 3.1 (8B and 70B)
  • Llama 3.3 (70B)
  • Llama 3.2 Vision (11B)
  • Mistral (7B)
  • Mixtral (8x7B)
  • CodeLlama (13B)
  • Falcon (180B)
  • Qwen 2 (72B)
  • StarCoder and StarCoder2
  • Gemma (7B)
  • Llava-v1.6-Mistral-7B
  • Phi-2

Many advanced features are also available on Gaudi hardware, such as FP8 quantization through Intel Neural Compressor (INC), enabling even greater performance optimization.

✨ Coming soon! We look forward to expanding the model lineup with cutting-edge additions, including DeepSeek-R1/V3, Qwen-VL, and more top models, to power your AI applications. 🚀

Getting involved

We invite the community to try TGI on Gaudi hardware and share feedback. Complete documentation is available in the TGI Gaudi backend documentation. If you’re interested in contributing, check out the contribution guidelines and open an issue on GitHub with your feedback. By bringing Intel Gaudi support directly into TGI, we continue our mission of providing flexible, efficient, production-ready tools for deploying LLMs. We can’t wait to see what you’ll build with this new capability! 🎉
© 2025 Versa AI Hub. All Rights Reserved.