Close Menu
Versa AI hub
  • AI Ethics
  • AI Legislation
  • Business
  • Cybersecurity
  • Media and Entertainment
  • Content Creation
  • Art Generation
  • Research
  • Tools
  • Resources

Subscribe to Updates

Subscribe to our newsletter and stay updated with the latest news and exclusive offers.

What's Hot

JBS Dev: About incomplete data and the last mile of AI – from model capabilities to cost sustainability

May 13, 2026

AI automates HR compliance except where tech companies need it

May 12, 2026

Pre-training a mix of experts to achieve new modularity

May 11, 2026
Facebook X (Twitter) Instagram
Versa AI hubVersa AI hub
Wednesday, May 13
Facebook X (Twitter) Instagram
Login
  • AI Ethics
  • AI Legislation
  • Business
  • Cybersecurity
  • Media and Entertainment
  • Content Creation
  • Art Generation
  • Research
  • Tools
  • Resources
Versa AI hub
Home»Tools»GROQ hugging face reasoning provider
Tools

GROQ hugging face reasoning provider

versatileaiBy versatileaiJune 17, 2025No Comments4 Mins Read
Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
#image_title
Share
Facebook Twitter LinkedIn Pinterest Email

We are delighted to share that Groq is a supported reasoning provider for Hug Face Hub! GROQ joins a growing ecosystem and directly enhances the breadth and capabilities of serverless inference on the hub’s model page. Inference providers are seamlessly integrated into the client SDK (both JS and Python), making it easy to use different models using preferred providers.

GROQ supports a variety of text and conversation models, including the latest open source models such as Meta’s Llama 4 and Qwen’s QWQ-32B.

At the heart of GROQ’s technology is the Language Processing Unit (LPUâ„¢). This is a new type of end-to-end processing unit system that provides the fastest inference for computationally intensive applications with sequential components such as large-scale language models (LLMS). The LPU is designed to overcome the limitations of GPUs for inference, providing significantly lower latency and higher throughput. This makes it ideal for real-time AI applications.

GROQ provides fast AI inference to openly available models. It provides APIs that allow developers to easily integrate these models into their applications. It offers an on-demand, pay-as-you-go model for accessing a wide range of openly available LLMS.

Now you can use GROQ’s inference API as your inference provider for Huggingface. I’m extremely excited to see what you build with this new provider.

Learn more about using GROQ as an inference provider on our dedicated documentation page.

See the list of supported models here.

How it works

In the website UI

In User Account Settings, you set your own API key for the provider you signed up for. If no custom key is configured, the request is routed through HF. Order a provider if you like. This applies to model page widgets and code snippets.

Inference provider

As mentioned before, when calling an inference provider there are two modes: a custom key (the call goes directly to the inference provider, using the corresponding inference provider’s own API key) (in that case no tokens are required from the provider.

Inference provider

The model page introduces third-party inference providers (compatible with current models sorted by user preferences)

Inference provider

From the client SDK

I’m using Huggingface_hub from Python

The following example shows how to use Meta’s Llama 4 using GROQ as the inference provider. Automatic routing through a hugging face can be used with a hugging face token or your own GROQ API key if you have one.

Install huggingface_hub from the source (see instructions). Official support will be released soon with version v0.33.0.

Import OS
from huggingface_hub Import Inference client=Inference client(provider=“groq”,api_key = os.environ(“HF_TOKEN”) ) message = ({
“role”: “user”,
“content”: “What is the capital of France?”
}) complete = client.chat.completions.create(model =“Metalama/llama-4-scout-17b-16e-instruct”message = message,)

printing(complete.choices)0). message)

From JS using @huggingface/Incerence

Import { inference } from “@Huggingface/Inference”;

const Client= new inference(process.Env.hf_token);

const chatcompletion = wait client.ChatCompletion({
Model: “Metalama/llama-4-scout-17b-16e-instruct”,
message:({
role: “user”,
content: “What is the capital of France?”,},),
Provider: “groq”,});

console.log(ChatCompletion.Choices(0).message);

Request

For direct requests, i.e. when using keys from inference providers, the corresponding provider will be billed. For example, if you are using a GROQ API key, your GROQ account will be billed.

For routed requests, i.e. when authenticating through a facehub that hugs, you only pay the standard provider API rate. There is no additional markup. Pass the provider’s costs directly. (In the future, we may establish a revenue sharing agreement with our provider partners.)

Important Memopia users get $2 worth of inference credits each month. You can use them between providers. 🔥

Subscribe to our Hugging Face Pro plan for access to inference credits, Zerogpu, Spaces Dev Mode, 20x high limits and more.

We also infer small allocations for sign-in free users for free, but upgrade to Pro if possible!

Feedback and next steps

We want to get your feedback! Share your thoughts and comments here: https://huggingface.co/spaces/huggingface/huggingdiscussions/discussions/49

author avatar
versatileai
See Full Bio
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
Previous ArticleF5 expands NVIDIA LLM Routing and Security-enhanced AI Infrastructure
Next Article Piclumen Realistic V2 introduces advanced AI art generation. AI News Details
versatileai

Related Posts

Tools

JBS Dev: About incomplete data and the last mile of AI – from model capabilities to cost sustainability

May 13, 2026
Tools

AI automates HR compliance except where tech companies need it

May 12, 2026
Tools

Pre-training a mix of experts to achieve new modularity

May 11, 2026
Add A Comment

Comments are closed.

Top Posts

OpenAI blocks Sora from creating MLK video after Estate object

November 23, 200521 Views

SNS Network Project Increases GPUAAS Business and Server Sales, Expanding AI Adoption

May 6, 202518 Views

How Prezi leverages hubs and expert support programs to accelerate your ML roadmap

April 22, 202516 Views
Stay In Touch
  • YouTube
  • TikTok
  • Twitter
  • Instagram
  • Threads
Latest Reviews

Subscribe to Updates

Subscribe to our newsletter and stay updated with the latest news and exclusive offers.

Most Popular

OpenAI blocks Sora from creating MLK video after Estate object

November 23, 200521 Views

SNS Network Project Increases GPUAAS Business and Server Sales, Expanding AI Adoption

May 6, 202518 Views

How Prezi leverages hubs and expert support programs to accelerate your ML roadmap

April 22, 202516 Views
Don't Miss

JBS Dev: About incomplete data and the last mile of AI – from model capabilities to cost sustainability

May 13, 2026

AI automates HR compliance except where tech companies need it

May 12, 2026

Pre-training a mix of experts to achieve new modularity

May 11, 2026
Service Area
X (Twitter) Instagram YouTube TikTok Threads RSS
  • About Us
  • Contact Us
  • Privacy Policy
  • Terms and Conditions
  • Disclaimer
© 2026 Versa AI Hub. All Rights Reserved.

Type above and press Enter to search. Press Esc to cancel.

Sign In or Register

Welcome Back!

Login to your account below.

Lost password?