Versa AI hub
Tools

Hub fireworks

By Julien Chaumond · February 17, 2025 · 2 Mins Read
Following our recent announcement of inference providers on the Hub, we are excited to share that Fireworks.ai is now supported.

Fireworks.ai provides blazing-fast serverless inference directly on model pages across the HF ecosystem of libraries and tools, making it easier than ever to run inference on your favorite models.

Fireworks.ai supported as a Hugging Face inference provider

Starting now, you can run serverless inference on a selection of models via Fireworks.ai.

You can find the complete list of supported models here.

Light up your projects with Fireworks.ai today!

How it works

In the website UI

fireworks.ai Inference Provider UI

You can find all of the models supported by Fireworks.ai on HF here.

From the client SDK

From Python, using huggingface_hub

The following example shows how to use DeepSeek-R1 with Fireworks.ai as the inference provider. You can use a Hugging Face token for automatic routing through Hugging Face, or your own Fireworks.ai API key if you have one.
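The routing choice above can be sketched as a small helper. This is an illustration only: the "hf_" token prefix and the two endpoint URLs follow Hugging Face conventions, but the exact routing logic inside the client library is an assumption, not its actual internals.

```python
# Hypothetical sketch of routed vs. direct requests. The URLs and the
# "hf_" prefix check reflect Hugging Face conventions; the real client
# library may decide differently.

HF_ROUTER_URL = "https://router.huggingface.co/fireworks-ai/v1/chat/completions"
FIREWORKS_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def pick_endpoint(api_key: str) -> str:
    """Return the chat-completions URL to call for a given key.

    Hugging Face user tokens start with "hf_": those requests are
    routed (and billed) through Hugging Face. Any other key is
    assumed to be a native Fireworks.ai key and goes direct.
    """
    if api_key.startswith("hf_"):
        return HF_ROUTER_URL
    return FIREWORKS_URL
```

Either way, the request body and response format stay the same; only the endpoint and the account that gets billed change.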

Install huggingface_hub from source:

pip install git+https://github.com/huggingface/huggingface_hub

Use the huggingface_hub Python library to set the provider and call the Fireworks.ai endpoint:

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="fireworks-ai",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=500,
)

print(completion.choices[0].message)

From JS using @huggingface/Incerence

import { HfInference } from "@huggingface/inference";

const client = new HfInference("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx");

const chatCompletion = await client.chatCompletion({
    model: "deepseek-ai/DeepSeek-R1",
    messages: [
        {
            role: "user",
            content: "How do you make a very spicy mayonnaise?"
        }
    ],
    provider: "fireworks-ai",
    max_tokens: 500
});

console.log(chatCompletion.choices[0].message);

From an HTTP call

Here is how you can call Llama-3.3-70B-Instruct using Fireworks.ai as the inference provider via cURL:

curl 'https://router.huggingface.co/fireworks-ai/v1/chat/completions' \
    -H 'Authorization: Bearer xxxxxxxxx' \
    -H 'Content-Type: application/json' \
    --data '{
        "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the meaning of life if you were a dog?"
            }
        ],
        "max_tokens": 500,
        "stream": false
    }'
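For reference, the JSON body that cURL command sends can be assembled in Python with only the standard library. This sketch builds the payload; actually sending it is a plain HTTP POST (e.g. with urllib.request), which is omitted here. The Fireworks-side model ID for Llama 3.3 is assumed to follow Fireworks' naming scheme.

```python
import json

# Build the same JSON body as the cURL example. Only payload
# construction is shown; POST it to the router endpoint with any
# HTTP client, using your HF token or Fireworks key as Bearer auth.
payload = {
    # Assumed Fireworks model ID for Llama-3.3-70B-Instruct
    "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [
        {"role": "user", "content": "What is the meaning of life if you were a dog?"}
    ],
    "max_tokens": 500,
    "stream": False,
}

body = json.dumps(payload)
headers = {
    "Authorization": "Bearer xxxxxxxxx",  # your HF token or Fireworks key
    "Content-Type": "application/json",
}
```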

Billing

For direct requests, i.e. when using a Fireworks.ai key, you are billed directly on your Fireworks.ai account.

For routed requests, i.e. when authenticating via the Hub, you only pay the standard Fireworks.ai API rates. There is no additional markup; we pass the provider's costs through directly. (In the future, we may establish revenue-sharing agreements with our provider partners.)
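As a concrete illustration of this pass-through pricing, the routed cost is simply tokens times the provider's rate, with zero markup. The per-token rate below is made up for illustration; actual rates are set by Fireworks.ai.

```python
# Pass-through billing sketch with a hypothetical rate.
PROVIDER_RATE_PER_1M_TOKENS = 0.90  # USD per 1M tokens, hypothetical
HF_MARKUP = 0.0  # routed requests add no markup

def routed_cost_usd(tokens: int) -> float:
    """Cost of a routed request: provider rate plus zero HF markup."""
    base = tokens / 1_000_000 * PROVIDER_RATE_PER_1M_TOKENS
    return base * (1 + HF_MARKUP)

print(round(routed_cost_usd(500_000), 2))
```

Whether routed or direct, the number that changes is the account being charged, not the rate.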

Important note: PRO users get $2 worth of inference credits every month, which you can use across providers. 🔥

Subscribe to the Hugging Face PRO plan to get access to inference credits, ZeroGPU, Spaces Dev Mode, 20x higher limits, and more.
