
Following our recent announcement of Inference Providers on the Hub, we're excited to share that Fireworks.ai is now a supported Inference Provider!
Fireworks.ai provides blazing-fast serverless inference directly on model pages across the Hugging Face ecosystem, as well as in our client libraries and tools, making it easier than ever to run inference on your favorite models.
Starting now, you can run serverless inference on supported models via Fireworks.ai. See the complete list of supported models here.
Light up your projects with Fireworks.ai today!
How it works
In the website UI
You can find all of the models supported by Fireworks.ai on the Hub here.
From the client SDKs
From Python, using huggingface_hub
The following example shows how to use DeepSeek-R1 with Fireworks.ai as the inference provider. You can use a Hugging Face token for automatic routing through Hugging Face, or your own Fireworks.ai API key if you have one.
Install huggingface_hub from source:
pip install git+https://github.com/huggingface/huggingface_hub
Use the huggingface_hub Python library to call Fireworks.ai endpoints by setting the provider parameter.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="fireworks-ai",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=500
)

print(completion.choices[0].message)
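You can also stream the response token by token. Here is a minimal sketch using the same client with the stream=True flag of the chat completion API (the key is a placeholder):

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="fireworks-ai",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)

# With stream=True the call returns an iterator of chunks
# instead of a single completed response.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
    stream=True
)

for chunk in stream:
    # Each chunk carries an incremental delta of the assistant message.
    print(chunk.choices[0].delta.content or "", end="")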
From JS, using @huggingface/inference
import { HfInference } from "@huggingface/inference";

const client = new HfInference("xxxxxxxxxxxxxxxxxxxxxxxx");

const chatCompletion = await client.chatCompletion({
    model: "deepseek-ai/DeepSeek-R1",
    messages: [
        {
            role: "user",
            content: "How do you make a very spicy mayonnaise?"
        }
    ],
    provider: "fireworks-ai",
    max_tokens: 500
});

console.log(chatCompletion.choices[0].message);
From an HTTP call
Here's how you can call Llama-3.3-70B-Instruct using Fireworks.ai as the inference provider via cURL:
curl 'https://router.huggingface.co/fireworks-ai/v1/chat/completions' \
    -H 'Authorization: Bearer xxxxxxxxx' \
    -H 'Content-Type: application/json' \
    --data '{
        "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the meaning of life if you were a dog?"
            }
        ],
        "max_tokens": 500,
        "stream": false
    }'
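The router exposes an OpenAI-compatible chat completions endpoint, so any HTTP client works. Here is the same request as a minimal Python sketch using the requests library (token placeholder as above):

import requests

API_URL = "https://router.huggingface.co/fireworks-ai/v1/chat/completions"
headers = {
    "Authorization": "Bearer xxxxxxxxx",  # HF token (routed) or Fireworks.ai key
    "Content-Type": "application/json"
}
payload = {
    "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [
        {"role": "user", "content": "What is the meaning of life if you were a dog?"}
    ],
    "max_tokens": 500,
    "stream": False
}

response = requests.post(API_URL, headers=headers, json=payload)
# The response follows the OpenAI chat completion schema.
print(response.json()["choices"][0]["message"]["content"])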
Billing
For direct requests, i.e. when you use a Fireworks.ai API key, you are billed directly on your Fireworks.ai account.
For routed requests, i.e. when you authenticate via the Hub, you only pay the standard Fireworks.ai API rates. There is no additional markup; we pass the provider's costs through directly. (In the future, we may establish revenue-sharing agreements with our provider partners.)
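In code, the only difference between the two billing paths is which key you pass to the client. A minimal sketch (both key values are placeholders; the hf_ prefix marks a Hugging Face token):

from huggingface_hub import InferenceClient

# Routed request: authenticate with a Hugging Face token.
# Billed at standard Fireworks.ai rates through your HF account.
routed_client = InferenceClient(provider="fireworks-ai", api_key="hf_xxxxxxxx")

# Direct request: authenticate with your own Fireworks.ai API key.
# Billed directly on your Fireworks.ai account.
direct_client = InferenceClient(provider="fireworks-ai", api_key="xxxxxxxx")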
Important note: PRO users get $2 worth of inference credits every month. You can use them across providers. 🔥
Subscribe to the Hugging Face PRO plan to get access to Inference credits, ZeroGPU, Spaces Dev Mode, 20x higher limits, and more.