Today, we are launching the integration of four wonderful serverless inference providers – Fal, Replicate, SambaNova, and Together AI – directly on the Hub's model pages. They are also seamlessly integrated into our client SDKs (for JS and Python), making it easier than ever to explore serverless inference on a wide variety of models running on your favorite provider.
We have been hosting a serverless Inference API on the Hub for a long time (we launched v1 in the summer of 2020). While it has enabled easy exploration and prototyping, we have refined our core value proposition toward collaboration, storage, versioning, and distribution of large models and datasets with the community. At the same time, serverless providers have flourished, and the time was right for Hugging Face to offer simple and unified access to serverless inference through a set of great providers.
Just as we work with great partners like AWS and NVIDIA for dedicated deployment options via the Deploy button on model pages, it was natural to partner with the next generation of serverless inference providers for model-centric, serverless inference.
Here's what this enables, taking the timely example of deepseek-ai/DeepSeek-R1, a model that has achieved mainstream fame over the past few days 🔥:
Rodrigo Liang, Co-Founder and CEO at SambaNova: “We are excited to be partnering with Hugging Face to accelerate its Inference API.”
Zeke Sikelianos, Founding Designer at Replicate: “Hugging Face is the de facto home of open-source model weights, and has been a key player in making AI more accessible to the world. We're honored to be among the first inference providers featured in this launch.”
This is just the start, and we'll build on top of it with the community in the coming weeks!
How it works
On the website UI
In your user account settings, you can set your own API keys for the providers you've signed up with; otherwise, you can still use them and your requests will be routed through HF. You can also order providers by preference, which applies to the widgets and code snippets on the model pages.
As mentioned above, there are two modes when calling the Inference API: custom key (calls go directly to the inference provider, using your own API key for that provider), or routed by HF (in that case, you don't need a token from the provider, and the charges are applied directly to your HF account rather than the provider's account). A short sketch of both modes follows below.
Model pages showcase third-party inference providers (those compatible with the current model, sorted by user preference).
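To make the two modes concrete, here is a minimal sketch in Python using the huggingface_hub client introduced below; the keys are placeholders, and which mode you get depends solely on which key you pass:

from huggingface_hub import InferenceClient

# Routed by HF: pass your Hugging Face token; charges land on your HF account.
routed_client = InferenceClient(
    provider="together",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",  # placeholder HF token
)

# Custom key: pass the provider's own API key; the call goes directly to the
# provider and is billed to your provider account.
direct_client = InferenceClient(
    provider="together",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx",  # placeholder Together AI key
)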
From the client SDKs
from Python, using huggingface_hub
The following example shows how to use DeepSeek-R1 with Together AI as the inference provider. You can use a Hugging Face token for automatic routing through Hugging Face, or your own Together AI API key if you have one.
Requires huggingface_hub version v0.28.0 or later (release notes).
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="together",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=500
)

print(completion.choices[0].message)
Note: You can also call the inference providers with the OpenAI client library; see the example for the DeepSeek model.
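As a rough sketch of what that looks like (assuming the openai Python package and the OpenAI-compatible /v1 route of the routing proxy described in the HTTP section below; the provider path and token are placeholders):

from openai import OpenAI

# Point the OpenAI client at the Hub's routing proxy for Together AI.
client = OpenAI(
    base_url="https://huggingface.co/api/inference-proxy/together/v1",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",  # placeholder Hugging Face token
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
)

print(completion.choices[0].message)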
And here's how to generate an image from a text prompt using FLUX.1-dev running on fal.ai:
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="fal-ai",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)

image = client.text_to_image(
    "Labrador in the style of Vermeer",
    model="black-forest-labs/FLUX.1-dev"
)
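text_to_image returns a PIL.Image.Image (assuming Pillow is installed), so you can save the result directly; the filename here is just an example:

image.save("labrador.png")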
To switch to a different provider, you can simply change the provider name; everything else stays the same:
from huggingface_hub import InferenceClient

client = InferenceClient(
-   provider="fal-ai",
+   provider="replicate",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)
from JS, using @huggingface/inference
import { HfInference } from "@huggingface/inference";

const client = new HfInference("xxxxxxxxxxxxxxxxxxxxxxxx");

const chatCompletion = await client.chatCompletion({
    model: "deepseek-ai/DeepSeek-R1",
    messages: [
        {
            role: "user",
            content: "What is the capital of France?"
        }
    ],
    provider: "together",
    max_tokens: 500
});

console.log(chatCompletion.choices[0].message);
From HTTP calls
We expose the routing proxy directly under the huggingface.co domain, so you can call it directly; this is very useful for OpenAI-compatible APIs, for instance. You can simply swap in the base URL: https://huggingface.co/api/inference-proxy/{:provider}.
Here's how to call Llama-3.3-70B-Instruct using SambaNova as the inference provider via cURL:
curl 'https://huggingface.co/api/inference-proxy/sambanova/v1/chat/completions' \
-H 'Authorization: Bearer xxxxxxxxxxxxxxxxxxxxxxxx' \
-H 'Content-Type: application/json' \
--data '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    "max_tokens": 500,
    "stream": false
}'
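The same request from Python over plain HTTP, as a minimal sketch using the requests library (same endpoint as above; the token is a placeholder):

import requests

# POST the OpenAI-style chat payload to the routing proxy.
response = requests.post(
    "https://huggingface.co/api/inference-proxy/sambanova/v1/chat/completions",
    headers={"Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxx"},
    json={
        "model": "meta-llama/Llama-3.3-70B-Instruct",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "max_tokens": 500,
        "stream": False,
    },
)

print(response.json()["choices"][0]["message"])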
Billing
For direct requests, i.e. when you use a key from an inference provider, you are billed by the corresponding provider. For instance, if you use a Together AI key, you're billed on your Together AI account.
For routed requests, i.e. when you authenticate via the Hub, you only pay the standard provider API rates; there is no additional markup, we pass the provider costs through directly. (In the future, we may establish revenue-sharing agreements with our provider partners.)
Important note: PRO users get a fixed dollar amount of Inference credits every month, which you can use across providers. 🔥
Subscribe to the Hugging Face PRO plan to get access to Inference credits, ZeroGPU, Spaces Dev Mode, 20x higher limits, and more.
We also provide free inference with a small quota for our signed-in free users, but please upgrade to PRO if you can!
Feedback and next steps
We would love to get your feedback! https:.