We are excited to share that DeepInfra is now a supported inference provider on the Hugging Face Hub.
DeepInfra joins our growing ecosystem, expanding the breadth and capabilities of serverless inference available directly on the Hub's model pages. Inference providers are also seamlessly integrated into our client SDKs (both JS and Python), making it easy to use a wide variety of models with your preferred provider.
DeepInfra is a serverless AI inference platform that offers the most cost-effective per-token pricing in the industry. With a catalog of over 100 models, DeepInfra allows developers to easily integrate a wide range of AI capabilities into their applications with minimal setup.
DeepInfra supports a wide range of model types, from LLMs to text-to-image, text-to-video, embedding models, and more. As part of this initial integration, DeepInfra will start by supporting conversational and text-generation tasks on Hugging Face, providing access to popular open-weight LLMs such as DeepSeek V4, Kimi-K2.6, and GLM-5.1. Support for additional tasks (text-to-image, text-to-video, embeddings, etc.) will be rolled out soon.
To learn more about using DeepInfra as an inference provider, please visit our dedicated documentation page.
See here for a complete list of models supported by DeepInfra.
Follow DeepInfra on Hugging Face: https://huggingface.co/DeepInfra.
How it works
Inside the website UI
In your user account settings, you can set your own API key for any provider you've signed up with; if no custom key is configured, requests are routed through HF. You can also order providers by priority, which applies to the widgets and code snippets shown on model pages.
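The provider-ordering behavior described above can be sketched as a simple priority lookup. This is an illustration only; the function and provider names are hypothetical, not the Hub's actual implementation:

```python
def pick_provider(user_priority, model_providers):
    """Return the first provider from the user's priority list that serves
    the model; otherwise fall back to the model's own default ordering."""
    for provider in user_priority:
        if provider in model_providers:
            return provider
    # No preference matched: use the model page's default first provider.
    return model_providers[0] if model_providers else None

# A user who ranks DeepInfra first gets it whenever the model supports it.
print(pick_provider(["deepinfra", "other-provider"], ["other-provider", "deepinfra"]))  # → deepinfra
```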

As mentioned earlier, there are two modes when calling an inference provider: with a custom key, the call is sent directly to the inference provider, authenticated with your own API key for that provider; when routed by HF, no provider token is required, and the charges are applied to your HF account rather than to an account with the provider.
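A minimal sketch of how the two modes differ in practice. The helper function is hypothetical, and the DeepInfra base URL shown is an assumption based on its OpenAI-compatible API; only the router URL and the billing behavior come from this post:

```python
def request_target(mode, hf_token=None, provider_key=None):
    """Illustrative only: pick the endpoint and credential for each mode."""
    if mode == "routed":
        # Routed by HF: authenticate with your HF token;
        # charges land on your HF account, not a provider account.
        return ("https://router.huggingface.co/v1", f"Bearer {hf_token}")
    # Custom key: call DeepInfra directly with your own DeepInfra API key.
    # (Assumed OpenAI-compatible DeepInfra base URL; check their docs.)
    return ("https://api.deepinfra.com/v1/openai", f"Bearer {provider_key}")
```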

The model page shows the third-party inference providers available for the current model, sorted according to your preference order.

From client SDK
DeepInfra is available through the Hugging Face SDKs: huggingface_hub (>= 1.11.2) for Python and @huggingface/inference for JavaScript.
The following example shows how to use DeepSeek V4 Pro through DeepInfra. Authenticate with your Hugging Face token, and requests are automatically routed to DeepInfra.
From your favorite agent harness
Hugging Face Inference Providers are integrated into most agent harnesses, including Pi, OpenCode, Hermes Agent, OpenClaw, and more. This means you can connect DeepInfra-hosted models directly to your favorite tools without writing extra glue code. See the full list of integrations here.
From Python
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[
        {
            "role": "user",
            "content": "Create a Python function that returns the nth Fibonacci number using memoization.",
        }
    ],
)

print(completion.choices[0].message)
```
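For reference, the kind of function the example prompt asks for might look like this minimal sketch, memoized with Python's built-in functools.lru_cache:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """nth Fibonacci number (fib(0) == 0, fib(1) == 1), memoized so each
    subproblem is computed only once."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # → 832040
```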
From JS
```javascript
import { OpenAI } from "openai";

const client = new OpenAI({
    baseURL: "https://router.huggingface.co/v1",
    apiKey: process.env.HF_TOKEN,
});

const chatCompletion = await client.chat.completions.create({
    model: "deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages: [
        {
            role: "user",
            content: "Create a Python function that returns the nth Fibonacci number using memoization.",
        },
    ],
});

console.log(chatCompletion.choices[0].message);
```
Billing
For direct requests, i.e. when using a key from an inference provider, you are billed by that provider. For example, when you use a DeepInfra API key, your DeepInfra account is charged.
For routed requests, i.e. when you authenticate via the Hugging Face Hub, you pay only the standard provider API rates. There is no markup from us; we pass the provider's costs straight through. (In the future, we may establish revenue-sharing agreements with our provider partners.)
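The pass-through arithmetic is straightforward; in the sketch below, the per-million-token rates are made-up placeholders, not DeepInfra's actual pricing:

```python
def routed_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Routed requests bill the provider's own rates with no HF markup."""
    return (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m

# Hypothetical rates: $0.50 / $1.50 per million input/output tokens.
cost = routed_cost(200_000, 50_000, 0.50, 1.50)
print(f"${cost:.4f}")  # → $0.1750
```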
Important note: PRO users get $2 worth of inference credits every month, usable across providers. 🔥
When you sign up for the Hugging Face PRO plan, you get access to inference credits, ZeroGPU, Spaces Dev Mode, 20x higher limits, and more.
We also offer free inference with a small quota for signed-in free users, but please upgrade to PRO if you can.
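As a rough illustration of how a monthly credit could offset routed usage: only the $2 PRO credit figure comes from this post; the accounting logic below is hypothetical, not Hugging Face's implementation:

```python
def apply_credits(cost, credits_remaining):
    """Split a request's cost between remaining monthly credits and
    out-of-pocket billing. Illustrative sketch only."""
    covered = min(cost, credits_remaining)
    return credits_remaining - covered, cost - covered

remaining, billed = apply_credits(0.50, 2.00)  # request fully covered by credits
print(remaining, billed)  # → 1.5 0.0
```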
Feedback and next steps
We would love to hear your feedback! Please share your thoughts and comments here: https://huggingface.co/spaces/huggingface/HuggingDiscussions/Discussions/49

