We are excited to share that DeepInfra is now a supported inference provider on the Hugging Face Hub.
DeepInfra joins our growing ecosystem, expanding the breadth and capabilities of serverless inference available directly on the Hub's model pages. Inference providers are also seamlessly integrated into our client SDKs (both JS and Python), making it easy to use a wide variety of models with your preferred provider.
DeepInfra is a serverless AI inference platform that offers the most cost-effective per-token pricing in the industry. With a catalog of over 100 models, DeepInfra allows developers to easily integrate a wide range of AI capabilities into their applications with minimal setup.
DeepInfra supports a wide range of model types, from LLMs to text-to-image, text-to-video, embedding models, and more. As part of this initial integration, DeepInfra will start by supporting conversational and text-generation tasks on Hugging Face, providing access to popular open-weight LLMs such as DeepSeek V4, Kimi-K2.6, and GLM-5.1. Support for additional tasks (text-to-image, text-to-video, embeddings, etc.) will be rolled out soon.
To learn more about using DeepInfra as an inference provider, please visit our dedicated documentation page.
See here for a complete list of models supported by DeepInfra.
Follow DeepInfra on Hugging Face: https://huggingface.co/DeepInfra.
How it works
Inside the website UI
In your user account settings, you can set your own API key for any provider you've signed up with; if no custom key is configured, requests are routed through HF. You can also order providers by priority, which applies to the widgets and code snippets shown on model pages.
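The provider-ordering behavior described above can be sketched as a simple priority lookup. This is an illustration only; the function and provider names are hypothetical, not the Hub's actual implementation:

```python
def pick_provider(user_priority, model_providers):
    """Return the first provider from the user's priority list that serves
    the model; otherwise fall back to the model's own default ordering."""
    for provider in user_priority:
        if provider in model_providers:
            return provider
    # No preference matched: use the model page's default first provider.
    return model_providers[0] if model_providers else None

# A user who ranks DeepInfra first gets it whenever the model supports it.
print(pick_provider(["deepinfra", "other-provider"], ["other-provider", "deepinfra"]))  # → deepinfra
```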

As mentioned earlier, there are two modes when calling an inference provider: with a custom key, the call is sent directly to the inference provider, authenticated with your own API key for that provider; when routed by HF, no provider token is required, and the charges are applied to your HF account rather than to an account with the provider.
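A minimal sketch of how the two modes differ in practice. The helper function is hypothetical, and the DeepInfra base URL shown is an assumption based on its OpenAI-compatible API; only the router URL and the billing behavior come from this post:

```python
def request_target(mode, hf_token=None, provider_key=None):
    """Illustrative only: pick the endpoint and credential for each mode."""
    if mode == "routed":
        # Routed by HF: authenticate with your HF token;
        # charges land on your HF account, not a provider account.
        return ("https://router.huggingface.co/v1", f"Bearer {hf_token}")
    # Custom key: call DeepInfra directly with your own DeepInfra API key.
    # (Assumed OpenAI-compatible DeepInfra base URL; check their docs.)
    return ("https://api.deepinfra.com/v1/openai", f"Bearer {provider_key}")
```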

The model page shows the third-party inference providers available for the current model, sorted according to your preference order.

From client SDK
DeepInfra is available through the Hugging Face SDKs: huggingface_hub (>= 1.11.2) for Python and @huggingface/inference for JavaScript.
The following example shows how to use DeepSeek V4 Pro through DeepInfra. Authenticate with your Hugging Face token, and requests are automatically routed to DeepInfra.
From your favorite agent harness
Hugging Face Inference Providers are integrated into most agent harnesses, including Pi, OpenCode, Hermes Agent, OpenClaw, and more. This means you can connect DeepInfra-hosted models directly to your favorite tools without writing extra glue code. See the full list of integrations here.
From Python
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[
        {
            "role": "user",
            "content": "Create a Python function that returns the nth Fibonacci number using memoization.",
        }
    ],
)

print(completion.choices[0].message)
```
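For reference, the kind of function the example prompt asks for might look like this minimal sketch, memoized with Python's built-in functools.lru_cache:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """nth Fibonacci number (fib(0) == 0, fib(1) == 1), memoized so each
    subproblem is computed only once."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # → 832040
```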
From JS
```javascript
import { OpenAI } from "openai";

const client = new OpenAI({
    baseURL: "https://router.huggingface.co/v1",
    apiKey: process.env.HF_TOKEN,
});

const chatCompletion = await client.chat.completions.create({
    model: "deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages: [
        {
            role: "user",
            content: "Create a Python function that returns the nth Fibonacci number using memoization.",
        },
    ],
});

console.log(chatCompletion.choices[0].message);
```
Billing
For direct requests, i.e. when using a key from an inference provider, you are billed by that provider. For example, when you use a DeepInfra API key, your DeepInfra account is charged.
For routed requests, i.e. when you authenticate via the Hugging Face Hub, you pay only the standard provider API rates. There is no markup from us; we pass the provider's costs straight through. (In the future, we may establish revenue-sharing agreements with our provider partners.)
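The pass-through arithmetic is straightforward; in the sketch below, the per-million-token rates are made-up placeholders, not DeepInfra's actual pricing:

```python
def routed_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Routed requests bill the provider's own rates with no HF markup."""
    return (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m

# Hypothetical rates: $0.50 / $1.50 per million input/output tokens.
cost = routed_cost(200_000, 50_000, 0.50, 1.50)
print(f"${cost:.4f}")  # → $0.1750
```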
Important note: PRO users get $2 worth of inference credits every month, usable across providers. 🔥
When you sign up for the Hugging Face PRO plan, you get access to inference credits, ZeroGPU, Spaces Dev Mode, 20x higher limits, and more.
We also offer free inference with a small quota for signed-in free users, but please upgrade to PRO if you can.
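As a rough illustration of how a monthly credit could offset routed usage: only the $2 PRO credit figure comes from this post; the accounting logic below is hypothetical, not Hugging Face's implementation:

```python
def apply_credits(cost, credits_remaining):
    """Split a request's cost between remaining monthly credits and
    out-of-pocket billing. Illustrative sketch only."""
    covered = min(cost, credits_remaining)
    return credits_remaining - covered, cost - covered

remaining, billed = apply_credits(0.50, 2.00)  # request fully covered by credits
print(remaining, billed)  # → 1.5 0.0
```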
Feedback and next steps
We would love to hear your feedback! Please share your thoughts and comments here: https://huggingface.co/spaces/huggingface/HuggingDiscussions/Discussions/49

