We are excited to share that OVHcloud is now a supported Inference Provider on the Hugging Face Hub. OVHcloud joins a growing ecosystem, expanding the breadth and capabilities of serverless inference directly on the Hub’s model pages. Inference Providers are also seamlessly integrated into our client SDKs (for both JS and Python), making it very easy to use a wide variety of models with your preferred provider.
With this release, it’s now easier than ever to access popular open-weight models like gpt-oss, Qwen3, DeepSeek R1, and Llama directly from Hugging Face. You can browse the OVHcloud organization on the Hub at https://huggingface.co/ovhcloud and see trending supported models at https://huggingface.co/models?inference_provider=ovhcloud&sort=trending.
OVHcloud AI Endpoints is a fully managed serverless service that provides access to frontier AI models from leading research labs through simple API calls. The service offers competitive pay-per-token pricing starting from €0.04 per million tokens.
The service runs on secure infrastructure located in European data centers, ensuring data sovereignty and low latency for European users. The platform supports advanced features such as structured outputs, function calling, and multimodal capabilities for both text and image processing.
OVHcloud’s inference infrastructure is built for production environments, with first-token response times under 200 ms, making it ideal for interactive applications and agentic workflows. The service supports both text generation and embedding models. More information about OVHcloud’s platform and infrastructure can be found at https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog/.
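Function calling, for instance, uses the familiar OpenAI-style tools schema. Here is a minimal sketch, routed through Hugging Face; the get_weather tool and its schema are purely illustrative:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

# Declare a (hypothetical) tool the model may choose to call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

completion = client.chat.completions.create(
    model="openai/gpt-oss-120b:ovhcloud",  # route the request to OVHcloud
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)

# If the model decides to call the tool, its arguments show up here.
print(completion.choices[0].message.tool_calls)
```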
To learn more about using OVHcloud as an inference provider, please visit our dedicated documentation page.
Click here for a list of supported models.
How it works
In the website UI
In your user account settings, you can:
- Set your own API key for any provider you’ve signed up with. If no custom key is configured, your requests are routed through HF.
- Order providers by preference. This ordering applies to the widget and code snippets on model pages.

As mentioned earlier, there are two modes of invoking an inference provider:
- Custom key: calls are sent directly to the inference provider, using your own API key from that provider.
- Routed by HF: no provider token is required; charges are applied directly to your HF account rather than to the provider’s account.
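In the client SDKs, the two modes differ only in which key you pass. Here is a minimal Python sketch of both; the environment variable names are illustrative:

```python
import os
from huggingface_hub import InferenceClient

# Routed by HF: authenticate with your Hugging Face token;
# usage is billed to your Hugging Face account.
routed_client = InferenceClient(
    provider="ovhcloud",
    api_key=os.environ["HF_TOKEN"],
)

# Custom key: authenticate with your own OVHcloud AI Endpoints key;
# calls go directly to OVHcloud and are billed to your OVHcloud account.
direct_client = InferenceClient(
    provider="ovhcloud",
    api_key=os.environ["OVHCLOUD_API_KEY"],
)
```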

The model page showcases third-party inference providers that are compatible with the current model, sorted by your preference.

From client SDK
From Python, using huggingface_hub
The following example shows how to use OpenAI’s gpt-oss-120b with OVHcloud as the inference provider. You can use a Hugging Face token for automatic routing through Hugging Face, or your own OVHcloud AI Endpoints API key if you have one.
Note: This requires using the latest version of huggingface_hub (>= 1.1.5).
```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="openai/gpt-oss-120b:ovhcloud",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

print(completion.choices[0].message)
```
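Since the first token arrives quickly, streaming is a natural fit for interactive use. The same call works with stream=True; a minimal sketch:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

stream = client.chat.completions.create(
    model="openai/gpt-oss-120b:ovhcloud",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)

# Each chunk carries an incremental delta of the assistant message.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```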
From JS using @huggingface/inference
```js
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const chatCompletion = await client.chatCompletion({
  model: "openai/gpt-oss-120b:ovhcloud",
  messages: [
    { role: "user", content: "What is the capital of France?" },
  ],
});

console.log(chatCompletion.choices[0].message);
```
Billing
Here’s how billing works:
For direct requests, i.e. when using a key from an inference provider, you are billed by the corresponding provider. For example, if you use an OVHcloud API key, your OVHcloud account will be charged.
For routed requests, i.e. when authenticating via the Hugging Face Hub, you only pay the standard provider API rates; there’s no additional markup from us, we just pass the provider costs through directly. (In the future, we may establish revenue-sharing agreements with our provider partners.)
Important Note: PRO users get $2 worth of inference credits every month. You can use them across providers. 🔥
When you sign up for the Hugging Face PRO plan, you get access to inference credits, ZeroGPU, Spaces Dev Mode, 20x higher limits, and more.
We also offer free inference with a small quota for our signed-in free users, but please upgrade to PRO if you can!
Feedback and next steps
We would love to get your feedback! Share your thoughts and comments here: https://huggingface.co/spaces/huggingface/HuggingDiscussions/discussions/49

