We’re excited to share that Cohere is now a supported Inference Provider on the Hugging Face Hub! This also marks the first model creator to share and serve their models directly on the Hub.
Cohere is committed to building and serving models purpose-built for enterprise use cases. From cutting-edge generative AI to powerful embedding and ranking models, their comprehensive suite of secure AI solutions is designed to tackle real-world business challenges. In addition, Cohere Labs, Cohere’s in-house research lab, supports fundamental research and seeks to change the spaces where research happens.
Starting today, you can run serverless inference on the following models via Cohere and Inference Providers:
- CohereLabs/c4ai-command-a-03-2025
- CohereLabs/aya-expanse-32b
- CohereLabs/c4ai-command-r7b-12-2024
- CohereLabs/aya-vision-32b
Light up your projects with Cohere and Cohere Labs today!
Cohere models
Cohere and Cohere Labs bring a range of models to Inference Providers, each excelling at specific business applications. Let’s explore a few of them in detail.
CohereLabs/c4ai-command-a-03-2025
Optimized for demanding enterprises that require fast, secure, high-quality AI. Its 256k context length (twice that of most leading models) can handle much longer enterprise documents. Other key features include Cohere’s advanced retrieval-augmented generation (RAG) with verifiable citations, agentic tool use, enterprise-grade security, and strong multilingual performance (support for 23 languages).
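For a quick taste, here is a minimal sketch of calling Command A with the huggingface_hub client used later in this post; the token and the prompt are placeholders:

```python
from huggingface_hub import InferenceClient

# Route the request through Hugging Face to Cohere.
client = InferenceClient(provider="cohere", api_key="hf_xxxxxxxx")

completion = client.chat.completions.create(
    model="CohereLabs/c4ai-command-a-03-2025",
    messages=[{"role": "user", "content": "Summarize the key obligations in this contract: ..."}],
    max_tokens=512,
)
print(completion.choices[0].message)
```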
CohereLabs/aya-expanse-32b
Focused on state-of-the-art multilingual support, including lower-resourced languages. It supports Arabic, Chinese (Simplified and Traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese, with a 128k context length.
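As a quick sketch of what that looks like in practice (same client pattern as the examples below; the French prompt is an arbitrary illustration):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(provider="cohere", api_key="hf_xxxxxxxx")

# Ask a question in French; the model responds in the same language.
completion = client.chat.completions.create(
    model="CohereLabs/aya-expanse-32b",
    messages=[{"role": "user", "content": "Explique la photosynthèse en trois phrases."}],
    max_tokens=256,
)
print(completion.choices[0].message)
```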
CohereLabs/c4ai-command-r7b-12-2024
Ideal for low-cost or low-latency use cases, bringing state-of-the-art performance in its class of open-weight models across real-world tasks. It offers a context length of 128k and delivers a powerful combination of multilingual support, retrieval-augmented generation (RAG) with citation verification, reasoning, tool use, and agentic behavior. The model is multilingual, trained on 23 languages.
CohereLabs/aya-vision-32b
A 32-billion-parameter model with advanced capabilities optimized for a variety of vision-language use cases, including OCR, captioning, visual reasoning, summarization, question answering, code, and more. It expands multimodal capabilities to 23 languages spoken by over half the world’s population.
How it works
You can use Cohere models directly on the Hub, either through the website UI or via the client SDKs.
You can find all the examples explained in this section on the Cohere documentation page.
In the website UI
You can search for Cohere models in the Model Hub by filtering by inference provider.
From the model card, you can select an inference provider and execute inference directly in the UI.
From the client SDKs
Let’s walk through using Cohere models from the client SDKs. We’ve also created a Colab notebook with these snippets, in case you want to try them out right away.
From Python, using huggingface_hub
The following example shows how to use Command R7B with Cohere as the inference provider. You can use a Hugging Face token for automatic routing through Hugging Face.
Install huggingface_hub v0.30.0 or later:
```bash
pip install -U "huggingface_hub>=0.30.0"
```
Use the huggingface_hub Python library to call Cohere endpoints, setting the provider parameter.
```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="cohere",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx",
)

messages = [
    {
        "role": "user",
        "content": "How do you make a very spicy mayonnaise?"
    }
]

completion = client.chat.completions.create(
    model="CohereLabs/c4ai-command-r7b-12-2024",
    messages=messages,
    temperature=0.7,
    max_tokens=512,
)

print(completion.choices[0].message)
```
Aya Vision, Cohere Labs’ multilingual and multimodal model, is also supported. You can include base64-encoded images like this:
```python
import base64
from huggingface_hub import InferenceClient

# Encode a local image as a base64 data URL.
image_path = "img.jpg"
with open(image_path, "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")
image_url = f"data:image/jpeg;base64,{base64_image}"

client = InferenceClient(
    provider="cohere",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx",
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {"url": image_url},
            },
        ],
    }
]

completion = client.chat.completions.create(
    model="CohereLabs/aya-vision-32b",
    messages=messages,
    temperature=0.7,
    max_tokens=512,
)

print(completion.choices[0].message)
```
From JS, using @huggingface/inference
```js
import { HfInference } from "@huggingface/inference";

const client = new HfInference("xxxxxxxxxxxxxxxxxxxxxxxx");

const chatCompletion = await client.chatCompletion({
    model: "CohereLabs/c4ai-command-a-03-2025",
    messages: [
        {
            role: "user",
            content: "How do you make a very spicy mayonnaise?"
        }
    ],
    provider: "cohere",
    max_tokens: 512
});

console.log(chatCompletion.choices[0].message);
```
From the OpenAI client
Here’s how to call Command A using Cohere as the inference provider via the OpenAI client library:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/cohere/compatibility/v1",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx",
)

messages = [
    {
        "role": "user",
        "content": "How do you make a very spicy mayonnaise?"
    }
]

completion = client.chat.completions.create(
    model="command-a-03-2025",
    messages=messages,
    temperature=0.7,
)

print(completion.choices[0].message)
```
Using tools with Cohere models
Cohere’s models bring state-of-the-art agentic tool use to Inference Providers, so let’s take a closer look. Both the Hugging Face Hub client and the OpenAI client are compatible with tools via Inference Providers, so the examples above can be extended.
First, we need to define the tools for the model to use. Below, we define get_flight_info, which calls a flight-information API with two locations. This tool definition will be represented in the model’s chat template, which you can also explore in the model card (open source!).
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_flight_info",
            "description": "Get flight information between two cities or airports",
            "parameters": {
                "type": "object",
                "properties": {
                    "loc_origin": {
                        "type": "string",
                        "description": "The departure airport, e.g. MIA",
                    },
                    "loc_destination": {
                        "type": "string",
                        "description": "The destination airport, e.g. NYC",
                    },
                },
                "required": ["loc_origin", "loc_destination"],
            },
        },
    }
]
```
Then, the messages we pass to the model need to reflect any tool use that has happened, so the model can use the tool where relevant. In the following example, the assistant’s tool call is spelled out in tool_calls for clarity:
```python
messages = [
    {"role": "developer", "content": "Today is April 30th"},
    {
        "role": "user",
        "content": "When is the next flight from Miami to Seattle?",
    },
    {
        "role": "assistant",
        "tool_calls": [
            {
                "function": {
                    "arguments": '{"loc_destination": "Seattle", "loc_origin": "Miami"}',
                    "name": "get_flight_info",
                },
                "id": "get_flight_info0",
                "type": "function",
            }
        ],
    },
    {
        "role": "tool",
        "name": "get_flight_info",
        "tool_call_id": "get_flight_info0",
        "content": "From Miami to Seattle, May 1st, 10 AM.",
    },
]
```
Finally, the tools and the messages are passed to the create method.
```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="cohere",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="CohereLabs/c4ai-command-r7b-12-2024",
    messages=messages,
    tools=tools,
    temperature=0.7,
    max_tokens=512,
)

print(completion.choices[0].message)
```
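Since the OpenAI client is also tool-compatible through Inference Providers, the same tools and messages should work there too. Here’s a sketch under that assumption, reusing the OpenAI-compatible base_url from the earlier example; the Cohere-side model name command-r7b-12-2024 is assumed to follow the same naming convention as command-a-03-2025 above:

```python
from openai import OpenAI

# Same OpenAI-compatible endpoint as in the earlier example.
client = OpenAI(
    base_url="https://router.huggingface.co/cohere/compatibility/v1",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx",
)

# `tools` and `messages` are the lists defined above.
# NOTE: "command-r7b-12-2024" is an assumed model identifier.
completion = client.chat.completions.create(
    model="command-r7b-12-2024",
    messages=messages,
    tools=tools,
    temperature=0.7,
)

print(completion.choices[0].message)
```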
Billing
For direct requests, i.e. when you use a Cohere API key, you are billed directly on your Cohere account.
For routed requests, i.e. when you authenticate through the Hub, you only pay the standard Cohere API rates. There is no additional markup; we pass the provider’s costs through directly. (In the future, we may establish revenue-sharing agreements with our provider partners.)
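In practice, the difference is just which key you hand to the client. A minimal sketch (both keys are placeholders):

```python
from huggingface_hub import InferenceClient

# Routed request: authenticate with a Hugging Face token and let the
# Hub route the call to Cohere; you pay standard Cohere API rates.
routed_client = InferenceClient(provider="cohere", api_key="hf_xxxxxxxx")

# Direct request: pass your own Cohere API key instead; usage is
# billed directly to your Cohere account.
direct_client = InferenceClient(provider="cohere", api_key="xxxxxxxx")
```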
Important note: PRO users get $2 worth of inference credits each month. You can use them across providers. 🔥
Subscribe to the Hugging Face PRO plan to get access to inference credits, ZeroGPU, Spaces Dev Mode, 20x higher limits, and more.