Versa AI hub

Serverless Inference with Hugging Face and NVIDIA NIM

By versatileai | March 29, 2025 | 5 Mins Read

Today, we are excited to announce the launch of the Hugging Face NVIDIA NIM API (serverless), a new Hugging Face Hub service available to Enterprise Hub organizations. This service makes it easy to run inference on open models using NVIDIA DGX Cloud's accelerated compute platform. We built this solution so that Enterprise Hub users can access the latest NVIDIA AI technology in a serverless way to run inference on popular generative AI models such as Llama and Mistral, using standardized APIs and just a few lines of code, right from the Hugging Face Hub.

Serverless inference with NVIDIA NIM

This new experience builds on our collaboration with NVIDIA to simplify access to and use of open generative AI models on NVIDIA accelerated computing. One of the main challenges developers and organizations face is the upfront cost of infrastructure and the complexity of optimizing LLM inference workloads. The Hugging Face NVIDIA NIM API (serverless) offers a simple solution to these challenges, providing instant access to state-of-the-art open generative AI models optimized for NVIDIA infrastructure. The pay-as-you-go pricing model is an economical option for businesses of all sizes, as you pay only for the request time you use.

The NVIDIA NIM API (serverless) complements Train on DGX Cloud, an AI training service already available on Hugging Face.

How it works

Performing serverless inference with a Hugging Face model has never been easier. Here is a step-by-step guide to get you started:

Note: You need access to an organization with a Hugging Face Enterprise Hub subscription to run inference.

Before you begin, make sure you meet the following requirements:

  • You are a member of an Enterprise Hub organization.
  • You have created a fine-grained token for your organization. Follow the steps below to create one:

Create a fine-grained token

Fine-grained tokens let you create tokens with specific permissions for precise access control over resources and namespaces. First, go to your Hugging Face access token settings, click "Create new token", and select "Fine-grained".

Enter a token name, select your Enterprise organization as the scope under "Org permissions", and click "Create token". There is no need to select any additional scopes.

Next, save this token value; you will use it to authenticate requests later.
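To avoid hardcoding the token in source files, a common practice is to keep it in an environment variable and build request headers from it. A minimal sketch (the variable name HF_TOKEN and the placeholder value are our own conventions, not requirements of the service):

```python
import os

# Use a placeholder so this sketch runs standalone; in practice, export
# your real fine-grained token before starting the application.
os.environ.setdefault("HF_TOKEN", "hf_example_token")

token = os.environ["HF_TOKEN"]

# Hugging Face APIs accept the token as a standard Bearer header.
headers = {"Authorization": f"Bearer {token}"}
```

The same token value is what you pass as the API key when running inference below.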

Find your NIM

You can find "NVIDIA NIM API (serverless)" on the model page of supported generative AI models. All supported models are listed in the NVIDIA NIM Collection and in the pricing section below.

We will use meta-llama/Meta-Llama-3-8B-Instruct. Open the meta-llama/Meta-Llama-3-8B-Instruct model card, open the "Deploy" menu, and select "NVIDIA NIM API (serverless)".

Submit a request

The NVIDIA NIM API (serverless) is standardized on the OpenAI API, so you can use the OpenAI SDK for inference. Replace YOUR_FINE_GRAINED_TOKEN_HERE with your fine-grained token and you are ready to run inference.

from openai import OpenAI

client = OpenAI(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="YOUR_FINE_GRAINED_TOKEN_HERE"
)

chat_completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 500"}
    ],
    stream=True,
    max_tokens=1024
)

for message in chat_completion:
    print(message.choices[0].delta.content, end="")
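Because the endpoint follows the OpenAI API, the SDK call above simply serializes and POSTs a standard chat-completion JSON body. The sketch below rebuilds that payload by hand for illustration (the /chat/completions path under the base URL is the usual OpenAI convention, assumed here):

```python
import json

# The OpenAI-style request body that chat.completions.create() sends to
# <base_url>/chat/completions.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 500"},
    ],
    "stream": True,      # stream tokens back incrementally
    "max_tokens": 1024,  # upper bound on generated tokens
}

# The body must round-trip cleanly through JSON.
body = json.dumps(payload)
```

Any OpenAI-compatible client that can send this body will work against the endpoint.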

Congratulations! You can now start building generative AI applications with open models. 🔥

The NVIDIA NIM API (serverless) currently only supports the chat.completions.create and models.list APIs. We are working on extending this while adding more models. You can use models.list to see which models are currently available for inference.

models = client.models.list()
for m in models.data:
    print(m.id)

Supported models and pricing

Usage of the Hugging Face NVIDIA NIM API (serverless) is billed based on the compute time used per request. We use only NVIDIA H100 Tensor Core GPUs, which cost $8.25 per hour. To make per-request pricing easier to understand, we can convert this to a per-second rate.

$8.25 per hour ≈ $0.0023 per second (rounded to four decimal places)

The total cost of a request depends on the model size, the number of GPUs required, and the time it takes to process the request. Here is a breakdown of the current model offering, GPU requirements, typical response times, and estimated cost per request:

Model ID | NVIDIA H100 GPUs | Typical response time (500 input tokens, 100 output tokens) | Estimated cost per request
meta-llama/Meta-Llama-3-8B-Instruct | 1 | 1 second | $0.0023
meta-llama/Meta-Llama-3-70B-Instruct | 4 | 2 seconds | $0.0184
meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 | 8 | 5 seconds | $0.0917
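The per-request estimates above follow from the hourly rate: cost ≈ GPUs × seconds × ($8.25 / 3600). A small sketch of that arithmetic (the helper name estimate_cost is ours):

```python
H100_HOURLY_RATE_USD = 8.25  # per GPU, as stated above

def estimate_cost(num_gpus: int, seconds: float) -> float:
    """Estimated request cost: GPUs x processing time x per-second rate."""
    per_gpu_second = H100_HOURLY_RATE_USD / 3600  # ~$0.0023
    return round(num_gpus * seconds * per_gpu_second, 4)

print(estimate_cost(1, 1))  # Meta-Llama-3-8B-Instruct: 0.0023
print(estimate_cost(8, 5))  # Meta-Llama-3.1-405B-Instruct-FP8: 0.0917
```

Note that published figures may round intermediate values slightly differently, so estimates can differ in the last decimal place.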

Charges accrue to your Enterprise Hub organization's current monthly billing cycle. You can view current and past usage at any time in your Enterprise Hub organization's billing settings.

Accelerating AI inference with NVIDIA TensorRT-LLM

We are pleased to continue our collaboration with NVIDIA to push the boundaries of AI inference performance and accessibility. A key focus of our ongoing effort is the integration of the NVIDIA TensorRT-LLM library into Hugging Face's Text Generation Inference (TGI) framework.

We will share more details, benchmarks, and best practices for using TGI with NVIDIA TensorRT-LLM in the near future. Stay tuned for more exciting developments as we continue to expand our collaboration with NVIDIA and bring powerful AI capabilities to developers and organizations around the world!

© 2025 Versa AI Hub. All Rights Reserved.
