Update: This service has been deprecated and is no longer available as of April 10, 2025.
Today, we are excited to announce the launch of Train on DGX Cloud, a new service on the Hugging Face Hub available to Enterprise Hub organizations. Train on DGX Cloud makes it easy to use open models with the accelerated compute infrastructure of NVIDIA DGX Cloud. Together, we built Train on DGX Cloud so that Enterprise Hub users can easily access the latest NVIDIA H100 Tensor Core GPUs and fine-tune popular generative AI models such as Llama, Mistral, and Stable Diffusion in just a few clicks within the Hugging Face Hub.
GPU Poor No More
This new experience expands upon the strategic partnership we announced last year to simplify the training and deployment of open generative AI models on NVIDIA accelerated computing. One of the main challenges developers and organizations face is the scarcity of available GPUs and the time-consuming work of writing, testing, and debugging training scripts for AI models. Train on DGX Cloud offers a simple solution to these challenges, providing instant access to NVIDIA GPUs, starting with the H100 on NVIDIA DGX Cloud. In addition, Train on DGX Cloud offers a simple, no-code training job creation experience powered by Hugging Face AutoTrain and Hugging Face Spaces.
Enterprise Hub organizations can give their teams instant access to powerful NVIDIA GPUs, paying only per minute for the compute instances used during their training jobs.
“Train on DGX Cloud is the easiest, fastest, and most accessible way to train generative AI models, combining powerful GPUs, pay-as-you-go pricing, and instant access to no-code training. It’s a game-changer for data scientists everywhere!”
“We’re committed to providing a range of services to our customers,” said Alexis Bjorlin, Vice President of DGX Cloud at NVIDIA. “By integrating NVIDIA AI supercomputing in the cloud with Hugging Face’s user-friendly interfaces, organizations can accelerate their AI innovation.”
How it works
Training Hugging Face models on NVIDIA DGX Cloud has never been easier. Below is a step-by-step tutorial to fine-tune Mistral 7B.
Note: To use Train on DGX Cloud, you need access to an organization with a Hugging Face Enterprise subscription.
You can find Train on DGX Cloud on the model page of supported generative AI models. The following model architectures are currently supported: Llama, Falcon, Mistral, Mixtral, T5, Gemma, Stable Diffusion, and Stable Diffusion XL.

Open the Train menu and select NVIDIA DGX Cloud. This opens an interface where you can select your Enterprise organization.

Next, click Create New Space. When you use Train on DGX Cloud for the first time, the service creates a new Hugging Face Space within your organization, allowing you to create training jobs that run on NVIDIA DGX Cloud using AutoTrain. When you create additional training jobs later, you will automatically be redirected to the existing AutoTrain Space.
Once in the AutoTrain Space, you can create a training job by configuring the hardware, base model, task, and training parameters.

For hardware, you can choose between NVIDIA H100 GPUs, available in 1x, 2x, 4x, and 8x instances, or L40S GPUs (coming soon). The training dataset must be uploaded directly in the “Upload Training File(s)” area; CSV and JSON files are currently supported. Make sure the column mapping is correct, following the example below. For the training parameters, you can directly edit the JSON configuration on the right, e.g., changing the number of epochs from 3 to 2.
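As a rough illustration, the editable JSON configuration could look like the sketch below. The parameter names are illustrative assumptions, not the exact AutoTrain schema; here the number of epochs has been lowered from 3 to 2, as in the example above:

```json
{
  "epochs": 2,
  "lr": 0.00003,
  "batch_size": 2,
  "block_size": 1024,
  "mixed_precision": "fp16",
  "peft": true,
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05
}
```

Any parameter you leave out typically falls back to AutoTrain’s defaults, so you only need to edit the values you want to change.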
Once everything is set up, start your training by clicking “Start Training”. AutoTrain will then validate your dataset and ask you to confirm the training.

You can monitor your training by opening the “Logs” of the Space.

After your training is complete, your fine-tuned model will be uploaded to a new private repository within your selected namespace on the Hugging Face Hub.
Train on DGX Cloud is available to all Enterprise Hub organizations today! Give the service a try and let us know your feedback!
Train on DGX Cloud pricing
Usage of Train on DGX Cloud is billed by the minute of the GPU instances used during your training jobs. Current prices for training jobs are $8.25 per GPU hour for H100 instances and $2.75 per GPU hour for L40S instances. Job fees accrue to your Enterprise Hub organization’s current monthly billing cycle once a job is completed. You can check your current and past usage at any time within the billing settings of your Enterprise Hub organization.
For example, fine-tuning Mistral 7B on 1,500 samples on one NVIDIA L40S takes ~10 minutes and costs ~$0.45.
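Since billing is per minute of GPU usage, you can estimate a job's cost with a quick back-of-the-envelope calculation. The sketch below uses the published per-GPU-hour rates above; the helper function and job sizes are illustrative, not part of the service:

```python
# Estimate the cost of a Train on DGX Cloud job, billed per minute of GPU usage.
# Rates are the published per-GPU-hour prices; job sizes below are examples.

RATES_PER_GPU_HOUR = {
    "H100": 8.25,
    "L40S": 2.75,
}

def job_cost(gpu_type: str, num_gpus: int, minutes: float) -> float:
    """Return the estimated cost in USD for a training job."""
    per_minute = RATES_PER_GPU_HOUR[gpu_type] / 60
    return round(per_minute * num_gpus * minutes, 2)

# The Mistral 7B example from the text: one L40S for ~10 minutes.
print(job_cost("L40S", 1, 10))   # -> 0.46 (≈ the $0.45 figure quoted above)

# A larger job: 4x H100 for 30 minutes.
print(job_cost("H100", 4, 30))   # -> 16.5
```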
We’re just getting started
We are excited to work with NVIDIA to democratize accelerated machine learning across open science, open source and cloud services.
Our collaboration in open science through BigCode enabled the training of StarCoder2 15B.
Our collaboration in open source fuels the new Optimum-NVIDIA library, accelerating LLM inference on the latest NVIDIA GPUs and already achieving 1,200 tokens per second on Llama 2.
Our collaboration in cloud services created Train on DGX Cloud today. We are also working with NVIDIA to optimize inference and make accelerated computing more accessible to the Hugging Face community, leveraging our collaborations on NVIDIA TensorRT-LLM and Optimum-NVIDIA. Additionally, some of the most popular open models on Hugging Face will be available in the NVIDIA NIM microservices announced today at GTC.
If you’re attending GTC this week, make sure to watch session S63149 on Wednesday, March 20 at 3 PM PT, where Jeff will walk you through Train on DGX Cloud. Also don’t miss the next Hugging Cast, where we will give a live demo of Train on DGX Cloud and you can ask Abhishek and Rafael your questions directly, on 3/21 at 9 AM PT / 12 PM ET / 17h CET.