
How to deploy and fine-tune DeepSeek models with AWS

January 30, 2025 (updated February 13, 2025) · 7 min read

A running document on how to deploy and fine-tune DeepSeek R1 models with Hugging Face on AWS.

What is DeepSeek-R1?

If you have ever struggled with a tough math problem, you know how useful it is to think a little longer and work through it carefully. OpenAI's o1 model showed that when LLMs are trained to do the same, using more compute during inference, they get significantly better at solving reasoning tasks such as mathematics, coding, and logic.

However, the recipe behind OpenAI's reasoning models has been kept secret. That is, until last week, when DeepSeek released the DeepSeek-R1 model and promptly broke the internet (and the stock market).

DeepSeek AI open-sourced DeepSeek-R1, along with six dense models distilled from DeepSeek-R1 based on the Llama and Qwen architectures. You can find all of them in the DeepSeek R1 collection.

In collaboration with Amazon Web Services, we are making the latest Hugging Face models easier to deploy on AWS services so developers can build better generative AI applications.

Let's review how you can deploy and fine-tune DeepSeek R1 models with Hugging Face on AWS.

Deploy DeepSeek R1 models

Deploy on AWS with Hugging Face Inference Endpoints

Hugging Face Inference Endpoints offer an easy and secure way to deploy machine learning models on dedicated compute for use in production on AWS. Inference Endpoints empower developers and data scientists alike to create AI applications without managing infrastructure, simplifying the deployment process to a few clicks.

With Inference Endpoints, you can deploy any of the six models distilled from DeepSeek-R1, as well as a quantized version of DeepSeek R1 made by Unsloth: https://huggingface.co/unsloth/DeepSeek-R1-GGUF. On the model page, click on Deploy, then on HF Inference Endpoints. You will be redirected to the Inference Endpoint page, where we have selected an optimized inference container and the recommended hardware to run the model. Once you have created your endpoint, you can send queries to DeepSeek R1 for $8.3 per hour on AWS.
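
Once the endpoint is running, you can query it over plain HTTP. As a minimal sketch, the request body below follows TGI's standard /generate schema; the endpoint URL is a placeholder and `build_generate_payload` is just an illustrative helper, not part of any SDK:

```python
import json

# Placeholder: copy the real URL from the Inference Endpoints UI once the
# endpoint reaches the "Running" state.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"

def build_generate_payload(prompt, max_new_tokens=256, temperature=0.6):
    """Build a request body in the TGI /generate schema served by the endpoint."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }

payload = json.dumps(build_generate_payload("Why is the sky dark at night?"))
# To query a live endpoint, POST `payload` to f"{ENDPOINT_URL}/generate" with
# headers {"Content-Type": "application/json",
#          "Authorization": "Bearer <your HF token>"}.
```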

You can find DeepSeek R1 and its distilled models, along with other popular open LLMs, ready to deploy on optimized configurations in the Inference Endpoints model catalog.

| Note: The team is working on enabling deployment of the DeepSeek model on the recommended instances. Stay tuned!

Deploy on Amazon SageMaker AI with Hugging Face LLM DLCs

DeepSeek R1 on GPUs

| Note: The team is working on enabling DeepSeek-R1 deployment on GPUs with Hugging Face LLM DLCs. Stay tuned!

Distilled models on GPUs

Let's walk through the deployment of DeepSeek-R1-Distill-Llama-70B.

Code snippets are available on the model page under the Deploy button!


First, let's go through a few prerequisites: make sure you have a SageMaker Domain configured, sufficient quota in SageMaker, and a JupyterLab space. For DeepSeek-R1-Distill-Llama-70B, you should raise the default quota for ml.g6.48xlarge for endpoint usage.

For reference, here are the recommended hardware configurations for each distilled variant:

Model                                       Instance Type    # of GPUs
deepseek-ai/DeepSeek-R1-Distill-Llama-70B   ml.g6.48xlarge   8
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B    ml.g6.12xlarge   4
deepseek-ai/DeepSeek-R1-Distill-Qwen-14B    ml.g6.12xlarge   4
deepseek-ai/DeepSeek-R1-Distill-Llama-8B    ml.g6.2xlarge    1
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B     ml.g6.2xlarge    1
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B   ml.g6.2xlarge    1
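
If you are scripting deployments or quota checks, the recommendations above can be captured as a small lookup. This is just a convenience sketch; the model IDs and instance types are copied from the table, while the helper itself is illustrative:

```python
# Recommended SageMaker instance per distilled model (from the table above).
RECOMMENDED_INSTANCE = {
    "deepseek-ai/DeepSeek-R1-Distill-Llama-70B": ("ml.g6.48xlarge", 8),
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B": ("ml.g6.12xlarge", 4),
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B": ("ml.g6.12xlarge", 4),
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B": ("ml.g6.2xlarge", 1),
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B": ("ml.g6.2xlarge", 1),
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B": ("ml.g6.2xlarge", 1),
}

def recommended_instance(model_id):
    """Return (instance_type, num_gpus) recommended for a distilled model."""
    return RECOMMENDED_INSTANCE[model_id]
```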

Once in your notebook, make sure to install the latest version of the SageMaker SDK.

!pip install sagemaker --upgrade

Next, let's determine the current region and the execution role.

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

Create a SageMaker Model object with the Python SDK.

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
model_name = model_id.split("/")[-1].lower()

hub = {
    "HF_MODEL_ID": model_id,
    "SM_NUM_GPUS": json.dumps(8),
}

huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="3.0.1"),
    env=hub,
    role=role,
)

Deploy the model to a SageMaker endpoint and test the endpoint.

endpoint_name = f"{model_name}-ep"

predictor = huggingface_model.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1,
    instance_type="ml.g6.48xlarge",
    container_startup_health_check_timeout=2400,
)

predictor.predict({"inputs": "What is the meaning of life?"})

That's it, you have deployed a Llama 70B reasoning model!

Because the TGI v3 container is used under the hood, the most performant parameters for the given hardware are selected automatically.

When you are done testing, delete the endpoint.

predictor.delete_model()
predictor.delete_endpoint()

Distilled models on Neuron

Let's walk through the deployment of DeepSeek-R1-Distill-Llama-70B on Neuron instances, such as AWS Trainium 2 and AWS Inferentia 2.

Code snippets are available on the model page under the Deploy button!


The prerequisites for deploying to Neuron instances are the same: make sure you have a SageMaker Domain configured, sufficient quota in SageMaker, and a JupyterLab space. For DeepSeek-R1-Distill-Llama-70B, you should raise the default quota for ml.inf2.48xlarge for endpoint usage.

Next, let's determine the current region and the execution role.

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

Create a SageMaker Model object with the Python SDK.

image_uri = get_huggingface_llm_image_uri("huggingface-neuronx", version="0.0.25")
model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
model_name = model_id.split("/")[-1].lower()

hub = {
    "HF_MODEL_ID": model_id,
    "HF_NUM_CORES": "24",
    "HF_AUTO_CAST_TYPE": "bf16",
    "MAX_BATCH_SIZE": "4",
    "MAX_INPUT_TOKENS": "3686",
    "MAX_TOTAL_TOKENS": "4096",
}

huggingface_model = HuggingFaceModel(
    image_uri=image_uri,
    env=hub,
    role=role,
)
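
A note on the token limits in this configuration: following TGI's convention, MAX_TOTAL_TOKENS caps prompt plus generated tokens per sequence, so the gap between the two limits is what remains for generation when a prompt uses the full input allowance. A quick sanity-check sketch, with the values repeated from the configuration above:

```python
# Sanity-check the serving limits: the input cap must leave room for output.
max_input_tokens = 3686
max_total_tokens = 4096

assert max_input_tokens < max_total_tokens
# Tokens left for generation when the prompt is at its maximum length.
generation_budget = max_total_tokens - max_input_tokens
print(generation_budget)  # 410
```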

Deploy the model to a SageMaker endpoint and test the endpoint.

endpoint_name = f"{model_name}-ep"

predictor = huggingface_model.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1,
    instance_type="ml.inf2.48xlarge",
    container_startup_health_check_timeout=3600,
    volume_size=512,
)

predictor.predict(
    {
        "inputs": "What is the capital of France?",
        "parameters": {
            "do_sample": True,
            "max_new_tokens": 128,
            "temperature": 0.7,
            "top_k": 50,
            "top_p": 0.95,
        },
    }
)

That's it, you have deployed a Llama 70B reasoning model on a Neuron instance! Under the hood, it downloads a pre-compiled model from Hugging Face to speed up the endpoint start time.

When you are done testing, delete the endpoint.

predictor.delete_model()
predictor.delete_endpoint()

Deploy on EC2 Neuron with the Hugging Face Neuron Deep Learning AMI

This guide details how to export, deploy, and run DeepSeek-R1-Distill-Llama-70B on an inf2.48xlarge AWS EC2 instance.

First, let's go through a few prerequisites: make sure you are subscribed to the Hugging Face Neuron Deep Learning AMI on the AWS Marketplace. It provides all the dependencies you need to train and deploy Hugging Face models on Trainium and Inferentia. Then, launch an inf2.48xlarge instance with the AMI and connect to it through SSH. If you have never done this before, you can check out our step-by-step guide.

Once connected to the instance, you can deploy the model on an endpoint with the following command.

docker run -p 8080:80 \
  -v $(pwd)/data:/data \
  --device=/dev/neuron0 \
  --device=/dev/neuron1 \
  --device=/dev/neuron2 \
  --device=/dev/neuron3 \
  --device=/dev/neuron4 \
  --device=/dev/neuron5 \
  --device=/dev/neuron6 \
  --device=/dev/neuron7 \
  --device=/dev/neuron8 \
  --device=/dev/neuron9 \
  --device=/dev/neuron10 \
  --device=/dev/neuron11 \
  -e HF_BATCH_SIZE=4 \
  -e HF_SEQUENCE_LENGTH=4096 \
  -e HF_AUTO_CAST_TYPE="bf16" \
  -e HF_NUM_CORES=24 \
  ghcr.io/huggingface/neuronx-tgi:latest \
  --model-id deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --max-batch-size 4 \
  --max-total-tokens 4096
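
The long list of --device flags can be generated rather than typed out by hand. As a small sketch (assuming an inf2.48xlarge, which exposes 12 Neuron devices, /dev/neuron0 through /dev/neuron11, each with two NeuronCores, matching HF_NUM_CORES=24):

```python
# Generate the --device flags for the docker command above.
NUM_NEURON_DEVICES = 12  # inf2.48xlarge: 12 devices x 2 NeuronCores = 24 cores

device_flags = [f"--device=/dev/neuron{i}" for i in range(NUM_NEURON_DEVICES)]
print(" \\\n  ".join(device_flags))
```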

It will take a few minutes to download the compiled model from the Hugging Face cache and launch a TGI endpoint.

Then, you can test the endpoint.

curl localhost:8080/generate \
  -X POST \
  -d '{"inputs":"Why is the sky dark at night?"}' \
  -H 'Content-Type: application/json'
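
The endpoint answers with a JSON object containing a generated_text field, per TGI's /generate schema. A minimal parsing sketch; the sample body below is illustrative, a real one comes from the curl call above:

```python
import json

# Illustrative sample of a TGI /generate response body.
sample_body = '{"generated_text": "Because the universe is expanding..."}'

def extract_generated_text(body):
    """Pull the generated_text field out of a TGI /generate JSON response."""
    return json.loads(body)["generated_text"]

print(extract_generated_text(sample_body))
```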

When you are done testing, pause the EC2 instance.

| Note: The team is working on enabling DeepSeek R1 deployment on Trainium and Inferentia with the Hugging Face Neuron Deep Learning AMI. Stay tuned!

Fine-tune DeepSeek R1 models

Fine-tune on Amazon SageMaker AI with Hugging Face Training DLCs

| Note: The team is working on enabling fine-tuning of all DeepSeek models with Hugging Face Training DLCs. Stay tuned!

Fine-tune on EC2 Neuron with the Hugging Face Neuron Deep Learning AMI

| Note: The team is working on enabling fine-tuning of all DeepSeek models with the Hugging Face Neuron Deep Learning AMI. Stay tuned!
