Open-R1: a fully open reproduction of DeepSeek-R1

January 29, 2025 · Updated: February 13, 2025 · 5 min read
If you have ever struggled through a tough math problem, you know how useful it is to think a little longer and work through it carefully. OpenAI's o1 model showed that when LLMs are trained to do the same, by using more compute during inference, they get significantly better at solving reasoning tasks such as mathematics, coding, and logic.

However, the recipe behind OpenAI's reasoning models has been kept secret. That is, until last week, when DeepSeek released their DeepSeek-R1 model and promptly broke the internet (and the stock market!).

Besides performing as well as o1, the DeepSeek-R1 release was accompanied by a detailed technical report that outlines the key steps of its training recipe. The recipe involves several innovations, most notably applying pure reinforcement learning to teach a base language model how to reason without any human supervision. As the figure below shows, creating a powerful reasoning model is now very simple if you have access to a capable base model and a high-quality data mixture.

Deepseek-R1 Training Pipeline

However, the DeepSeek-R1 release leaves open several questions:

  • Data collection: How were the reasoning-specific datasets curated?
  • Model training: DeepSeek released no training code, so it is unknown which hyperparameters work best and how they differ across model families and scales.
  • Scaling laws: What are the compute and data trade-offs in training reasoning models?

The Open-R1 project is an initiative to systematically reconstruct DeepSeek-R1's data and training pipeline, validate its claims, and push the boundaries of open reasoning models. By building Open-R1, we aim to provide transparency on how reinforcement learning can enhance reasoning, share reproducible insights with the open-source community, and create a foundation for future models to leverage these techniques.

In this blog post, we take a look at the key ingredients behind DeepSeek-R1, which parts we plan to replicate, and how to contribute to the Open-R1 project.

Let's jump in 🚀!

How did they do it?

DeepSeek-R1 is a reasoning model built on the foundation of DeepSeek-V3. Like any good reasoning model, it starts with a strong base model, and DeepSeek-V3 is exactly that. This 671B Mixture of Experts (MoE) model performs on par with heavyweights like Sonnet 3.5 and GPT-4o. It is especially impressive for its training efficiency, thanks to multi-token prediction (MTP), multi-head latent attention (MLA), and a great deal of hardware optimization.
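To make the MoE idea concrete, here is a minimal top-k expert-routing sketch in PyTorch. The sizes, expert count, and routing scheme are toy illustrations only; DeepSeek-V3's actual architecture is vastly larger and layers MLA and MTP on top.

```python
# A toy sketch of top-k Mixture-of-Experts routing: a router scores each
# expert per token, and each token is processed only by its top-k experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # per-token expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.top_k, dim=-1)  # route to k experts each
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens whose k-th choice is e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(10, 64)
print(ToyMoE()(x).shape)  # torch.Size([10, 64])
```

The point of the design is that only a small fraction of the 671B parameters is active per token, which is what makes a model of this size practical to train and serve.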

DeepSeek also introduced two models, DeepSeek-R1-Zero and DeepSeek-R1, each with a distinct training approach. DeepSeek-R1-Zero skipped supervised fine-tuning altogether and relied entirely on reinforcement learning (RL), using Group Relative Policy Optimization (GRPO) to make the process more efficient. A simple reward system guided the model, providing feedback based on the accuracy and structure of its answers. This approach helped the model develop useful reasoning skills, such as breaking problems into steps and verifying its own outputs. However, its responses often lacked clarity and were difficult to read.
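To illustrate, here is a minimal sketch of what such rule-based rewards and GRPO's group-relative baseline could look like. The `<think>`/`<answer>` tag format and the exact-match check are illustrative assumptions, not DeepSeek's exact implementation.

```python
# Illustrative rule-based rewards in the spirit of R1-Zero's training:
# one score for output structure, one for answer accuracy, plus GRPO's
# group-relative advantage. Tag format and scoring are assumptions.
import re
import statistics

def format_reward(completion: str) -> float:
    """1.0 if the completion wraps reasoning and answer in the expected tags."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion.strip(), flags=re.DOTALL) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the extracted final answer matches the reference exactly."""
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    answer = match.group(1).strip() if match else ""
    return 1.0 if answer == ground_truth.strip() else 0.0

def grpo_advantages(rewards: list[float]) -> list[float]:
    """GRPO's group-relative baseline: rewards for a group of completions
    sampled from the same prompt are normalized against the group mean,
    so no separate value network is needed."""
    mean, std = statistics.mean(rewards), statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-6) for r in rewards]

completion = "<think>2 + 2 = 4, so the answer is 4.</think><answer>4</answer>"
print(format_reward(completion))              # 1.0
print(accuracy_reward(completion, "4"))       # 1.0
print(grpo_advantages([2.0, 1.0, 0.0, 1.0]))  # [~1.41, 0.0, ~-1.41, 0.0]
```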

That is where DeepSeek-R1 comes in. It started with a "cold start" phase, fine-tuning on a small set of carefully crafted examples to improve clarity and readability. From there, further RL and refinement steps, including rejecting low-quality outputs with both human-preference-based and verifiable rewards, produced a model that gives polished, consistent answers.
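As a rough sketch of the rejection-sampling idea mentioned above: sample several candidates per prompt and keep only those that clear both reward checks. Here `generate`, `verify`, and `preference_score` are hypothetical stand-ins for a real policy model, a rule-based checker (like the accuracy reward sketched earlier), and a learned reward model.

```python
# A hypothetical rejection-sampling filter: of n sampled candidates,
# keep only those that pass a verifiable check AND score well on a
# preference reward; the survivors become fine-tuning data.
from typing import Callable

def rejection_sample(
    prompt: str,
    generate: Callable[[str], str],            # policy model (stand-in)
    verify: Callable[[str], bool],             # rule-based checker (stand-in)
    preference_score: Callable[[str, str], float],  # reward model (stand-in)
    n_candidates: int = 8,
    min_preference: float = 0.5,
) -> list[str]:
    kept = []
    for _ in range(n_candidates):
        completion = generate(prompt)          # sample one candidate
        if verify(completion) and preference_score(prompt, completion) >= min_preference:
            kept.append(completion)            # verified and preferred
    return kept  # high-quality completions to fine-tune on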

Deepseek-V3 architecture

This all sounds great, but what is actually missing? Let's look at the missing pieces of the puzzle.

Open-R1: the missing pieces

The release of DeepSeek-R1 is an amazing boon for the community, but not everything was released: although the model weights are open, the datasets and code used to train the model are not 😢.

The goal of Open-R1 is to build these last missing pieces so that the whole research and industry community can build similar or better models using these recipes and datasets. And by doing this in the open, everybody in the community can contribute!

As shown in the figure below, here is the plan of attack:

  • Step 1: Replicate the R1-Distill models by distilling a high-quality reasoning dataset from DeepSeek-R1 (see the sketch below the figure).
  • Step 2: Replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.
  • Step 3: Show that we can go from a base model to an RL-tuned reasoning model via multi-stage training (base model → SFT → RL).

The Open-R1 steps
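As a rough illustration of Step 1, the snippet below collects reasoning traces from a hosted DeepSeek-R1. The endpoint URL is a hypothetical placeholder assuming an OpenAI-compatible server; the sampling temperature of 0.6 follows DeepSeek's published usage recommendation.

```python
# A hypothetical sketch of Step 1's data distillation: query a hosted
# DeepSeek-R1 behind an OpenAI-compatible API for reasoning traces.
# The base_url is a placeholder; serving details will vary.
from openai import OpenAI

client = OpenAI(base_url="https://your-r1-endpoint/v1", api_key="...")

def distill_trace(problem: str) -> str:
    """Request one full reasoning trace for a single problem."""
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1",
        messages=[{"role": "user", "content": problem}],
        temperature=0.6,    # within DeepSeek's recommended sampling range
        max_tokens=8192,    # leave room for long chains of thought
    )
    return response.choices[0].message.content

trace = distill_trace("What is the sum of the first 100 positive integers?")
```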
The synthetic datasets will allow everyone to fine-tune existing or new LLMs into reasoning models simply by training on them. The training recipes involving RL will serve as a starting point for anybody to build similar models from scratch, and will allow researchers to build even more advanced methods on top.
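For instance, once a distilled reasoning dataset exists, turning an open model into a reasoning model could be as simple as a supervised fine-tuning run. The sketch below uses TRL's SFTTrainer; the dataset name and base model are placeholder assumptions, not official Open-R1 artifacts.

```python
# A minimal sketch of distillation-style supervised fine-tuning with TRL.
# "your-org/r1-reasoning-traces" is a hypothetical dataset of chat-formatted
# (prompt, reasoning trace) pairs distilled from DeepSeek-R1; the base
# model is likewise just an example choice.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("your-org/r1-reasoning-traces", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",      # example small base model
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="r1-distill-sft",
        max_seq_length=4096,                 # reasoning traces run long
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
    ),
)
trainer.train()
```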

Note that we don't want to stop at math datasets. There is a lot of potential to explore other areas, not only obvious ones like code, but also scientific fields such as medicine, where reasoning models could have a significant impact.

This initiative is not just about reproducing results; it is about sharing insights with the community. By documenting what works, what doesn't, and why, we hope to save others from wasting time and compute on unproductive paths.

If this sounds interesting, we would love your help! Whether it's contributing code or joining the discussions on Hugging Face, there are many ways to get involved. Let's build this together!
