Chinese AI company says breakthrough allows creation of cutting-edge AI models using 11x less computing power: DeepSeek optimization could highlight limitations of US sanctions

December 27, 2024

Chinese AI startup DeepSeek says it has trained AI models comparable to leading models from companies such as OpenAI, Meta, and Anthropic, but with an 11x reduction in the amount, and therefore the cost, of GPU compute. While this claim has not yet been fully verified, the announcement suggests that although US sanctions have affected the availability of AI hardware in China, clever scientists are working to extract the most performance from the limited hardware they can get, blunting the impact of the restrictions on China's supply of AI chips. The company has open-sourced its models and weights, so independent tests should appear soon.

According to the paper, DeepSeek trained its DeepSeek-V3 Mixture-of-Experts (MoE) language model, with 671 billion parameters, in just two months using a cluster of 2,048 Nvidia H800 GPUs, which works out to roughly 2.8 million GPU hours. For comparison, Meta needed 11 times more compute (30.8 million GPU hours) to train Llama 3, with 405 billion parameters, on a cluster of 16,384 H100 GPUs over 54 days.
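
As a quick sanity check, the reported numbers can be combined as follows; this is only back-of-the-envelope arithmetic using the figures quoted above, not additional data from either company.

```python
# Back-of-the-envelope check combining only the figures quoted above.
deepseek_gpus = 2_048          # Nvidia H800s
deepseek_gpu_hours = 2.8e6     # reported total GPU hours for DeepSeek-V3
llama3_gpu_hours = 30.8e6      # reported total GPU hours for Llama 3 405B

# Wall-clock training time implied by the DeepSeek figures:
deepseek_days = deepseek_gpu_hours / deepseek_gpus / 24
print(f"DeepSeek-V3: ~{deepseek_days:.0f} days on {deepseek_gpus:,} GPUs")  # ~57 days, about two months

# Ratio of total GPU hours between the two training runs:
print(f"compute ratio: ~{llama3_gpu_hours / deepseek_gpu_hours:.0f}x")      # ~11x
```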

DeepSeek claims that advanced pipeline algorithms, an optimized communication framework, and FP8 low-precision computation and communication significantly reduce the compute and memory demands typically required for a model of this scale.

The company used a cluster of 2,048 Nvidia H800 GPUs, with NVLink interconnects for GPU-to-GPU communication within a node and InfiniBand interconnects for node-to-node communication. In such a setup, GPU-to-GPU communication is fairly fast, but node-to-node communication is not, so optimization is the key to performance and efficiency. DeepSeek implemented dozens of optimization techniques to reduce the computing requirements of DeepSeek-V3, with several key technologies enabling its impressive results.

DeepSeek used the DualPipe algorithm to overlap computation and communication phases within and across forward and backward micro-batches, reducing pipeline inefficiencies. In particular, dispatch (routing tokens to experts) and combine (aggregating results) operations were handled in parallel with computation using customized PTX (Parallel Thread Execution) instructions, which means writing specialized low-level code that interfaces with Nvidia CUDA GPUs and optimizes their operation. According to DeepSeek, the DualPipe algorithm minimized training bottlenecks, particularly for the cross-node expert parallelism required by the MoE architecture, and this optimization allowed the cluster to process 14.8 trillion tokens during pre-training with near-zero communication overhead.
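
DeepSeek's implementation relies on a custom pipeline schedule and hand-written PTX kernels, which are not reproduced here. As a rough illustration of the underlying idea of hiding communication behind computation, here is a minimal PyTorch sketch using two CUDA streams; the tensor names and shapes are made up for the example.

```python
# Illustrative sketch only: overlapping expert "dispatch"/"combine" traffic with
# ongoing computation, in the spirit of DualPipe. DeepSeek's real implementation
# uses a custom pipeline schedule and hand-written PTX kernels; this just shows
# the generic pattern of hiding communication behind compute with CUDA streams.
import torch

assert torch.cuda.is_available(), "this sketch assumes a CUDA device"
device = torch.device("cuda")

compute_stream = torch.cuda.Stream()
comm_stream = torch.cuda.Stream()

hidden = torch.randn(8192, 4096, device=device)          # made-up activations
weight = torch.randn(4096, 4096, device=device)          # made-up dense weights
tokens_to_send = torch.randn(8192, 4096, device=device)  # tokens routed to remote experts
recv_buffer = torch.empty(tokens_to_send.shape, device="cpu", pin_memory=True)

with torch.cuda.stream(comm_stream):
    # Stand-in for the dispatch step (an all-to-all collective in a real MoE
    # system): an asynchronous copy that the GPU can service while computing.
    recv_buffer.copy_(tokens_to_send, non_blocking=True)

with torch.cuda.stream(compute_stream):
    # Dense computation for the current micro-batch proceeds on a separate
    # stream while the transfer above is still in flight.
    out = hidden @ weight

torch.cuda.synchronize()  # wait for both streams before using the results
```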

In addition to implementing DualPipe, DeepSeek restricted each token to a maximum of four nodes, limiting the number of nodes involved in its communication. This reduced traffic and allowed communication and computation to overlap effectively.
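
The following is a simplified sketch of what such node-limited routing can look like; the expert counts, layout across nodes, and node-scoring heuristic are illustrative assumptions rather than DeepSeek's actual routing logic.

```python
# Simplified sketch of node-limited expert routing: each token may pick experts
# from at most `max_nodes` nodes, which caps inter-node traffic. The expert
# layout, top-k value, and node-scoring heuristic are illustrative assumptions,
# not DeepSeek's actual routing code.
import numpy as np

num_experts = 256          # assumed number of routed experts
experts_per_node = 32      # assumed layout: 256 experts spread over 8 nodes
top_k = 8                  # experts activated per token
max_nodes = 4              # cap on nodes a single token may touch

rng = np.random.default_rng(0)
scores = rng.random(num_experts)               # router affinities for one token
node_ids = np.arange(num_experts) // experts_per_node

# Score each node by its best expert and keep only the top `max_nodes` nodes.
num_nodes = node_ids.max() + 1
node_scores = np.array([scores[node_ids == n].max() for n in range(num_nodes)])
allowed_nodes = np.argsort(node_scores)[-max_nodes:]

# Mask out experts living on other nodes, then take the usual top-k.
masked = np.where(np.isin(node_ids, allowed_nodes), scores, -np.inf)
chosen = np.sort(np.argsort(masked)[-top_k:])
print("experts:", chosen.tolist(), "on nodes:", sorted(set(node_ids[chosen].tolist())))
```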

A key element in reducing compute and communication requirements was the adoption of low-precision training. DeepSeek used an FP8 mixed-precision framework, enabling faster computation and lower memory usage without compromising numerical stability. Key operations, such as matrix multiplications, were performed in FP8, while sensitive components, such as the embedding and normalization layers, were kept at higher precision (BF16 or FP32) to preserve accuracy. This approach consistently kept the relative training loss error under 0.25% and reduced memory requirements while maintaining robust accuracy.
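
To make the idea concrete, here is a small, self-contained NumPy simulation of FP8-style quantization applied to a matrix multiplication. It is a toy model of the concept only: DeepSeek's framework uses hardware FP8 tensor cores, block-wise scaling, and careful accumulation strategies that are not modeled here, and the error it prints reflects this toy setting rather than the 0.25% training-loss figure above.

```python
# Toy simulation of e4m3-style FP8 quantization for a matrix multiplication.
# This only illustrates the concept: DeepSeek's framework relies on hardware
# FP8 tensor cores, fine-grained (block-wise) scaling factors, and careful
# higher-precision accumulation, none of which are modeled here.
import numpy as np

def quantize_e4m3_like(x):
    """Per-tensor scale into the e4m3 range (max ~448), keep ~4 significant
    bits (1 implicit + 3 mantissa bits), then rescale back."""
    scale = 448.0 / np.abs(x).max()
    xs = x * scale
    exponent = np.floor(np.log2(np.abs(xs) + 1e-30))
    step = 2.0 ** (exponent - 3)                 # granularity for 3 mantissa bits
    return np.round(xs / step) * step / scale

rng = np.random.default_rng(0)
a = rng.standard_normal((512, 512)).astype(np.float32)
b = rng.standard_normal((512, 512)).astype(np.float32)

ref = a @ b                                              # full-precision reference
approx = quantize_e4m3_like(a) @ quantize_e4m3_like(b)   # low-precision inputs, FP32 accumulation

rel_err = np.abs(approx - ref).mean() / np.abs(ref).mean()
print(f"mean relative error from the low-precision inputs: {rel_err:.2%}")
```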

In terms of performance, the company says its DeepSeek-V3 MoE language model is on par with or better than GPT-4x, Claude-3.5-Sonnet, and Llama-3.1, depending on the benchmark. Naturally, that will have to be proven by third-party benchmarks.

(Image credit: DeepSeek)

Although DeepSeek-V3 may trail frontier models such as GPT-4o or o3 in parameter count and reasoning capabilities, DeepSeek's achievement shows that an advanced MoE language model can be trained with relatively limited resources. Of course, this requires a lot of optimization and low-level programming, but the results appear to be surprisingly good.

The DeepSeek team acknowledges that deploying the DeepSeek-V3 model requires advanced hardware, as well as a deployment strategy that separates the prefilling and decoding stages, which may be out of reach for small companies with limited resources.
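
For context, prefilling processes the entire prompt in one compute-heavy pass, while decoding generates output tokens one at a time and is limited mostly by memory bandwidth; because the two stages stress hardware differently, large deployments often run them on separate pools of accelerators. The sketch below is a deliberately minimal illustration of that split, with hypothetical function names; it is not DeepSeek's deployment strategy or code.

```python
# Deliberately toy illustration of a prefill/decode split at the serving layer.
# Function names and data structures are hypothetical; a real disaggregated
# deployment (including the strategy described in the DeepSeek-V3 paper) runs
# the two stages on separate accelerator pools and transfers the KV cache
# between them, none of which is modeled here.

def prefill(prompt_tokens):
    """Compute-bound stage: process the whole prompt once, build the KV cache."""
    kv_cache = list(prompt_tokens)        # stand-in for per-token attention state
    return kv_cache

def decode(kv_cache, max_new_tokens):
    """Memory-bandwidth-bound stage: generate tokens one at a time from the cache."""
    output = []
    for _ in range(max_new_tokens):
        next_token = len(kv_cache)        # stand-in for a model forward pass
        kv_cache.append(next_token)
        output.append(next_token)
    return output

cache = prefill(range(16))                # prompt of 16 tokens
print(decode(cache, max_new_tokens=4))    # -> [16, 17, 18, 19]
```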

“While we recognize the superior performance and cost-effectiveness of DeepSeek-V3, we also recognize that it has several limitations, particularly with respect to deployment,” the company’s paper reads. “First, to ensure efficient inference, the recommended deployment unit for DeepSeek-V3 is relatively large, which can be burdensome for small teams. Second, although DeepSeek-V3’s deployment strategy has achieved fast end-to-end generation, there is still room for further improvement. Fortunately, these limitations are expected to be resolved naturally with the development of more advanced hardware.”
