This is the second blog in a three-part series on the historical progress of China’s open source community since the “DeepSeek Moment” in January 2025. You can read the first blog here.
In this second article, we focus on the architecture and hardware choices Chinese companies made as openness became the norm.
For AI researchers and developers who contribute to and rely on open source ecosystems, and for policymakers trying to understand this rapidly changing environment, several trends matter: architectural preferences, modality diversification, licensing permissiveness, the popularity of small-scale models, and the increasing adoption of Chinese-made hardware. Together they point to a leadership strategy pursued along multiple paths. DeepSeek R1’s distinctive characteristics have spurred replication and competition, and have contributed to a greater focus on Chinese domestic hardware.
Mixture of Experts (MoE) as the default choice
Over the past year, the Chinese community’s leading models have almost unanimously moved to Mixture-of-Experts (MoE) architectures, including Kimi K2, MiniMax M2, and Qwen3. R1’s significance was not the architecture itself so much as the proof that strong reasoning models could be open, reproducible, and practical to build. Under China’s real-world constraints, MoE emerged as a natural way to maintain high capacity, control costs, and keep models trainable, deployable, and widely adoptable.
MoE works like a controllable distributed computing system. Within a single model framework, compute is allocated across requests and deployment environments by dynamically activating a varying subset of experts depending on the complexity and value of each task. Crucially, it does not require every inference call to consume the complete set of resources, nor does it assume that all deployment environments share identical hardware.
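The sparse-activation idea behind this can be sketched in a few lines. This is a simplified, hypothetical top-k router for illustration only; the shapes, expert count, and gating details are assumptions, not the routing used by any specific model named above.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Toy top-k MoE routing: only the top_k highest-scoring experts run,
    so compute per token stays roughly constant no matter how many
    total parameters the full expert pool holds."""
    logits = x @ gate_w                      # one router score per expert
    top = np.argsort(logits)[-top_k:]        # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                 # softmax over selected experts only
    # weighted sum of only the activated experts' outputs
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# toy usage: 4 experts of width 8, only 2 activated per token
rng = np.random.default_rng(0)
d, n = 8, 4
experts = [lambda v, W=rng.normal(size=(d, d)): v @ W for _ in range(n)]
out = moe_forward(rng.normal(size=d), rng.normal(size=(d, n)), experts)
print(out.shape)  # (8,)
```

The point of the sketch is the cost structure: total capacity scales with the number of experts, while per-token compute scales only with `top_k`.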
The overall direction of China’s open source models in 2025 was clear: the goal was not necessarily to deliver the strongest possible performance, but to operate sustainably, deploy flexibly, and continually evolve toward the best cost-performance balance.
Racing for leadership across modalities
Starting in February 2025, open source activity no longer focused solely on text models; it quickly extended in multimodal and agent-based directions. Any-to-any models, text-to-image, image-to-video, text-to-video, TTS, 3D, and agents all advanced in parallel. The community shipped not just model weights but a complete set of engineering assets: inference deployment, datasets and evaluations, toolchains, workflows, and edge-to-cloud coordination. The simultaneous emergence of video generation tools, 3D components, distilled datasets, and agent frameworks signaled something bigger than individual breakthroughs; it demonstrated reusable, system-level capability.
Competition for leadership in non-text modalities has intensified well beyond DeepSeek. StepFun has released high-performance multimodal models that excel at generating, processing, and editing audio, video, and images; its latest speech model, Step-Audio-R1.1, claims state-of-the-art performance. Tencent reflected the same shift through its open source efforts in video and 3D: projects such as Hunyuan Video and Hunyuan 3D show the competition expanding beyond text-centric models.
Big priority for small models
Models ranging from 0.5B to 30B parameters were easy to run locally, fine-tune, and integrate into business systems and agent workflows. For example, Qwen1.5-0.5B has the most derivative models in the Qwen series. In environments with limited computing resources or strict compliance requirements, these models are far better suited to long-term operation. At the same time, large companies often used large MoE models in the 100B to 700B range as capability ceilings, or “teacher” models, and then distilled those capabilities into many smaller models. This created a clear structure: a few very large models on top and many working models below. The rising share of small models in our monthly summaries reflects actual usage needs in the community.
https://huggingface.co/spaces/cfahlgren1/hub-model-tree-stats
More permissive open source licenses
Since R1, Apache 2.0 has become close to the default choice for the Chinese community’s open models. Permissive licensing reduces the friction of using, modifying, and deploying models in production, making it much easier for companies to move open models into real-world systems. Familiarity with standard licenses such as Apache 2.0 and MIT lowers the barrier further. Restrictive and custom licenses, by contrast, create friction through unfamiliarity and new legal hurdles, contributing to the decline seen in the graph below.

Based on releases from all organizations shown in China Open Source Heatmap
From model-first to hardware-first
In 2025, model releases increasingly shipped together with inference frameworks, quantization formats, serving engines, and edge runtimes. The key goal was no longer just to make the weights downloadable, but to make the model run directly on target domestic hardware, reliably and efficiently. This shift was most pronounced on the inference side. For example, DeepSeek-V3.2-Exp launched with day-zero support on both Huawei Ascend and Cambricon chips, not as a cloud demo but as reproducible inference pipelines released alongside the weights, allowing developers to verify real-world performance directly.
At the same time, signals emerged on the training side. Ant Group’s open Ling model uses optimized training on domestically produced AI chips to achieve performance close to NVIDIA H800 GPUs, reducing the cost of training on 1 trillion tokens by about 20%. Baidu’s open Qianfan-VL model explicitly documents training on a cluster of over 5,000 Baidu Kunlun P800 accelerators, the company’s flagship AI chip, along with parallelism and efficiency details. In early 2026, Zhipu’s GLM-Image and China Telecom’s latest open model, TeleChat3, were both announced as fully trained on domestic chips. These disclosures show that domestic hardware is no longer limited to inference and is beginning to enter critical stages of the training pipeline.
On the serving and infrastructure side, engineering capabilities are being systematically open sourced. Moonshot AI released its serving system Mooncake with explicit support for features such as prefill/decode disaggregation. By open sourcing production-grade practices, these efforts have significantly raised the baseline for deployment and operations across the community, making it easier to run models reliably at scale. The same direction is visible throughout the ecosystem: Baidu’s FastDeploy 2.0 emphasized aggressive quantization and cluster-level optimization to cut inference costs under tight compute budgets, while Alibaba’s Qwen ecosystem pursued tight full-stack coordination of models, inference frameworks, quantization strategies, and cloud adoption workflows to minimize friction from development to production. Still, reports of compute constraints in China threaten this expansion: Zhipu AI is reportedly limiting usage amid surging compute demand.
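To make the quantization point concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization, the generic idea behind cutting inference memory and bandwidth. It is a simplified assumption for illustration, not FastDeploy’s or any other stack’s actual scheme.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]
    with a single scale factor, shrinking weight storage 4x vs float32."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-8)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from int8 weights."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=(256, 256)).astype(np.float32)
q, s = quantize_int8(w)
err = float(np.abs(dequantize(q, s) - w).max())
print(q.nbytes / w.nbytes)  # 0.25: int8 weights use a quarter of the memory
print(err < s)              # True: rounding error stays under one quant step
```

Production systems layer per-channel scales, activation quantization, and kernel support on top of this, but the memory arithmetic, and hence the cost savings, starts here.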
When models, tools, and engineering are delivered together, the ecosystem begins to evolve on its own, growing not by adding projects but by structurally differentiating on a shared foundation. As NVIDIA begins selling the H200, how China will respond to US hardware sales and export regulations remains an open question.
Under reconstruction
The “DeepSeek Moment” of January 2025 didn’t just spark a wave of new open models. With open source no longer optional but fundamental, it forced a deeper rethink of how AI systems should be built and why those foundational choices carry strategic weight.
Chinese companies are no longer optimizing for a single model in isolation. Instead, they are pursuing a clear architectural path aimed at building a complete ecosystem suited to the open source world. As models become increasingly commoditized, these decisions mark a clear shift in competition from model performance to system design.
In our next blog, we’ll dig deeper into which organizations are winning and share some of what to expect in 2026.

