As part of the Xet team's work to improve the Hugging Face Hub's storage backend, we analyzed 24 hours of Hub upload requests to better understand access patterns. On October 11, 2024, the Hub received 8.2 million upload requests from 88 countries, transferring 130.8 TB of data.
The map below visualizes this activity, with countries color-coded by bytes uploaded per hour.
Uploads are currently stored in an S3 bucket in us-east-1, with S3 Transfer Acceleration used to optimize transfers. Downloads are cached and served through AWS CloudFront as a CDN. CloudFront's 400+ edge locations provide global coverage and low-latency data transfer. However, like most CDNs, it is optimized for web content and imposes a 50 GB file size limit.
While this limit is reasonable for typical Internet file transfers, it presents challenges as file sizes in model and dataset repositories continue to grow. For example, meta-llama/Meta-Llama-3-70B holds 131 GB of weights, split across 30 files to meet the Hub's recommendation of chunking weights into 20 GB segments. Moreover, enabling advanced deduplication or compression techniques on both uploads and downloads requires rethinking how file transfers are handled.
Custom protocols for upload and download
To push the Hugging Face infrastructure beyond its current limitations, we're redesigning the Hub's upload and download architecture. We plan to insert a content-addressed store (CAS) as the first stop for content delivery. This lets us implement custom protocols built on the basic tenets of dumb reads and smart writes. Unlike Git LFS, which treats files as opaque blobs, our approach analyzes files at the byte level, revealing opportunities to improve transfer speeds for the large files in model and dataset repositories.
The read path prioritizes simplicity and speed to ensure high throughput with minimal latency. File requests are routed to the CAS server, which returns reconstruction information. The data itself remains backed by the S3 bucket in us-east-1, and AWS CloudFront continues to serve as the CDN for downloads.
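To make the "dumb read" idea concrete, here is a minimal client-side sketch. The CAS endpoint, the reconstruction-info schema, and the hash format are all illustrative assumptions, not the Hub's actual API:

```python
# A minimal sketch of the read path: ask the CAS how to rebuild a
# file, then fetch the pieces from the CDN. Endpoint and schema
# are hypothetical.
import requests

CAS_URL = "https://cas.example.com"  # hypothetical CAS endpoint

def download_file(file_hash: str, dest_path: str) -> None:
    # 1. Ask the CAS server for reconstruction information. We assume
    #    it returns an ordered list of CDN-backed chunk URLs.
    resp = requests.get(f"{CAS_URL}/reconstruction/{file_hash}")
    resp.raise_for_status()
    info = resp.json()

    # 2. Fetch each chunk (served via CloudFront in front of S3) and
    #    append it in order; the read side needs no server-side smarts.
    with open(dest_path, "wb") as out:
        for chunk in info["chunks"]:
            part = requests.get(chunk["url"])
            part.raise_for_status()
            out.write(part.content)
```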
The write path is more complex, both to optimize upload speed and to provide additional security guarantees. As with reads, upload requests are routed to the CAS server, but instead of operating at the file level, the protocol operates on chunks. The CAS server checks which chunks it has already seen and instructs the client (for example, huggingface_hub) to transfer only the required (new) chunks. Chunks are validated by the CAS before being persisted to S3.
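The "smart write" side can be sketched the same way. Real deduplication uses content-defined chunking; the fixed-size chunks and endpoint names below are simplifying assumptions to keep the example short:

```python
# A minimal sketch of the write path: hash chunks locally, ask the
# CAS which ones are new, and upload only those. Endpoints and the
# fixed 64 MB chunk size are illustrative assumptions.
import hashlib
import requests

CAS_URL = "https://cas.example.com"  # hypothetical CAS endpoint
CHUNK_SIZE = 64 * 1024 * 1024

def upload_file(path: str) -> None:
    # 1. Hash each chunk, remembering its offset so it can be
    #    re-read later instead of held in memory.
    index = []  # (hash, offset, length)
    with open(path, "rb") as f:
        offset = 0
        while data := f.read(CHUNK_SIZE):
            index.append((hashlib.sha256(data).hexdigest(), offset, len(data)))
            offset += len(data)

    # 2. Ask the CAS which hashes it has never seen before.
    missing = set(
        requests.post(
            f"{CAS_URL}/chunks/missing",
            json={"hashes": [h for h, _, _ in index]},
        ).json()["missing"]
    )

    # 3. Upload only the new chunks; the CAS re-hashes each body to
    #    validate it before persisting to S3.
    with open(path, "rb") as f:
        for chunk_hash, off, length in index:
            if chunk_hash in missing:
                f.seek(off)
                requests.put(f"{CAS_URL}/chunks/{chunk_hash}", data=f.read(length))
```

A duplicate upload, or a file that shares most of its bytes with an earlier revision, would transfer almost nothing under this scheme.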
There are many implementation details to work through, such as network constraints and storage overhead, which we will discuss in a future post. For now, let's look at how the paths compare. The first diagram below shows the current read and write paths.

In the new design, reads follow this path:

Finally, the updated write path is:

By managing files at the byte level, we can apply optimizations tailored to different file formats. For example, we are looking at improving deduplication for Parquet files, and are currently exploring compression for tensor files (such as Safetensors), which could improve upload speeds by 10-25%. As new formats emerge, we are uniquely positioned to develop further enhancements that improve the development experience on the Hub.
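As a rough way to gauge what compression might buy, you can measure a general-purpose compressor against raw tensor bytes. The snippet below uses zstd purely as a stand-in; it is not the scheme the Hub will ship, and random data will not match the compressibility of real model weights:

```python
# Estimate potential transfer savings by compressing raw tensor
# bytes. zstd here is an illustrative choice, not the Hub's actual
# pipeline. Requires `pip install zstandard numpy`.
import numpy as np
import zstandard as zstd

# float16 weights as raw bytes; real savings depend heavily on the
# actual weight distribution of the model.
weights = np.random.randn(1_000_000).astype(np.float16).tobytes()

compressed = zstd.ZstdCompressor(level=3).compress(weights)
saved = 1 - len(compressed) / len(weights)
print(f"bytes saved: {saved:.1%}")
```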
This protocol also brings significant improvements for enterprise customers and power users. Inserting a control plane for file transfers ensures that no malicious or invalid data is uploaded. Operationally, uploads are no longer a black box: enhanced telemetry provides an audit trail and detailed logging that help the Hub's infrastructure teams identify and resolve issues quickly and efficiently.
Designed for global access
To support this custom protocol, we had to determine the optimal geographic distribution of the CAS service. We initially considered AWS Lambda@Edge for its broad global coverage and minimal round-trip times, but because it relies on CloudFront triggers, it is not compatible with the updated upload path. Instead, we decided to deploy CAS nodes in a select few of AWS's 34 regions.
A closer look at the 24-hour window of S3 PUT requests reveals global traffic patterns in data uploaded to the Hub. As expected, the majority of activity comes from North America and Europe, with a consistently high volume of uploads throughout the day. The data also highlights a strong and growing presence in Asia. By focusing on these core regions, we can place CAS points of presence (PoPs) that balance storage and network resources while minimizing latency.
AWS offers 34 regions, and our goal is to keep infrastructure costs reasonable while maintaining a great user experience. Of the 88 countries in this snapshot, the Pareto chart above shows that the top 7 countries account for 80% of bytes uploaded, and the top 20 countries account for 95% of total uploads and requests.
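For readers who want to reproduce this kind of breakdown on their own logs, the Pareto cut-offs amount to a cumulative sum over a sorted per-country table. The file name and column names below are hypothetical:

```python
# Count how many countries cover 80% and 95% of uploaded bytes,
# given a per-country summary of upload logs (hypothetical CSV).
import pandas as pd

df = pd.read_csv("uploads_by_country.csv")  # columns: country, bytes
df = df.sort_values("bytes", ascending=False)
df["cum_share"] = df["bytes"].cumsum() / df["bytes"].sum()

# Countries strictly below the threshold, plus the one that crosses it.
print("countries for 80%:", (df["cum_share"] < 0.80).sum() + 1)
print("countries for 95%:", (df["cum_share"] < 0.95).sum() + 1)
```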
The US emerged as a major source of upload traffic, making a PoP in this region essential. In Europe, most activity is concentrated in Western European countries (such as Luxembourg, the United Kingdom, and Germany), with some additional activity in Africa (particularly Algeria, Egypt, and South Africa). Upload traffic in Asia is driven primarily by Singapore, Hong Kong, Japan, and South Korea.
Using simple heuristics to distribute traffic, we can divide CAS coverage into three main areas (a routing sketch follows the list):
- us-east-1: serving the Americas
- eu-west-3: serving Europe, the Middle East, and Africa
- ap-southeast-1: serving Asia and Oceania
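To make the heuristic concrete, here is a minimal continent-to-PoP mapping in Python. The continent codes, the lookup table, and the `route` function are illustrative assumptions, not the production routing layer:

```python
# A simple routing heuristic of the kind described above: map a
# client's continent to the nearest CAS PoP. Standard two-letter
# continent codes are used; finer rules (e.g. for the Middle East,
# which these codes fold into Asia) would need country-level logic.
CAS_POPS = {
    "NA": "us-east-1",       # North America
    "SA": "us-east-1",       # South America
    "EU": "eu-west-3",       # Europe
    "AF": "eu-west-3",       # Africa (e.g. Algeria, Egypt, South Africa)
    "AS": "ap-southeast-1",  # Asia
    "OC": "ap-southeast-1",  # Oceania
}

def route(continent_code: str) -> str:
    # Fall back to us-east-1, the original home of the Hub's storage.
    return CAS_POPS.get(continent_code, "us-east-1")
```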
This split turns out to be very effective: the US and Europe accounted for 78.4% of bytes uploaded, with Asia contributing the remaining 21.6%.
This regional breakdown balances load across the three CAS PoPs and gives us the flexibility to scale up, with room for expansion in ap-southeast-1 and, if needed, in us-east-1 and eu-west-3.
Based on expected traffic, we plan to allocate resources as follows:
- us-east-1: 4 nodes
- eu-west-3: 4 nodes
- ap-southeast-1: 2 nodes
Validation and monitoring
Increasing the first-hop distance for some users has a limited impact on the Hub's overall bandwidth. We estimate that the cumulative bandwidth across all uploads will decrease from 48.5 Mbps to 42.5 Mbps (since (48.5 − 42.5) / 48.5 ≈ 0.124, roughly a 12% reduction), but we expect this to be more than offset by other optimizations in the system.
We are currently working to move this infrastructure into production by the end of 2024, starting with a single CAS node in us-east-1. From there, we will begin replicating internal repositories to the new storage system to benchmark transfer performance, then replicate the CAS to the additional PoPs mentioned above for further benchmarking. Based on these results, we will continue to refine our approach so that everything runs smoothly when the storage backend is fully rolled out next year.
Beyond bytes
As this analysis continues, new opportunities are emerging for deeper insights. Hugging Face hosts one of the largest collections of data from the open source machine learning community, providing a unique perspective to explore the techniques and trends driving AI development around the world.
For example, future analyses could categorize models uploaded to the Hub by use case (NLP, computer vision, robotics, large language models, etc.) and explore geographic trends in ML activity. This data not only informs infrastructure decisions, but also provides a lens into the evolving landscape of machine learning.
We invite you to dig into these findings yourself. Visit our interactive space to explore the upload distributions by region, and follow the team to hear more about what we're building.