Migrating the Hub from Git LFS to Xet

By versatileai | July 16, 2025

In January of this year, the Xet team at Hugging Face deployed a new storage backend, serving roughly 6% of Hub downloads from it. That was an important milestone, but only the beginning. In the six months since, some 500,000 repositories holding more than 20 PB of data have migrated to Xet, moving the Hub beyond Git LFS and onto a storage system that scales with the workflows of AI builders.

Today, more than 1 million Hub users are on Xet. In May, it became the default for new users and organizations on the Hub. With only a few dozen GitHub issues, forum threads, and Discord messages along the way, this may be the quietest migration of this magnitude.

How? Partly because the team came prepared, with years of experience building and supporting the content-addressed store (CAS) and Rust client that provide Xet's foundation. Without those pieces, Git LFS might still be the future of the Hub. The unsung heroes of this transition, however, are:

  • The Git LFS bridge
  • A piece of infrastructure known internally as the background content migration

Together, these components have allowed us to actively migrate petabytes a day without worrying about the impact on the Hub or the community. They also give us the peace of mind to move even faster in the coming weeks and months (skip to the end to see what’s coming).

Bridge and backward compatibility

In the early days of planning the transition to Xet, we made a few important design decisions:

  • There would be no “hard cutover” from Git LFS to Xet
  • A Xet-enabled repository can contain both LFS and Xet content
  • Migrations can run in the background without disrupting downloads or uploads

These seemingly straightforward decisions, driven by our commitment to the community, were critical. Above all, they meant that users and teams would not need to change their workflows immediately or download a new client just to interact with a Xet-enabled repository.

If you have a Xet-aware client (hf-xet, the Xet integration for huggingface_hub), uploads and downloads go through the full Xet stack. The client either splits files into chunks using content-defined chunking during upload, or requests file reconstruction information during download. On upload, the chunks are passed to the CAS and stored in S3. On download, the CAS provides the chunk ranges the client needs to request from S3 so it can rebuild the file locally.
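
For most users this is invisible: the same huggingface_hub calls work whether a repository is backed by LFS or Xet. As a minimal sketch (the repository and file names are placeholders, and it assumes you are already logged in), here is an upload and a download that would route through the Xet stack when hf-xet is installed:

from huggingface_hub import HfApi, hf_hub_download

api = HfApi()

# Upload: with hf-xet installed, the client splits the file with
# content-defined chunking and sends the chunks to the CAS (backed by S3).
api.upload_file(
    path_or_fileobj="model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="your-org/your-model",  # placeholder repository
    repo_type="model",
)

# Download: the client requests reconstruction information, fetches the
# needed chunk ranges from S3, and rebuilds the file locally.
local_path = hf_hub_download(
    repo_id="your-org/your-model",  # placeholder repository
    filename="model.safetensors",
)
print(local_path)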

Older versions of huggingface_hub or huggingface.js that do not support chunk-based file transfer can still download from and upload to Xet repos, but those bytes take a different route. When a Xet-backed file is requested from the Hub along the resolve endpoint, the Git LFS bridge mimics the LFS protocol, constructing and returning a single presigned URL. The bridge then does the work of rebuilding the file from the content held in S3 and returns it to the requester.

A very simplified view of the Git LFS bridge; in reality, this path includes several more API calls and components, such as a CDN in front of the bridge, DynamoDB for file metadata, and S3 itself.

To see this in action, right-click on the image above and open it in a new tab. The request to https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/migrating-hub-to-xet/bridge.png will be redirected; check your browser's developer tools to see the URL it redirects to.
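
If you would rather inspect this from a script than from the browser's developer tools, a small sketch using the requests library surfaces the redirect the bridge issues for that same URL:

import requests

url = ("https://huggingface.co/datasets/huggingface/documentation-images"
       "/resolve/main/blog/migrating-hub-to-xet/bridge.png")

# Ask the Hub to resolve the file but don't follow the redirect, so the
# presigned URL constructed by the bridge is visible in the response headers.
resp = requests.get(url, allow_redirects=False)
print(resp.status_code)              # typically a 3xx redirect
print(resp.headers.get("Location"))  # the presigned URL the bridge returns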

Meanwhile, when a non-Xet-enabled client uploads a file, it is first sent to LFS storage and then migrated to Xet. This "background content migration" process, mentioned only briefly in the documentation, powers both the migration to Xet and backward-compatible uploads. It is behind the migration of many petabytes of models and datasets, keeping 500,000 repos in sync with Xet storage without missing a beat.

Whenever a file needs to be migrated from LFS to Xet, a webhook fires and the event is pushed onto a distributed queue, where it is processed by an orchestrator. The orchestrator:

  • Enables Xet on the repository, if the event requests it
  • Gets the list of LFS files across all revisions of the repo
  • Batches the files into jobs based on size or file count (up to 1,000 files or 500 MB per job)
  • Places the jobs onto another queue, consumed by the migration worker pods

The migration workers pick up these jobs, and each pod:

  • Downloads the LFS files listed in the batch
  • Uploads them to the Xet content store using xet-core

Migration flow, triggered by a webhook event; the flow starts from the orchestrator to keep things simple.
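
The batching step is the easiest part to picture in code. Here is a rough, hypothetical sketch of how an orchestrator might group LFS files into jobs under the limits described above (the names, thresholds, and data shapes are illustrative, not the actual implementation):

MAX_FILES_PER_JOB = 1000           # batch caps described above (illustrative)
MAX_BYTES_PER_JOB = 500 * 1024**2  # ~500 MB

def build_jobs(lfs_files):
    """Group LFS files (dicts with 'path' and 'size') into migration jobs."""
    jobs, job, job_bytes = [], [], 0
    for f in lfs_files:
        # Start a new job once adding this file would exceed either cap.
        if job and (len(job) >= MAX_FILES_PER_JOB or job_bytes + f["size"] > MAX_BYTES_PER_JOB):
            jobs.append(job)
            job, job_bytes = [], 0
        job.append(f)
        job_bytes += f["size"]
    if job:
        jobs.append(job)
    return jobs

# Each job would then be pushed onto a queue consumed by the migration worker
# pods, which download the LFS files and upload them to Xet storage via xet-core.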

Scaling the migration

In April, we tested the limits of this system by reaching out to Bartowski and asking if he wanted to try Xet. At close to 500 TB across roughly 2,000 repositories, the Bartowski migration exposed some weak links:

  • During global dedupe, temporary shard files were first written to /tmp and then moved to the shard cache. On the worker pods, however, /tmp and the Xet cache sat on different mount points, so the move failed and the shard files were never cleaned up. The disks eventually filled, triggering a wave of "no space left on device" errors (a small sketch of this failure mode follows the list).
  • After supporting the Llama 4 launch, we had scaled the CAS for bursty downloads, but the migration workers flipped the script: hundreds of multi-gigabyte uploads pushed the CAS past its provisioned resources.
  • Profiling the worker pods revealed network and EBS I/O bottlenecks.
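
The first of these is a classic cross-device problem: a plain rename cannot move a file between mount points. The real fix landed in xet-core, but the failure mode and the standard workaround are easy to illustrate in Python (illustrative only, not the xet-core code):

import errno
import os
import shutil

def move_shard(src: str, dst: str) -> None:
    """Move a finished shard from the temp dir to the shard cache.

    os.rename() only works within a single filesystem; when /tmp and the
    shard cache sit on different mount points it fails with EXDEV, which is
    what left orphaned shards filling the workers' disks. Falling back to
    shutil.move() (a copy followed by a delete) handles the cross-device case.
    """
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        shutil.move(src, dst)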

Correcting these three monsters meant touching every layer: patching xet-core, resizing the CAS, and bumping the worker node specs. Luckily, Bartowski was game to work with us as all of his repositories headed to Xet. The same lessons have since driven the migrations of the biggest storage users on the Hub, such as RichardErkhov (1.7 PB across 25,000 repositories) and mradermacher (6.1 PB across 42,000 repositories).

Meanwhile, CAS throughput grew by nearly an order of magnitude between the first large-scale migration and the most recent ones:

  • Bartowski migration: the CAS sustained ~35 Gb/s on top of the ~5 Gb/s coming from normal Hub traffic.
  • mradermacher and RichardErkhov migrations: the CAS peaked at around 300 Gb/s while still carrying a daily load of ~40 Gb/s.

CAS throughput: each spike corresponds to a major migration, with baseline throughput steadily climbing to ~100 Gb/s as of July 2025.

Zero friction, faster transfers

When we started swapping out LFS, we had two goals in mind:

  • Do no harm

Designing with our original constraints and these goals in mind meant we could:

  • Build and ship hf-xet to handle uploads and downloads from Xet-enabled repositories before including it in huggingface_hub by default
  • Learn how to scale the infrastructure and how the client behaves everywhere the community works, from laptops to distributed file systems, adding support for crucial dependencies along the way
  • Migrate the Hub from LFS to Xet while the infrastructure handles the rest

Instead of waiting for every upload path to become Xet-aware, forcing a hard cutover, or pushing the community to adopt a specific workflow, we could begin migrating the Hub to Xet immediately with minimal user impact. In short, teams are moving to Xet organically, with infrastructure that preserves their workflows and supports the long-term goal of a unified storage system.

Xet for everyone

In January and February, we onboarded power users to provide feedback and pressure-test the infrastructure. To gather broader community feedback, we launched a waitlist for previewing Xet-enabled repositories. Soon after, Xet became the default for new users on the Hub.

Today, we support some of the biggest creators on the Hub (Meta Llama, Google, OpenAI, Qwen) while the community continues to work uninterrupted.

What’s next?

Starting this month, we are bringing Xet to everyone. Keep an eye out for an email granting you access to Xet; once you have it, update to the latest huggingface_hub (pip install -U huggingface_hub) to immediately unlock faster transfers (a quick way to verify the client is shown after the list below). This also means:

  • All existing repositories will migrate from LFS to Xet
  • All newly created repositories will be Xet-enabled by default
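
Once the upgrade lands, nothing else in your code needs to change. If you want to confirm the Xet-aware client is present, a quick check (assuming the companion package is importable as hf_xet) looks like this:

import importlib.util
import huggingface_hub

print("huggingface_hub version:", huggingface_hub.__version__)
# hf-xet is the Rust-backed transfer client; when it is importable, uploads and
# downloads to Xet-enabled repos use chunk-based transfers instead of the bridge.
print("hf_xet installed:", importlib.util.find_spec("hf_xet") is not None)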

If you use the browser to upload to or download from the Hub, or you use Git, that is fine too; chunk-based support for both is coming soon. In the meantime, keep using the workflows you already have. There are no limitations.

After that, we will open source the entire Xet protocol and infrastructure stack. The future of storing and moving bytes at the scale of AI workloads is on the Hub, and we aim to bring it to everyone.

If you have any questions, drop a line in the comments or open a discussion on the Xet team page.
