2024-08-08 · HuggingFace

XetHub is joining Hugging Face!

ecosystem

Source: HuggingFace · Date: 2024-08-08 · URL: https://huggingface.co/blog/xethub-joins-hf

Summary

Hugging Face acquired XetHub, a Seattle-based company built by former Apple ML infrastructure engineers, to replace Git LFS as the Hub’s storage backend. The technical problem is concrete: the Hub hosts 1.3M models and 450K datasets, stores 12PB in LFS, and serves 1B requests per day, yet LFS requires re-uploading an entire large file when any byte changes. XetHub’s approach splits files into chunks and deduplicates them, so modifying a single metadata field in a 10GB model file uploads only the changed chunks rather than the full file. The stated goal is to support trillion-parameter models and to improve versioning, experimentation, and tracking of how datasets evolve.
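The announcement does not specify XetHub's chunking algorithm, chunk sizes, or hash function, so the following is only a minimal sketch under stated assumptions: a generic gear-hash content-defined chunker plus a SHA-256-keyed chunk store. It illustrates why a small edit re-sends only nearby chunks. The names `chunk` and `ChunkStore` and the `MASK`/`MIN_CHUNK`/`MAX_CHUNK` values are hypothetical.

```python
import hashlib
import random

# Hypothetical parameters: XetHub's real chunker, hash, and sizes are not given in the post.
MASK = (1 << 12) - 1   # candidate cut when the low 12 bits of the rolling hash are zero (~1 in 4096 positions)
MIN_CHUNK = 1024       # never cut before this many bytes
MAX_CHUNK = 16 * 1024  # always cut at this many bytes

# Gear table: one pseudo-random 64-bit value per byte value; fixed seed keeps boundaries reproducible.
_gear_rng = random.Random(0)
GEAR = [_gear_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a gear rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shift-and-add: the low k bits depend only on the last k bytes, so cut points are local.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by its SHA-256."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def upload(self, data: bytes) -> tuple[list[str], int]:
        """Store a file version; return its chunk manifest and the bytes actually transferred."""
        manifest, sent = [], 0
        for c in chunk(data):
            key = hashlib.sha256(c).hexdigest()
            if key not in self.blobs:  # only chunks the store has never seen are sent
                self.blobs[key] = c
                sent += len(c)
            manifest.append(key)
        return manifest, sent

    def download(self, manifest: list[str]) -> bytes:
        """Reassemble a file version from its manifest."""
        return b"".join(self.blobs[k] for k in manifest)


if __name__ == "__main__":
    rng = random.Random(42)
    v1 = rng.randbytes(256 * 1024)               # stand-in for a large model file
    v2 = v1[:100] + b"new-metadata" + v1[100:]   # small insertion near the "header"
    store = ChunkStore()
    _, sent_v1 = store.upload(v1)
    manifest_v2, sent_v2 = store.upload(v2)
    assert store.download(manifest_v2) == v2
    print(f"v1: {sent_v1} bytes sent; v2: {sent_v2} of {len(v2)} bytes sent")
```

Content-defined boundaries are used instead of fixed-size blocks because an insertion shifts every later offset: with fixed blocks, the 12-byte edit above would change nearly every block and force close to a full re-upload. With boundaries chosen from local content, cut points typically realign within a chunk or two of the edit, so the second upload re-sends only a small fraction of the file.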

Implications

Feeds the context management divergence thread: efficient large-file versioning is the storage-layer prerequisite for serious dataset and model iteration at scale. As iteration cycles on models and datasets shorten, the bottleneck shifts from compute toward storage throughput, and this acquisition is Hugging Face addressing that directly.
Relevant to the agent layer / lifecycle phase thread: agents that read, write, and version datasets as part of training loops (rather than just consuming models) need exactly this kind of chunked, diff-aware storage. The acquisition positions Hugging Face as infrastructure for that loop, not just a model registry.
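Background for local inference work: if Hub storage moves to deduplicated chunks, pulling a new revision of a large file should transfer only the changed chunks, which means faster partial downloads and more practical version pinning for large local model files. That matters for any workflow managing GGUF quants across multiple machines.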

← all signals