XetHub is joining Hugging Face!
Source: HuggingFace Date: 2024-08-08 URL: https://huggingface.co/blog/xethub-joins-hf
Summary
Hugging Face acquired XetHub, a Seattle-based company built by former Apple ML infrastructure engineers, to replace Git LFS as the Hub’s storage backend. The technical problem is concrete: the Hub hosts 1.3M models and 450K datasets, stores 12PB in LFS, and serves 1B requests a day, yet LFS requires re-uploading an entire large file whenever any byte changes. XetHub’s approach uses chunked storage with deduplication, so modifying a single metadata field in a 10GB model file uploads only the changed chunks rather than the full file. The stated goal is to support trillion-parameter models and to improve versioning, experimentation, and tracking of how datasets evolve.
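The chunk-level deduplication described above can be sketched in a few dozen lines. This is a minimal illustration, assuming content-defined chunking with a gear rolling hash (a common technique for this kind of storage; the source does not say which chunker XetHub uses) and an in-memory dict standing in for the remote chunk store. The names `chunk` and `upload`, and all size parameters, are hypothetical.

```python
import hashlib
import random

# Illustrative parameters (hypothetical, not XetHub's real settings).
MIN_CHUNK = 1024              # never cut before this many bytes
MAX_CHUNK = 16 * 1024         # always cut at this many bytes
BOUNDARY_MASK = 0xFFF << 52   # cut when these 12 high bits are zero: roughly 1 in 4096 positions

# Gear table: one pseudo-random 64-bit value per byte value, fixed seed for repeatability.
_gear_rng = random.Random(0)
GEAR = [_gear_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a gear rolling hash."""
    chunks: list[bytes] = []
    start, h = 0, 0
    for i, byte in enumerate(data):
        # Shift-and-add; after masking to 64 bits, h depends only on the last 64 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def upload(data: bytes, store: dict[str, bytes]) -> tuple[list[str], int]:
    """Send only chunks the store lacks; return the file's chunk manifest and bytes sent."""
    manifest: list[str] = []
    sent = 0
    for piece in chunk(data):
        digest = hashlib.sha256(piece).hexdigest()
        if digest not in store:
            store[digest] = piece
            sent += len(piece)
        manifest.append(digest)
    return manifest, sent


# Demo: a 256 KiB random "model file", then a small metadata edit that also changes its length.
rng = random.Random(42)
v1 = rng.randbytes(256 * 1024)
v2 = v1[:100_000] + b"new-metadata" + v1[100_008:]

store: dict[str, bytes] = {}
manifest1, sent1 = upload(v1, store)   # first upload sends every byte
manifest2, sent2 = upload(v2, store)   # second upload sends only chunks near the edit
print(f"v1 sent {sent1} bytes; v2 sent {sent2} of {len(v2)} bytes")
```

Content-defined boundaries, rather than fixed offsets, are what keep an edit local here: the edit above shifts every later byte by four positions, which would change every fixed-size block after it, but a cut point depends only on the 64 bytes before it, so chunks past the edited region keep their old boundaries and hashes and are not re-sent.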
Implications
- Feeds the context management divergence thread: efficient large-file versioning is the storage-layer prerequisite for serious dataset and model iteration at scale. As training iteration cycles shorten, the bottleneck shifts from compute to storage throughput, and this acquisition is Hugging Face addressing that directly.
- Relevant to the agent layer / lifecycle phase thread: agents that read, write, and version datasets as part of training loops (rather than just consuming models) need exactly this kind of chunked, diff-aware storage. The acquisition positions Hugging Face as infrastructure for that loop, not just a model registry.
- Background for local inference work: chunked Hub storage means faster partial downloads and more practical version pinning for large local model files, which matters for any workflow managing GGUF quants across multiple machines.
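For the local-inference case, the saving comes from fetching only the chunks a machine does not already hold. The sketch below assumes a hypothetical per-revision manifest listing each chunk's content hash and size; `ChunkRef`, `bytes_to_fetch`, and both manifests are invented for illustration, and the source does not describe how Hub clients will actually expose chunk-level downloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    digest: str  # content hash identifying the chunk
    size: int    # chunk length in bytes


def bytes_to_fetch(target: list[ChunkRef], cached: set[str]) -> int:
    """Bytes needed to materialize `target`, given chunk digests already on local disk."""
    missing: dict[str, int] = {}
    for ref in target:
        if ref.digest not in cached:
            missing[ref.digest] = ref.size  # a repeated chunk is fetched only once
    return sum(missing.values())


# Hypothetical manifests for two pinned revisions of the same large file.
rev_a = [ChunkRef("c1", 64_000), ChunkRef("c2", 64_000), ChunkRef("c3", 64_000)]
rev_b = [ChunkRef("c1", 64_000), ChunkRef("c2-edited", 64_100), ChunkRef("c3", 64_000)]

machine_with_rev_a = {ref.digest for ref in rev_a}
print(bytes_to_fetch(rev_b, machine_with_rev_a))  # 64100
print(bytes_to_fetch(rev_b, set()))               # 192100
```

Moving a machine pinned at `rev_a` to `rev_b` costs one chunk instead of the whole file, while a cold machine still pays for everything; that asymmetry is what would make keeping pinned revisions in sync across machines cheap.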