Introducing Training Cluster as a Service - a new collaboration with NVIDIA
Source: HuggingFace Date: 2025-06-11 URL: https://huggingface.co/blog/nvidia-training-cluster
Summary
Partnership announcement: HuggingFace and NVIDIA are launching Training Cluster as a Service, which lets HF’s 250k+ organizations request on-demand GPU clusters (NVIDIA Hopper and GB200) provisioned through NVIDIA DGX Cloud Lepton. Organizations pay only for the duration of their training runs; HF and NVIDIA source the capacity and set the price based on cluster size, region, and duration. Early users include TIGEM (rare disease research), Numina (math AI), and Mirror Physics (chemistry and materials science).
Implications
Open-weights ecosystem health. On-demand cluster access without long-term GPU contracts removes a significant barrier for research labs and startups that want to train frontier-scale models but can’t justify owned infrastructure. This is a direct challenge to the AWS/GCP/Azure model for ML compute procurement.
HF as open-source ML hub. HF is extending its role from weights distribution to training infrastructure, covering the full stack from dataset hosting through cluster provisioning. Combined with Inference Endpoints, this positions HF as a vertically integrated ML platform rather than just a repository. The NVIDIA partnership is load-bearing: without GPU access at scale, the compute story doesn’t close.