2024-07-09 · HuggingFace

Google Cloud TPUs made available to Hugging Face users

models · infrastructure

Source: HuggingFace · Date: 2024-07-09 · URL: https://huggingface.co/blog/tpu-inference-endpoints-spaces

Summary

Integration announcement (since suspended): Google Cloud TPUs were made available on HF Inference Endpoints and Spaces in three TPU v5litepod configurations, priced from $1.375/hr (1 core, 16 GB) to $11/hr (8 cores, 128 GB). The integration was backed by Optimum TPU, an open-source library for serving HF models on Google TPUs via TGI (Text Generation Inference). Initial model support covered the Gemma, Llama, and Mistral families. As of mid-2025, the TPU feature has been suspended on Inference Endpoints.
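To put the quoted prices in perspective, the short Python sketch below works out the per-core hourly rate and an always-on monthly cost for the two configurations named above. Only those two price points come from the summary; the 730-hour month and the helper names (`per_core_rate`, `monthly_cost`) are assumptions made for illustration.

```python
# Hourly prices (USD) as quoted in the summary. Only the two ends of the
# range are stated there, so the middle configuration is left out.
CONFIGS = {
    "1-core/16GB": {"cores": 1, "memory_gb": 16, "usd_per_hour": 1.375},
    "8-core/128GB": {"cores": 8, "memory_gb": 128, "usd_per_hour": 11.0},
}

# Assumption: an endpoint left running for an average month (~730 hours).
HOURS_PER_MONTH = 730


def per_core_rate(config):
    """Hourly price divided by the number of TPU cores."""
    return config["usd_per_hour"] / config["cores"]


def monthly_cost(config, hours=HOURS_PER_MONTH):
    """Cost of keeping the configuration running for `hours` hours."""
    return config["usd_per_hour"] * hours


for name, cfg in CONFIGS.items():
    print(f"{name}: ${per_core_rate(cfg):.3f}/core-hr, ${monthly_cost(cfg):,.2f}/month")
```

Both configurations come out at the same $1.375 per core-hour, so the quoted range scales linearly with core count rather than offering a discount at the larger size.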

Implications

HF as open-source ML hub. The launch and subsequent suspension of TPU inference on HF suggests that running TPU infrastructure as a self-service product for individual developers was more complex than HF could sustain. TPUs remain highly optimized for specific workloads but demand more operational expertise than GPU-based inference; the suspension likely reflects that gap in tooling and operational maturity.

Open-weights ecosystem health. Optimum TPU continues to exist as a standalone library after the suspension of the Inference Endpoints feature, so teams with GCP TPU access can still use the tooling for self-managed serving. The HF × Google Cloud partnership at the framework level (Optimum TPU, JAX/Keras support) appears more durable than the managed service layer.
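As a rough illustration of what self-managed serving looks like from the client side, the sketch below builds a request for TGI's `/generate` route using only the Python standard library. It assumes a TGI-compatible server (for example, one started with Optimum TPU on a TPU VM) is already running; the address `http://localhost:8080` and the helper names are hypothetical and not taken from the source.

```python
import json
import urllib.request

# Assumption: a TGI-compatible server is reachable at this address.
# The host and port are hypothetical placeholders.
TGI_URL = "http://localhost:8080"


def build_generate_request(prompt, max_new_tokens=64, base_url=TGI_URL):
    """Build (but do not send) a POST request for TGI's /generate route."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text from the JSON reply."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]
```

With a server running, `generate("What is a TPU?", max_new_tokens=32)` would return the completion as a string. Keeping request construction separate from sending makes the payload easy to inspect without any TPU hardware available.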