Our eighth generation TPUs: two chips for the agentic era
Source: Google Date: 2026-04-22 URL: https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/
Summary
Google announced two eighth-generation TPU chips at Cloud Next ’26: the TPU 8t for training, delivering nearly 3x compute performance per pod over the prior generation and scaling to 9,600 chips with two petabytes of shared memory; and the TPU 8i for inference, with 288 GB of high-bandwidth memory and 80% better performance-per-dollar than its predecessor, optimized for low-latency agentic workloads. Both chips were designed in partnership with Google DeepMind and will become generally available later in 2026.
Implications
The bifurcation is the signal. Splitting training and inference into purpose-built chips is Google acknowledging that agentic AI runs on fundamentally different workload profiles than model training — high-concurrency, low-latency, many simultaneous short requests rather than long batched runs. TPU 8i is architected for agent serving, not benchmark runs.
The claim of 80% better performance-per-dollar for TPU 8i is measured against its own predecessor, not against GPUs, but it still positions Google’s inference infrastructure against the Nvidia H100/H200 clusters that most cloud competitors are using. If that claim holds at scale, it’s a meaningful cost moat: Google can serve Gemini agents more cheaply than competitors serving on commodity GPU clusters, which translates directly to pricing power on the enterprise tier.
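To put the 80% performance-per-dollar figure in cost terms, here is a minimal arithmetic sketch. It assumes the claim means 1.8x as much work per dollar as the predecessor and that the gain carries over uniformly to a real serving workload; neither is confirmed by the announcement. The function name `cost_reduction` is purely illustrative.

```python
def cost_reduction(perf_per_dollar_gain: float) -> float:
    """Fractional drop in cost per unit of work for a given perf-per-dollar gain.

    Assumption: a gain of 0.80 means 1.8x as much work per dollar, so each
    unit of work costs 1/1.8 of what it did on the previous generation.
    """
    if perf_per_dollar_gain <= -1.0:
        raise ValueError("gain must be greater than -1.0")
    return 1.0 - 1.0 / (1.0 + perf_per_dollar_gain)


saving = cost_reduction(0.80)
print(f"Cost per unit of work falls by about {saving:.1%}")
# prints "Cost per unit of work falls by about 44.4%"
```

Under those assumptions, an 80% gain in performance-per-dollar is roughly a 44% cut in cost per request relative to the prior TPU generation, not an 80% cut.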
Three threads:
- Infrastructure moat: Custom silicon purpose-built for inference creates a cost floor OpenAI and Anthropic (both on third-party compute) can’t match without similar partnerships.
- Cloud agent platform: TPU 8i GA later in 2026 sets the infrastructure schedule for Gemini Enterprise Agent Platform scaling.
- Competitive pressure on Nvidia: Every exaflop of TPU inference capacity represents workloads that don’t run on Nvidia hardware in Google Cloud.
Watch:
- GA date and pricing tier for TPU 8i access in Google Cloud
- Whether AWS/Trainium and Azure/Maia publish competing inference-per-dollar benchmarks
- Google DeepMind’s use of 8t for Gemini 4 training — this sets the next model generation’s compute foundation