2025-04-10 · fly.io

We Were Wrong About GPUs

infrastructure · commentary

Source: fly.io · Date: 2025-04-10 · URL: https://fly.io/blog/wrong-about-gpu/

Summary

Essay and postmortem by Kurt Mackey on Fly.io’s GPU Machines bet. The core admission: they spent months on complex Nvidia driver engineering to run GPU containers inside VMs, for a market that turned out to mostly want OpenAI/Anthropic API calls. “Developers don’t want GPUs. They don’t even want AI/ML models. They want LLMs.” The serious GPU compute market (H100 clusters) and the lightweight inference market (L40S) are both real, but small relative to API consumption.

Implications

GPU market thread / edge deployment economics. This is one of the most useful signals in the batch: a candid account of being directionally correct (AI/ML matters) but tactically wrong (about the form factor). It supports the LLM-API-as-commodity thesis: for the median developer, managed model APIs win over self-hosted GPU inference. The residual L40S business is real but narrow. For Ellis’s radar, this shapes the “local model hosting” thread: serious self-hosting is for specific workloads (privacy, cost at scale, abliterated variants), not general-purpose use. The mainstream goes through APIs.