2025-02-12 · fly.io

We're Cutting L40S Prices In Half

pricingenterpriseinfrastructure

We’re Cutting L40S Prices In Half

Source: fly.io Date: 2025-02-12 URL: https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/

Summary

Pricing announcement cutting L40S GPU rates to $1.25/hour — matching A10 GPU prices — to drive adoption of a higher-performance inference option. The post explains the business reasoning: customers unexpectedly preferred the cheaper A10 for inference over high-end A100s, and the L40S (“the VW GTI of the lineup”) offers near-A100 compute performance at A10 cost. Inference economics, they argue, favor networking reliability over raw FLOPS.

Implications

Edge deployment economics / GPU market thread. This pricing move is the follow-on to “We Were Wrong About GPUs” — Fly is actively reshaping its GPU strategy around inference rather than training. The insight that inference needs reliable networking (it’s serving HTTP requests) while training tolerates batch-job latency is a genuine architectural distinction that justifies different hardware positioning. At $1.25/hour, L40S competes directly with Lambda Labs, Vast.ai, and Replicate for inference workloads. The broader signal: commodity GPU pricing is falling fast enough that the economic case for self-hosted inference is strengthening relative to OpenAI/Anthropic API costs.

← all signals