2025-04-10 · fly.io

Taming A Voracious Rust Proxy

infrastructure

read at source ↗ fly.io

Taming A Voracious Rust Proxy

Source: fly.io Date: 2025-04-10 URL: https://fly.io/blog/taming-rust-proxy/

Summary

Incident postmortem and engineering writeup documenting a CPU-intensive busy-loop bug in fly-proxy traced to Rustls’s TLS stream state machine. When a TLS session closed with a CloseNotify alert but had buffered data remaining, Rustls mishandled its waker, causing futures to be polled in a tight loop. Triggered by load testing from Tigris (small bodies + early connection termination), discovered via flamegraph profiling of tracing span overhead as a proxy signal for spurious wakeups.

Implications

Edge deployment economics / infrastructure substrate. This is a signal about the operational reality of running a Rust-heavy proxy fleet at scale: async waker bugs are silent until they’re catastrophic, and flamegraphs of instrumentation overhead can surface them before the obvious symptoms appear. The commitment to add spurious-wakeup instrumentation is the right call — and reflects Fly’s broader investment in Corrosion/fly-proxy as foundational infrastructure. For teams building on Rust async (tokio, futures), this is a concrete cautionary pattern: TLS close-notify + buffered data = waker bug surface.

← all signals