Taming A Voracious Rust Proxy
read at source ↗ fly.io
Taming A Voracious Rust Proxy
Source: fly.io Date: 2025-04-10 URL: https://fly.io/blog/taming-rust-proxy/
Summary
Incident postmortem and engineering writeup documenting a CPU-intensive busy-loop bug in fly-proxy traced to Rustls’s TLS stream state machine. When a TLS session closed with a CloseNotify alert but had buffered data remaining, Rustls mishandled its waker, causing futures to be polled in a tight loop. Triggered by load testing from Tigris (small bodies + early connection termination), discovered via flamegraph profiling of tracing span overhead as a proxy signal for spurious wakeups.
Implications
Edge deployment economics / infrastructure substrate. This is a signal about the operational reality of running a Rust-heavy proxy fleet at scale: async waker bugs are silent until they’re catastrophic, and flamegraphs of instrumentation overhead can surface them before the obvious symptoms appear. The commitment to add spurious-wakeup instrumentation is the right call — and reflects Fly’s broader investment in Corrosion/fly-proxy as foundational infrastructure. For teams building on Rust async (tokio, futures), this is a concrete cautionary pattern: TLS close-notify + buffered data = waker bug surface.