2025-02-12 · fly.io

AI GPU Clusters, From Your Laptop, With Livebook

models · infrastructure

Source: fly.io Date: 2025-02-12 URL: https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/

Summary

Engineering writeup demonstrating elastic GPU cluster scaling for Elixir ML workloads using three components: Livebook (an interactive notebook connected to a remote Elixir cluster), FLAME (serverless-style elastic scaling for Erlang nodes), and the Nx/Axon/Bumblebee stack (GPU-accelerated ML in Elixir). Concrete examples include video processing with Llama/Mistral and hyperparameter tuning of BERT across 64 GPU machines on Fly.io. The framing: Erlang’s cluster primitives make this dramatically faster to build than equivalent Python infrastructure.

Implications

Edge deployment economics / GPU market thread. FLAME is an underappreciated piece of the GPU scaling story: it gives Elixir developers a serverless-style API for burst GPU compute, comparable to AWS Lambda + SageMaker but running on Fly’s GPU fleet. The 64-GPU BERT tuning example is more than a demo: it shows that Fly’s GPU infrastructure can handle genuine distributed ML workloads, not just single-model inference. For the radar, Livebook as a distributed ML notebook is an interesting alternative to Jupyter + Ray/Dask, particularly for teams already in the BEAM ecosystem. The broader signal: Fly is becoming a credible platform for ML practitioners, not just LLM API consumers.
