2026-01-20 · HuggingFace

Introducing Waypoint-1: Real-time interactive video diffusion from Overworld

models · research · infrastructure

Source: HuggingFace · Date: 2026-01-20 · URL: https://huggingface.co/blog/waypoint-1

Summary

Model release from Overworld: Waypoint-1-Small, a 2.3B-parameter frame-causal rectified flow transformer trained on 10,000 hours of video game footage for real-time interactive video generation. It is controllable via text, mouse, and keyboard, with claimed zero latency. On an RTX 5090 it runs at ~30 FPS with 4 denoising steps. The release is accompanied by WorldEngine, a Python inference library for streaming interactive world models. Training uses diffusion forcing (pre-training) followed by self-forcing/DMD (post-training) to reduce error accumulation.
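To make "rectified flow at 4 denoising steps" concrete, here is a minimal sketch of fixed-step Euler sampling along a flow from noise (t = 0) to a sample (t = 1). It is not Waypoint-1 or the WorldEngine API: `euler_sample`, `toy_velocity`, and `TARGET` are hypothetical names, and the velocity function is a hand-written stand-in for a trained network. It points straight at a fixed target, which is the idealized straight-path case that rectified flow training aims for.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    num_steps: int = 4,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # step start times: 0, dt, 2*dt, ... (never reaches 1.0)
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-in for a trained model: a velocity field whose
# trajectories are straight lines ending at TARGET when t = 1.
TARGET: Vector = [1.0, -2.0, 0.5]


def toy_velocity(x: Vector, t: float) -> Vector:
    # Remaining displacement divided by remaining time gives a constant
    # velocity along the straight path from the start point to TARGET.
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]


noise: Vector = [0.3, 0.9, -1.2]
frame = euler_sample(toy_velocity, noise, num_steps=4)
```

With perfectly straight paths the Euler steps land on the target even at very low step counts, which is the intuition behind few-step sampling. As a rough budget, ~30 FPS leaves about 33 ms per frame, so at most roughly 8 ms per step if the 4 denoising steps account for the whole frame time.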

Implications

Thread: open-weights ecosystem health / model release cadence. Interactive video diffusion at real-time frame rates on consumer hardware is a meaningful capability threshold: this isn't a research demo, it's a released inference library. The game-world framing is obvious, but the underlying tech (real-time controllable video generation) has immediate implications for simulation, training data synthesis, and robotics world models. Watch for the Waypoint-1-Medium release and whether the RTX 5090 requirement comes down to more accessible hardware in future iterations.
