2026-05-27 · HuggingFace

Reachy Mini goes fully local

models · infrastructure

Source: HuggingFace · Date: 2026-05-27 · URL: https://huggingface.co/blog/local-reachy-mini-conversation

Summary

Pollen Robotics’ Reachy Mini conversation app now runs its full voice pipeline locally, with no cloud APIs required: voice activity detection (VAD) with Silero, speech-to-text (STT) with Parakeet-TDT 0.6B, an LLM (Gemma-4 or Qwen3-4B), and text-to-speech (TTS) with Qwen3-TTS. The stack is built on HuggingFace’s speech-to-speech library, and the LLM can be served through llama.cpp, vLLM, MLX, or transformers, which makes it portable across laptops, consumer GPUs, and Apple Silicon. No audio leaves the local network.
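The four-stage flow described above can be sketched as a chain of swappable callables. This is only an illustration of the VAD, STT, LLM, TTS ordering: the `VoicePipeline` class, the stage signatures, and the stand-in lambdas are all hypothetical and are not the API of the speech-to-speech library or of any of the named models.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage signatures; the real libraries expose different APIs.
VadFn = Callable[[bytes], bool]  # True when the chunk contains speech
SttFn = Callable[[bytes], str]   # audio -> transcript
LlmFn = Callable[[str], str]     # transcript -> reply text
TtsFn = Callable[[str], bytes]   # reply text -> audio


@dataclass
class VoicePipeline:
    vad: VadFn
    stt: SttFn
    llm: LlmFn
    tts: TtsFn

    def handle_chunk(self, audio: bytes) -> Optional[bytes]:
        """Run one audio chunk through VAD -> STT -> LLM -> TTS."""
        if not self.vad(audio):
            return None  # no speech detected: skip the expensive stages
        transcript = self.stt(audio)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stand-in stages so the sketch runs without any model installed.
pipeline = VoicePipeline(
    vad=lambda audio: len(audio) > 0,
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode("utf-8"),
)
```

Because each stage is just a callable, replacing one model or serving backend with another would mean swapping a single field, assuming the replacement can be wrapped to match the same signature.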

Implications

Open-weight ecosystem: this is a concrete end-to-end embodied-AI deployment built entirely on open-weight models. Every component of the stack (Gemma-4, Qwen3, Parakeet) can be swapped for an alternative, so the robot conversation layer is now a commodity integration problem rather than a model-sourcing problem.
Agent-fleet operability: local inference on embedded or consumer hardware removes the dependency on external API latency and uptime, which is critical for real-time robotic interaction, where cloud round-trips are disqualifying.
Privacy-sensitive robotics: applications in healthcare, home, and industrial settings now have a credible fully local voice-agent stack. The barrier is no longer “does a capable model exist locally” but “does the hardware fit the form factor.”
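For privacy-sensitive deployments, one way to make the "no audio leaves the local network" property checkable is to validate every configured model endpoint before the app starts. The sketch below is an assumption about how such a guard might look, not something the source describes; the function name `is_local_endpoint` and the example URLs are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True if the URL's host is localhost, loopback, or a private-range IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name other than localhost: not provably local, so reject it.
        return False
    return addr.is_loopback or addr.is_private


# Hypothetical endpoint list, e.g. a locally served LLM and a remote API.
endpoints = ["http://127.0.0.1:8080/v1", "https://api.example.com/v1"]
non_local = [url for url in endpoints if not is_local_endpoint(url)]
```

The check is deliberately conservative: hostnames that would need a DNS lookup are treated as non-local rather than resolved, which trades some convenience for a simpler guarantee.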
