Reachy Mini goes fully local
Tags: models, infrastructure
Source: Hugging Face
Date: 2026-05-27
URL: https://huggingface.co/blog/local-reachy-mini-conversation
Summary
Pollen Robotics’ Reachy Mini conversation app now runs its full voice pipeline locally, with no cloud APIs required: VAD (Silero), STT (Parakeet-TDT 0.6B), LLM (Gemma-4 or Qwen3-4B), and TTS (Qwen3-TTS). The stack is built on Hugging Face’s speech-to-speech library and can serve the LLM via llama.cpp, vLLM, MLX, or transformers, which makes it portable across laptops, consumer GPUs, and Apple Silicon. No audio leaves the local network.
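The four-stage flow described above (VAD, then STT, then LLM, then TTS) can be sketched as a chain of interchangeable callables. This is a minimal illustration, not the speech-to-speech library's API: the `VoicePipeline` class, the `handle_chunk` method, and the stand-in lambda stages are all hypothetical names introduced here, and real stages would wrap the actual models.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    """Hypothetical container chaining the four stages named in the summary."""
    vad: Callable[[bytes], bool]   # True if the audio chunk contains speech
    stt: Callable[[bytes], str]    # audio -> transcript
    llm: Callable[[str], str]      # transcript -> reply text
    tts: Callable[[str], bytes]    # reply text -> synthesized audio

    def handle_chunk(self, audio: bytes) -> Optional[bytes]:
        # Skip the expensive stages when no speech is detected.
        if not self.vad(audio):
            return None
        transcript = self.stt(audio)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stand-in stages (assumptions for illustration only): they treat the
# "audio" bytes as UTF-8 text so the example runs without any models.
pipeline = VoicePipeline(
    vad=lambda audio: len(audio) > 0,
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode("utf-8"),
)

if __name__ == "__main__":
    print(pipeline.handle_chunk(b"hello"))
```

Because each stage is just a callable, swapping the LLM backend (for example llama.cpp for vLLM) would only change the function passed as `llm`, which is the sense in which the stack is portable across hardware.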
Implications
- Open-weight ecosystem: this is a concrete end-to-end embodied AI deployment built entirely on open-weight models. Each component of the stack (Gemma-4, Qwen3, Parakeet) can be swapped for another, so the robotics conversation layer becomes a commodity integration problem rather than a model-sourcing problem.
- Agent-fleet operability: local inference on embedded/consumer hardware removes the latency and uptime dependency on external APIs, which is critical for real-time robotic interaction where cloud round-trips are disqualifying.
- Privacy-sensitive robotics applications (healthcare, home, industrial) now have a credible fully local voice-agent stack. The barrier is no longer “does a capable model exist locally” but “does the hardware fit the form factor.”
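For privacy-sensitive deployments, one way to back the "no audio leaves the local network" property with a check is to verify that every configured inference endpoint points at a loopback or private address before the pipeline starts. The sketch below is an assumption-laden illustration, not part of the Reachy Mini app: `is_local_endpoint` is a hypothetical helper, the URLs and port are made up, and hostnames other than `localhost` are rejected rather than resolved via DNS.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True if the URL's host is localhost, loopback, or a private IP.

    Hypothetical helper: hostnames that are not IP literals (other than
    "localhost") are treated as non-local, since no DNS lookup is done.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False
    return addr.is_loopback or addr.is_private


if __name__ == "__main__":
    # Example endpoints (assumed values, not taken from the source).
    for url in ("http://127.0.0.1:8080/v1", "https://api.example.com/v1"):
        print(url, is_local_endpoint(url))
```

Rejecting unresolved hostnames is a conservative choice: it may refuse a legitimate LAN hostname, but it never silently accepts a public one.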