Bringing Robotics AI to Embedded Platforms: Dataset Recording, VLA Fine‑Tuning, and On‑Device Optimizations
Source: HuggingFace Date: 2026-03-05 URL: https://huggingface.co/blog/nxp/bringing-robotics-ai-to-embedded-platforms
Summary
Integration tutorial from NXP on deploying VLA (Vision-Language-Action) models on the i.MX 95 embedded SoC. Covers dataset recording best practices (20% recovery episodes, gripper-mounted cameras, workspace partitioning), ACT/SmolVLA fine-tuning, and embedded deployment. Key benchmarks: ACT at ONNX FP32 achieves 96% global accuracy at 2.86s inference latency; optimized ACT cuts latency to 0.32s but drops to 89% accuracy. SmolVLA achieves only 47% accuracy on-device, at 29.1s latency.
Implications
Thread: open-weights ecosystem health / robotics ecosystem. This is real embedded robotics deployment, not a lab demo: the NXP i.MX 95 is a shipping SoC for industrial/consumer robots. The 0.32s vs 2.86s tradeoff (accepting a 7-percentage-point accuracy drop, from 96% to 89%, for a roughly 9x speedup) is the kind of engineering decision that determines whether on-device VLA is practical for real manipulation tasks. ACT massively outperforms SmolVLA on-device (89% optimized and 96% at FP32, vs 47%), and SmolVLA is also about 10x slower than even FP32 ACT (29.1s vs 2.86s), which suggests SmolVLA is not yet ready for embedded deployment. Watch for NXP toolchain support for more VLA architectures as this space matures.
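The tradeoff arithmetic can be checked with a few lines of Python. This is a minimal sketch using only the latency and accuracy figures quoted in the summary; the `RESULTS` table and the `compare` helper are hypothetical illustrations, not part of any NXP or Hugging Face tooling.

```python
# On-device figures as quoted in the summary of the NXP post (i.MX 95).
RESULTS = {
    "ACT (ONNX FP32)": {"latency_s": 2.86, "accuracy": 0.96},
    "ACT (optimized)": {"latency_s": 0.32, "accuracy": 0.89},
    "SmolVLA": {"latency_s": 29.1, "accuracy": 0.47},
}


def compare(baseline: str, candidate: str) -> tuple:
    """Return (speedup, accuracy change in percentage points) of candidate vs baseline.

    Hypothetical helper: speedup > 1 means the candidate is faster;
    a negative accuracy change means the candidate is less accurate.
    """
    b, c = RESULTS[baseline], RESULTS[candidate]
    speedup = b["latency_s"] / c["latency_s"]
    delta_pp = (c["accuracy"] - b["accuracy"]) * 100
    return speedup, delta_pp


speedup, delta_pp = compare("ACT (ONNX FP32)", "ACT (optimized)")
print(f"speedup: {speedup:.1f}x, accuracy change: {delta_pp:+.0f} pp")
# prints "speedup: 8.9x, accuracy change: -7 pp"
```

Reporting the accuracy change in percentage points keeps it unambiguous: the drop from 96% to 89% is 7 points, which is about a 7.3% relative decrease.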