2025-06-03 · HuggingFace

Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

agents

Source: HuggingFace · Date: 2025-06-03 · URL: https://huggingface.co/blog/Hcompany/holo1

Summary

H Company released Holo1, a family of open-source action VLMs (3B and 7B) purpose-built for GUI automation and UI element localisation. Rather than generating text descriptions, the models identify clickable elements and return their pixel coordinates; Holo1-7B achieves 76.2% average accuracy on UI localisation benchmarks, the highest for its size class. Surfer-H, H Company’s web agent, uses Holo1 as the localiser in a three-model architecture (planner, localiser, validator) and reaches 92.2% accuracy on real-world web tasks at $0.13 per task, with no custom APIs or fragile DOM wrappers required.
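
The planner/localiser/validator split described above can be sketched as a simple control loop. This is a minimal illustration, not Surfer-H's actual implementation: the function names, the `Step` record, the argument shapes, and the stub components are all hypothetical, and no real model or browser is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates on the screenshot


@dataclass
class Step:
    instruction: str  # what the planner asked for, in words
    target: Point     # where the localiser says to click


# Hypothetical component signatures; the source does not specify them.
Planner = Callable[[str, List[Step]], str]
Localiser = Callable[[bytes, str], Point]
Validator = Callable[[str, List[Step]], bool]


def run_task(
    task: str,
    planner: Planner,
    localiser: Localiser,
    validator: Validator,
    screenshot: bytes,
    max_steps: int = 10,
) -> Tuple[bool, List[Step]]:
    """Loop plan -> localise -> validate until the validator accepts or steps run out."""
    history: List[Step] = []
    for _ in range(max_steps):
        instruction = planner(task, history)          # decide the next action
        target = localiser(screenshot, instruction)   # map it to pixel coordinates
        history.append(Step(instruction, target))
        if validator(task, history):                  # check whether the task is done
            return True, history
    return False, history


# Stub components so the loop runs without any model (assumptions for illustration).
def stub_planner(task: str, history: List[Step]) -> str:
    return f"click step {len(history) + 1}"


def stub_localiser(screenshot: bytes, instruction: str) -> Point:
    return (100, 200)


def stub_validator(task: str, history: List[Step]) -> bool:
    return len(history) >= 3


done, steps = run_task(
    "book a table", stub_planner, stub_localiser, stub_validator, screenshot=b""
)
print(done, [s.instruction for s in steps])
```

In this sketch only the localiser touches pixels; the planner and validator work on text and history, which is what would let a small model such as Holo1 be swapped in for that one role.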

Implications

  • Feeds the GUI agent infrastructure thread: a small, open, high-accuracy localiser that plugs into a modular agent architecture is the kind of primitive that could make browser automation reliable enough to deploy. The planner/localiser/validator split is worth watching as an architectural pattern.
  • The open release (compatible with Hugging Face Transformers) and the accompanying WebClick benchmark lower the barrier to building comparable systems without depending on proprietary vision APIs.
  • $0.13 per task on real-world web automation is a pricing signal worth tracking: if that holds at scale, it undercuts the cost argument for human-in-the-loop browser task execution.
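
To put the pricing signal in context, the two reported figures can be combined into a rough cost per successful task. This is a back-of-the-envelope sketch under assumptions the source does not state: that a failed attempt still costs the full $0.13, that the 92.2% rate applies uniformly across tasks, and that the batch size of 10,000 is purely illustrative.

```python
def cost_per_success(cost_per_task: float, success_rate: float) -> float:
    """Expected spend per successfully completed task, if failures are still billed."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_task / success_rate


COST_PER_TASK = 0.13   # USD, as reported for Surfer-H
SUCCESS_RATE = 0.922   # 92.2% accuracy, as reported
BATCH_SIZE = 10_000    # hypothetical workload size

per_success = cost_per_success(COST_PER_TASK, SUCCESS_RATE)
batch_cost = BATCH_SIZE * COST_PER_TASK

print(f"cost per successful task: ${per_success:.3f}")
print(f"cost for {BATCH_SIZE} attempts: ${batch_cost:,.2f}")
```

Under these assumptions the effective cost rises only slightly above the headline figure, so the comparison with human-in-the-loop execution depends mainly on whether the per-task price holds at scale.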
