Holotron-12B - High Throughput Computer Use Agent
Source: HuggingFace
Date: 2026-03-17
URL: https://huggingface.co/blog/Hcompany/holotron-12b
Summary
Model release: H Company’s Holotron-12B, a computer-use agent model post-trained from Nemotron-Nano-2 VL on ~14B tokens of proprietary screen-grounding and navigation data. Its hybrid SSM + attention architecture reaches 8.9k tokens/sec on a single H100 at 100 concurrent requests, reported as 2x higher throughput than Holo2-8B, which plateaus at 5.1k tokens/sec. WebVoyager: 80.5% (up from 35.1% for the baseline).
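As a quick sanity check on the throughput figures quoted in the summary, the arithmetic can be sketched as below. Only the three constants come from the summary; the function names are illustrative, and the per-request figure assumes aggregate throughput is shared evenly across requests, which real serving stacks do not guarantee. Note that the two quoted endpoint numbers give a ratio somewhat below 2x, so the 2x headline may refer to an operating point not quoted here.

```python
# Figures quoted in the summary (tokens/sec on a single H100).
HOLOTRON_TPS = 8_900   # Holotron-12B at 100 concurrent requests
HOLO2_TPS = 5_100      # Holo2-8B plateau
CONCURRENCY = 100      # concurrent requests for the Holotron-12B figure


def throughput_ratio(new_tps: float, old_tps: float) -> float:
    """Aggregate throughput of the new model relative to the old one."""
    return new_tps / old_tps


def per_request_tps(aggregate_tps: float, concurrency: int) -> float:
    """Average tokens/sec seen by one request, ASSUMING an even split."""
    return aggregate_tps / concurrency


ratio = throughput_ratio(HOLOTRON_TPS, HOLO2_TPS)
print(f"throughput ratio from quoted figures: {ratio:.2f}x")
print(f"avg per-request rate: {per_request_tps(HOLOTRON_TPS, CONCURRENCY):.0f} tokens/sec")
```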
Implications
Thread: open-weights ecosystem health / agentic patterns. The SSM-hybrid architecture choice for a computer-use agent is strategic: KV cache growth at long contexts (multi-screenshot sequences) is the primary inference bottleneck for GUI agents, and SSM layers keep a fixed-size state instead of a cache that grows with context length. The reported 2x throughput advantage over a pure-transformer model at high concurrency (100 concurrent requests) is the production-relevant metric: at that load, each GPU behind a computer-use API endpoint generates roughly twice as many tokens per second. The WebVoyager 80.5% result makes Holotron-12B the strongest open-weight computer-use model on that benchmark. Watch whether SSM-hybrids become the standard architecture for GUI agents specifically.
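To make the KV cache argument concrete, here is a minimal sketch of how cache memory scales with the number of screenshots in context. Every configuration value below (layer counts, KV heads, head dimension, tokens per screenshot) is a hypothetical placeholder, not the actual Holotron-12B or Holo2-8B configuration. The sketch also ignores the SSM layers' own state, which is fixed-size and does not grow with sequence length.

```python
def kv_cache_bytes(seq_len: int, n_attn_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache size for ONE sequence.

    The leading 2 accounts for one key and one value tensor per attention
    layer; bytes_per_elem=2 corresponds to fp16/bf16 storage.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# HYPOTHETICAL configurations, chosen only for illustration.
PURE_ATTN_LAYERS = 36          # every layer is attention
HYBRID_ATTN_LAYERS = 6         # most layers replaced by SSM blocks
N_KV_HEADS = 8
HEAD_DIM = 128
TOKENS_PER_SCREENSHOT = 1_000  # assumed vision-token cost per screenshot
CONCURRENCY = 100              # concurrent requests, as in the summary

GIB = 1024 ** 3
for n_screens in (1, 8, 32):
    seq_len = n_screens * TOKENS_PER_SCREENSHOT
    pure = CONCURRENCY * kv_cache_bytes(seq_len, PURE_ATTN_LAYERS, N_KV_HEADS, HEAD_DIM)
    hybrid = CONCURRENCY * kv_cache_bytes(seq_len, HYBRID_ATTN_LAYERS, N_KV_HEADS, HEAD_DIM)
    print(f"{n_screens:>2} screenshots: pure {pure / GIB:6.1f} GiB, "
          f"hybrid {hybrid / GIB:6.1f} GiB")
```

Under these assumed numbers the cache grows linearly with screenshots in both cases, but the hybrid's footprint is smaller by the ratio of attention layers, which leaves more GPU memory for batching additional concurrent requests.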