Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub
Source: HuggingFace Date: 2025-06-27 URL: https://huggingface.co/blog/nvidia/llama-nemotron-nano-vl
Summary
Model release: NVIDIA Llama-3.1-Nemotron-Nano-VL-8B-V1, an 8B VLM that pairs the Llama-3.1-8B-Instruct language model with the C-RADIOv2-VLM-H vision encoder and is optimized for document OCR and intelligent document processing. Outputs in HTML, LaTeX, and Markdown. Supports bounding box grounding (normalized 0-1000 coordinates). According to the announcement, it leads OCRBench v2 among comparable VLMs. Deployable via HF Hub or the NVIDIA NIM API; fine-tunable via NeMo.
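Because grounding coordinates are normalized to 0-1000, they have to be rescaled before a box can be drawn on or cropped from the original page image. The sketch below shows one way to do that. It assumes boxes arrive as (x1, y1, x2, y2) and that the 0-1000 range maps linearly onto image width and height; neither detail is confirmed by the summary above, and denormalize_box is a hypothetical helper, not part of any NVIDIA or Hugging Face API.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def denormalize_box(box: Box, width: int, height: int, scale: int = 1000) -> Tuple[int, int, int, int]:
    """Convert a box in normalized 0..scale coordinates to pixel coordinates.

    Assumption: the box is ordered (x1, y1, x2, y2) and the normalized
    range maps linearly onto the image's width and height.
    """
    if any(v < 0 or v > scale for v in box):
        raise ValueError(f"coordinates must lie in [0, {scale}]: {box}")
    x1, y1, x2, y2 = box
    return (
        round(x1 * width / scale),   # left edge in pixels
        round(y1 * height / scale),  # top edge in pixels
        round(x2 * width / scale),   # right edge in pixels
        round(y2 * height / scale),  # bottom edge in pixels
    )


# Example: a box on a 2000 x 1000 pixel page scan
pixel_box = denormalize_box((100, 250, 500, 750), width=2000, height=1000)
print(pixel_box)
```

The range check is deliberate: a value outside 0-1000 usually means the output was parsed incorrectly, and failing early is easier to debug than a misplaced box.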
Implications
Thread: open-weights ecosystem health / model release cadence. NVIDIA is building a specialized document-intelligence VLM rather than competing on general-purpose VLM benchmarks, a deliberate verticalization of the Nemotron family. OCR, bounding box grounding, and multi-format output together cover most of a document processing stack in one model. For enterprises processing invoices, contracts, or healthcare forms, this is a viable local alternative to cloud OCR APIs (Azure Form Recognizer, AWS Textract). Offering NIM API deployment alongside HF Hub is NVIDIA’s standard dual-track approach: open weights for research, managed inference for enterprise.