Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents
Source: HuggingFace
Date: 2026-03-31
URL: https://huggingface.co/blog/ibm-granite/granite-4-vision
Summary
Model release: IBM Granite 4.0 3B Vision, a LoRA adapter on Granite 4.0 Micro that targets enterprise document understanding: table extraction, chart-to-structured-format conversion, and key-value pair (KVP) extraction from forms. Architecture innovation: DeepStack Injection, which routes abstract features to early layers and high-resolution spatial features to later layers. Training data: ChartNet, 1.7M synthetic chart samples across 24 chart types. Benchmarks: 79.3 TEDS on PubTablesV2 full-page (best), 86.4% on Chart2Summary (best), and 85.5% exact match (EM) on VAREX KVP, zero-shot. License: Apache 2.0. Integrates with Docling.
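To make the DeepStack Injection idea concrete, here is a minimal toy sketch of the routing described above: abstract feature groups are assigned to the first layers and high-resolution spatial feature groups to the last layers. Everything in it is an assumption for illustration: the function names, the additive (residual) injection, and the identity "layers" are hypothetical and are not taken from the Granite implementation.

```python
from typing import Dict, List

Vector = List[float]


def build_injection_plan(num_layers: int, abstract_levels: int,
                         spatial_levels: int) -> Dict[int, str]:
    """Map layer index -> name of the visual feature group injected there.

    Hypothetical schedule: abstract features go to the earliest layers,
    high-resolution spatial features go to the latest layers.
    """
    if abstract_levels + spatial_levels > num_layers:
        raise ValueError("more feature groups than layers")
    plan: Dict[int, str] = {}
    for i in range(abstract_levels):
        plan[i] = f"abstract_{i}"
    for j in range(spatial_levels):
        plan[num_layers - spatial_levels + j] = f"spatial_{j}"
    return plan


def run_layers(hidden: Vector, features: Dict[str, Vector],
               plan: Dict[int, str], num_layers: int) -> Vector:
    """Toy forward pass: each layer is the identity, injection is a residual add."""
    for layer in range(num_layers):
        name = plan.get(layer)
        if name is not None:
            # Assumed injection rule: add the visual features to the hidden state.
            hidden = [h + f for h, f in zip(hidden, features[name])]
        # A real decoder layer would transform `hidden` here.
    return hidden


plan = build_injection_plan(num_layers=8, abstract_levels=2, spatial_levels=2)
features = {
    "abstract_0": [1.0, 0.0],
    "abstract_1": [1.0, 0.0],
    "spatial_0": [0.0, 1.0],
    "spatial_1": [0.0, 1.0],
}
out = run_layers([0.0, 0.0], features, plan, num_layers=8)
```

With 8 layers and two groups of each kind, the plan injects abstract features at layers 0 and 1 and spatial features at layers 6 and 7; the middle layers receive no visual injection in this sketch.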
Implications
Open-weights ecosystem health. Reaching best-in-class table and chart extraction with a LoRA adapter on a 3B base is a deployment-architecture win: teams can run Granite 4.0 Micro for text workloads and enable vision through the adapter, without serving a separate model. For enterprise document processing pipelines, this modular pattern is more practical than running a separate 7B+ VLM for each task.
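The modular pattern rests on how a LoRA adapter works in general: the base weight W stays frozen, and the adapter adds a low-rank update scaled by alpha / r, so the same layer can serve with or without it. The sketch below illustrates that mechanism with a toy linear layer in plain Python. The class name `LoRALinear` and the `adapter_enabled` flag are hypothetical, and this is not the Granite, Docling, or any serving library's API.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


class LoRALinear:
    """Frozen base weight W plus an optional low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: Matrix, lora_a: Matrix, lora_b: Matrix,
                 alpha: float) -> None:
        self.weight = weight            # shape (out, in), never modified
        self.lora_a = lora_a            # shape (r, in)
        self.lora_b = lora_b            # shape (out, r)
        self.scale = alpha / len(lora_a)
        self.adapter_enabled = False    # text-only behaviour by default

    def forward(self, x: List[float]) -> List[float]:
        col = [[v] for v in x]                          # (in, 1)
        out = [row[0] for row in matmul(self.weight, col)]
        if self.adapter_enabled:
            low = matmul(self.lora_a, col)              # (r, 1)
            delta = matmul(self.lora_b, low)            # (out, 1)
            out = [o + self.scale * d[0] for o, d in zip(out, delta)]
        return out


layer = LoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 1.0]],
    lora_b=[[1.0], [0.0]],
    alpha=2.0,
)
x = [1.0, 2.0]
base_out = layer.forward(x)        # adapter off: base model only
layer.adapter_enabled = True
adapted_out = layer.forward(x)     # adapter on: base plus low-rank update
```

Because the base weights are untouched, switching the flag off restores the base model's output exactly, which is what lets a single deployment cover both workloads.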
Model release cadence. Chart2CSV at 62.1% places the model second only to Qwen3.5-9B at 63.4%, a model with roughly three times the nominal parameter count (9B versus 3B). At 3B scale, this suggests that domain-specific training data (1.7M ChartNet samples) can compensate for a scale disadvantage in structured-output extraction tasks. IBM's enterprise document focus clearly differentiates Granite from general-purpose VLMs, and the enterprise document benchmark suite is itself a contribution to the field.
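For a sense of how small the Chart2CSV gap is, the arithmetic can be worked through directly. The two scores are the ones quoted above; the variable names are illustrative only.

```python
# Chart2CSV scores in percent, as quoted in the text above.
granite_chart2csv = 62.1
qwen_chart2csv = 63.4

# Absolute gap in percentage points.
gap_points = round(qwen_chart2csv - granite_chart2csv, 1)

# Gap relative to the leading score, in percent.
relative_gap_pct = round(100 * gap_points / qwen_chart2csv, 2)

print(f"gap: {gap_points} points ({relative_gap_pct}% of the leading score)")
```

The gap works out to 1.3 percentage points, or about 2% of the leading score.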