2026-04-01 · HuggingFace

Falcon Perception

Source: HuggingFace
Date: 2026-04-01
URL: https://huggingface.co/blog/tiiuae/falcon-perception

Summary

Model release from TII: Falcon Perception (0.6B) and Falcon OCR (0.3B), both open-vocabulary vision models using a single early-fusion Transformer backbone rather than the standard vision encoder + text decoder pipeline. On the SA-Co benchmark, Falcon Perception beats SAM 3 on overall Macro-F1 (68.0 vs 62.3) with particularly strong gains on spatial, relational, and dense-scene tasks. The OCR variant hits 88.64% on OmniDocBench at 0.3B parameters.
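To make the headline metric concrete, here is a minimal sketch of how a macro-averaged F1 differs from a pooled (micro) F1. It is illustrative only: the category names and counts are invented, and SA-Co's exact matching and scoring protocol is not described in this note, so treat this as the textbook definition rather than the benchmark's implementation.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) per category.
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when there are no counts at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Counts]) -> float:
    """Unweighted mean of per-category F1: every category counts equally."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(counts: Dict[str, Counts]) -> float:
    """F1 over pooled counts: large categories dominate the result."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical counts, NOT taken from SA-Co or the Falcon Perception release.
example_counts = {
    "attribute": (90, 5, 5),  # large, easy category
    "spatial": (6, 2, 6),     # small, hard category
}

print(f"macro-F1: {macro_f1(example_counts):.3f}")
print(f"micro-F1: {micro_f1(example_counts):.3f}")
```

Because the macro average weights each category equally, a model that improves on small, hard categories (such as spatial or relational prompts) moves the overall number more than it would under pooled scoring.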

Implications

Thread: open-weights ecosystem health / model release cadence. TII continues to push Falcon as a multi-capability family, now entering perception and grounding territory. The architectural choice of early fusion over a separate encoder and decoder is notable: it is the same direction MLLM research is trending, but at 0.6B parameters, which is unusually small for grounding tasks. If the PBench benchmark gains traction as a diagnostic alternative to opaque aggregate scores, watch whether other labs adopt it. The 54M-image training scale and the hard-negative curriculum are also worth tracking as indicators of dataset practice.