2025-10-02 · HuggingFace

SOTA OCR with Core ML and dots.ocr

models · infrastructure

Source: HuggingFace · Date: 2025-10-02 · URL: https://huggingface.co/blog/dots-ocr-ne

Summary

Integration tutorial (part 1 of 3) walking through the Core ML conversion of dots.ocr, a 3B OCR model from RedNote that surpasses Gemini 2.5 Pro on OmniDocBench. The vision encoder (1.2B, NaViT architecture) was converted successfully; the LM backbone (Qwen2.5-1.5B) is handled by MLX. Current on-device status: the converted model is over 5 GB and takes more than 1 s per forward pass on GPU. Parts 2-3 will address quantization and Neural Engine optimization. The post cites the Apple Neural Engine as 12x more power-efficient than the CPU for inference.
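To put the size figure in context, here is a back-of-envelope sketch of weight storage at different precisions. The parameter counts come from the summary above; the function name, the dtype table, and the choice of precisions are illustrative assumptions, and the estimate covers weights only (no activations, no runtime overhead). It does not claim which precision the reported >5 GB figure corresponds to.

```python
# Rough weight-storage estimate: params * bits / 8 bytes, reported in GB (1e9 bytes).
# Precisions listed here are illustrative; the source does not state which one it measured.
BITS_PER_WEIGHT = {"float32": 32, "float16": 16, "int8": 8, "int4": 4}

# Parameter counts as given in the summary.
COMPONENTS = {"vision encoder": 1.2e9, "LM backbone": 1.5e9}


def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Return the storage needed for the weights alone, in GB."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, params in COMPONENTS.items():
        for dtype, bits in BITS_PER_WEIGHT.items():
            size = weight_footprint_gb(params, bits)
            print(f"{name:15s} {dtype:8s} {size:5.2f} GB")
```

Halving the bits per weight halves the footprint, which is why quantization is the natural next step for getting a model of this size onto a phone or laptop.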

Implications

Thread: open-weights ecosystem health. This is part of a growing wave of "SOTA model → on-device Apple" conversion work. The dots.ocr result (beating Gemini 2.5 Pro on OmniDocBench) is striking: it suggests open-weight specialized models can now exceed frontier VLMs on specific tasks. The conversion work itself (handling dynamic shapes, converting boolean mask dtypes for Neural Engine compatibility) is well-documented debugging that should help others attempting similar conversions. Watch for parts 2-3, which will determine whether this is practically deployable or remains a research exercise.
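The note above does not show the mask code from the post. As a general illustration only, one common way to keep boolean tensors out of an attention path is to turn the boolean "may attend" mask into an additive float mask before the softmax. The sketch below uses plain Python lists; the helper names and the -1e4 fill value (chosen because it fits in float16) are assumptions, not the post's implementation.

```python
import math


def bool_to_additive_mask(allowed, masked_value=-1e4):
    """Map True -> 0.0 (keep) and False -> a large negative float (suppress)."""
    return [[0.0 if keep else masked_value for keep in row] for row in allowed]


def softmax(scores):
    """Numerically stable softmax over one row of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention_weights(scores, additive_mask):
    """Add the float mask to the raw scores, then normalize each row."""
    return [
        softmax([s + m for s, m in zip(score_row, mask_row)])
        for score_row, mask_row in zip(scores, additive_mask)
    ]


# Toy example: tokens 0-1 form one block, token 2 forms another.
allowed = [
    [True, True, False],
    [True, True, False],
    [False, False, True],
]
scores = [
    [0.5, 1.5, 2.0],
    [1.0, 1.0, 3.0],
    [0.2, 0.1, 0.0],
]
weights = masked_attention_weights(scores, bool_to_additive_mask(allowed))
```

Masked positions receive zero weight after the softmax, so the result matches boolean masking while every tensor in the path stays a float.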