2026-05-18 · HuggingFace

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

research · infrastructure

Source: HuggingFace · Date: 2026-05-18 · URL: https://huggingface.co/blog/PaddlePaddle/paddleocr-transformers

Summary

PaddleOCR 3.5 adds a HuggingFace Transformers inference backend alongside its native Paddle runtime, so teams can run the PP-OCRv5 and PaddleOCR-VL 1.5 document parsing models inside existing PyTorch/Transformers infrastructure without switching frameworks. The backend exposes dtype and attention implementation as configuration options, supports GPU acceleration with bfloat16 and SDPA (scaled dot-product attention), and is aimed at RAG and document AI pipelines. The default Paddle static backend remains the throughput-optimized choice; the Transformers backend trades some throughput for ecosystem compatibility.
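
To make the dtype and attention knobs concrete, here is a minimal sketch of how a team might choose them per machine. It does not use the PaddleOCR 3.5 API, which this summary does not describe; the function name `choose_transformers_options` and the dictionary keys are hypothetical. The values `"bfloat16"` and `"sdpa"` come from the summary, while the float16 and CPU fallbacks are conservative assumptions rather than documented defaults.

```python
def choose_transformers_options(has_gpu: bool, gpu_supports_bf16: bool) -> dict:
    """Pick dtype and attention settings for a Transformers-backed model.

    Hypothetical helper: the keys are illustrative and would need to be mapped
    onto whatever configuration surface the real backend exposes. In plain
    Transformers they would typically become the `torch_dtype` and
    `attn_implementation` arguments of `from_pretrained`.
    """
    if has_gpu and gpu_supports_bf16:
        # The GPU path named in the summary: bfloat16 weights with SDPA attention.
        return {"device": "cuda", "dtype": "bfloat16", "attn_implementation": "sdpa"}
    if has_gpu:
        # Assumption: older GPUs without bfloat16 support fall back to float16.
        return {"device": "cuda", "dtype": "float16", "attn_implementation": "sdpa"}
    # Assumption: on CPU, use full precision and the reference attention path.
    return {"device": "cpu", "dtype": "float32", "attn_implementation": "eager"}


if __name__ == "__main__":
    print(choose_transformers_options(has_gpu=True, gpu_supports_bf16=True))
```

Keeping this choice in one small function makes the throughput-versus-compatibility trade-off explicit and easy to override per deployment.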

Implications

  • Open-weight ecosystem. PaddleOCR is one of the most widely deployed open-source OCR libraries, particularly in Asian markets. With Transformers backend support, the document parsing layer of RAG and document AI stacks can run entirely on HuggingFace tooling, which removes a significant integration friction point for teams building on the Transformers ecosystem.
  • Agent-layer convergence. Document parsing is a critical preprocessing step for document-grounded agents. PaddleOCR 3.5 lowers the cost of plugging structured document extraction into agent pipelines; expect it to appear as a standard component in RAG-over-documents and enterprise document intelligence stacks within a few months of release.
  • Watch. The PaddleOCR-VL 1.5 vision-language model for document parsing is the more significant capability here: multimodal document understanding at inference speeds positioned as competitive with proprietary Document AI offerings (Amazon Textract; Azure Form Recognizer, now Azure AI Document Intelligence). Track independent benchmark comparisons as they emerge.
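
As an illustration of the agent-layer point above, the sketch below shows one way parsed document blocks could be grouped into retrieval chunks for a RAG index. The `ParsedBlock` shape (page, kind, text) is a hypothetical stand-in, not PaddleOCR's actual output schema, and the chunking rule (start a new chunk at each title or when a size budget is exceeded) is just one reasonable choice.

```python
from dataclasses import dataclass


@dataclass
class ParsedBlock:
    """Hypothetical shape of one parsed layout element; real output fields may differ."""
    page: int
    kind: str  # e.g. "title", "text", "table"
    text: str


def _flush(blocks: list) -> dict:
    """Merge a run of blocks into one chunk with its source page numbers."""
    return {
        "text": "\n".join(b.text for b in blocks),
        "pages": sorted({b.page for b in blocks}),
    }


def blocks_to_chunks(blocks: list, max_chars: int = 500) -> list:
    """Group consecutive blocks into chunks, starting a new chunk at each title.

    Chunks aim to stay within max_chars of block text; a single block that is
    larger than the budget is kept whole rather than split.
    """
    chunks, current, size = [], [], 0
    for block in blocks:
        starts_section = block.kind == "title"
        over_budget = size + len(block.text) > max_chars
        if current and (starts_section or over_budget):
            chunks.append(_flush(current))
            current, size = [], 0
        current.append(block)
        size += len(block.text)
    if current:
        chunks.append(_flush(current))
    return chunks


if __name__ == "__main__":
    demo = [
        ParsedBlock(1, "title", "Intro"),
        ParsedBlock(1, "text", "Hello world."),
        ParsedBlock(2, "title", "Methods"),
        ParsedBlock(2, "text", "We did X."),
    ]
    for chunk in blocks_to_chunks(demo):
        print(chunk)
```

Carrying page numbers through to each chunk lets a downstream agent cite where in the document an answer came from.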
