Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs
Source: Hugging Face | Date: 2025-04-29 | URL: https://huggingface.co/blog/autoround
Summary
Library update and research summary: Intel’s AutoRound is a post-training quantization (PTQ) tool that uses signed gradient descent to optimize weight rounding and clipping ranges. It achieves 2.1x higher relative accuracy at INT2 than baselines and quantizes a 72B model in 37 minutes on an A100 (light mode), requiring only 200 tuning steps and 128 calibration samples. It supports INT2 through INT8, covers LLMs plus more than 10 VLMs, exports to the AutoRound, GPTQ, AWQ, and GGUF formats, and runs on CPU, Intel GPU, and CUDA.
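To make the core idea concrete, here is a minimal NumPy sketch of signed-gradient rounding tuning for a single linear layer. It is not AutoRound’s implementation: it learns only a per-weight rounding offset `v` in [-0.5, 0.5] (the learnable clipping ranges are omitted), and it assumes asymmetric per-row quantization, a straight-through estimator for `round`, a step size of 1/steps, and an output-MSE loss on calibration inputs. The helper names `quantize` and `tune_rounding` are hypothetical.

```python
import numpy as np


def quantize(w, v, bits):
    """Asymmetric per-row quantize-dequantize of w, adding rounding offset v before rounding."""
    qmax = 2**bits - 1
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = np.maximum(wmax - wmin, 1e-8) / qmax
    zp = np.round(-wmin / scale)
    pre = np.round(w / scale + v) + zp           # integer grid value before clipping
    q = np.clip(pre, 0, qmax)
    return (q - zp) * scale, scale, pre


def tune_rounding(w, x, bits=2, steps=200, lr=None):
    """Hypothetical helper: signed gradient descent on v so x @ wq.T tracks x @ w.T."""
    lr = 1.0 / steps if lr is None else lr       # assumed default step size
    qmax = 2**bits - 1
    ref = x @ w.T                                # full-precision layer output

    def loss_and_grad(v):
        wq, scale, pre = quantize(w, v, bits)
        diff = x @ wq.T - ref
        loss = np.mean(diff**2)
        grad_wq = 2.0 * diff.T @ x / diff.size   # dLoss/dwq
        inside = (pre >= 0) & (pre <= qmax)      # STE: round() as identity, clipped entries get 0
        return loss, grad_wq * scale * inside

    v = np.zeros_like(w)
    loss, grad = loss_and_grad(v)
    rtn_loss = best_loss = loss                  # v = 0 is plain round-to-nearest (RTN)
    best_v = v.copy()
    for _ in range(steps):
        # Signed step: only the gradient's sign is used; offsets stay in [-0.5, 0.5].
        v = np.clip(v - lr * np.sign(grad), -0.5, 0.5)
        loss, grad = loss_and_grad(v)
        if loss < best_loss:
            best_loss, best_v = loss, v.copy()
    return best_v, rtn_loss, best_loss


rng = np.random.default_rng(0)
w = rng.normal(size=(16, 64))                    # toy layer: 16 outputs, 64 inputs
x = rng.normal(size=(128, 64))                   # 128 calibration samples
v, rtn, tuned = tune_rounding(w, x, bits=2, steps=200)
print(f"RTN loss {rtn:.4f} -> tuned loss {tuned:.4f}")
```

Because only the sign of the gradient is used, every offset moves by exactly `lr` per step, so with 200 steps at `lr = 1/200` an offset can travel at most the full width of [-0.5, 0.5]. That bounded, scale-free update is one reason a small, fixed step budget can be enough.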
Implications
Thread: open-weights ecosystem health. AutoRound’s INT2 improvement is the key result: 2.1x higher relative accuracy at 2-bit makes INT2 quantization more viable for edge deployments, where GGUF Q2_K is currently considered too lossy. The 37-minute quantization time for a 72B model on an A100 is practical: it makes per-deployment quantization feasible instead of requiring a centralized quantization service. Multi-format export (AWQ/GPTQ/GGUF) from a single tool removes the format-incompatibility tax that currently complicates quantization workflows. Intel’s continued investment in open quantization tooling strengthens the non-NVIDIA inference ecosystem.
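The edge-deployment argument rests on simple storage arithmetic. The sketch below computes raw weight storage for a 72B-parameter model at several bit widths; it is a rough estimate under stated assumptions, since it ignores per-group scales and zero points, any layers left unquantized, activations, and the KV cache, so real footprints are somewhat larger. The helper name `weight_gib` is hypothetical.

```python
def weight_gib(num_params, bits):
    """Raw weight storage in GiB; ignores scales, zero points and unquantized layers."""
    return num_params * bits / 8 / 2**30


for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit weights for 72B params: {weight_gib(72e9, bits):6.1f} GiB")
```

At 2 bits the raw weights take one eighth of their 16-bit size, which is why accuracy at INT2 determines whether that size reduction is usable on smaller hardware.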