2026-02-09 · HuggingFace

Transformers.js v4: Now Available on NPM!

models · tooling

Source: HuggingFace · Date: 2026-02-09 · URL: https://huggingface.co/blog/transformersjs-v4

Summary

Library update: Transformers.js v4 ships a WebGPU runtime rewritten in C++ and co-developed with the ONNX Runtime team. Reported gains: roughly 4x faster BERT embeddings, GPT-OSS 20B at about 60 tokens/sec on an M4 Pro Max, 10x faster build times, and a 53% smaller web bundle. New in this release: a ModelRegistry API, a standalone Tokenizers.js (8.8kB gzipped), and support for 200+ model architectures, including Mamba and MoE. It runs across browsers, Node.js, Bun, and Deno.
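
Because the release targets both WebGPU-capable browsers and server runtimes, an application typically has to decide which backend to request. A minimal sketch of that decision, assuming nothing beyond the standard `navigator.gpu` property that WebGPU-enabled browsers expose. The `pickDevice` helper and the `NavigatorLike` type are hypothetical names introduced for illustration and are not part of Transformers.js.

```typescript
// Hypothetical helper (not part of Transformers.js): choose an execution
// backend from whether the environment exposes the WebGPU entry point.
type Device = "webgpu" | "wasm";

// Minimal shape of the object we inspect; in a browser this is `navigator`.
interface NavigatorLike {
  gpu?: unknown; // defined only where the WebGPU API is available
}

function pickDevice(nav: NavigatorLike | undefined): Device {
  // Fall back to WASM when there is no navigator or no `gpu` property.
  if (nav === undefined || nav.gpu === undefined || nav.gpu === null) {
    return "wasm";
  }
  return "webgpu";
}
```

The result could then be passed as the `device` option when creating a pipeline; that option name follows the v3 API and is assumed here to carry over to v4. Note that the presence of `navigator.gpu` does not guarantee a usable adapter, so a production check would likely also await `navigator.gpu.requestAdapter()` and fall back if it resolves to `null`.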

Implications

Thread: transformers library trajectory / open-weights ecosystem health. Transformers.js v4’s WebGPU runtime is a step-change: roughly 60 tok/sec on a consumer Mac for a 20B model, driven from JavaScript, is competitive with many server-side deployments from a year ago. The standalone Tokenizers.js (8.8kB gzipped) is useful on its own: any browser application that needs fast tokenization without a full ML runtime can use it. Support for 200+ architectures, including Mamba and MoE patterns, means the JS ecosystem now tracks the Python library more closely. Watch whether in-browser WebGPU inference becomes the default for privacy-sensitive applications that can’t send data to a server.
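
To put the throughput figure in user-facing terms, the arithmetic can be sketched as below. Only the 60 tokens/sec figure comes from the source; the 300-token reply length is an assumed value chosen for illustration, and the estimate ignores model download, load time, and prompt prefill.

```typescript
// Back-of-envelope latency from a steady-state decode throughput figure.
function msPerToken(tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return 1000 / tokensPerSecond;
}

// Seconds needed to stream `tokenCount` tokens at a constant rate.
function secondsToGenerate(tokenCount: number, tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return tokenCount / tokensPerSecond;
}

const perTokenMs = msPerToken(60);               // roughly 16.7 ms per token
const replySeconds = secondsToGenerate(300, 60); // 5 s for an assumed 300-token reply
```

At that rate, tokens arrive faster than most people read, which is part of why the figure matters for interactive in-browser use.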
