2025-07-16 · HuggingFace

Ettin Suite: SoTA Paired Encoders and Decoders

models · research


Source: HuggingFace · Date: 2025-07-16 · URL: https://huggingface.co/blog/ettin

Summary

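Research and model release: the Ettin Suite, 12 models in 6 sizes from 17M to 1B parameters, with each size released as an encoder and a decoder. All models are trained with an identical recipe on 2T tokens (fully open), and the two members of each pair differ only in attention pattern (bidirectional vs causal) and objective (masked language modeling, MLM, vs causal language modeling, CLM). On MNLI classification, the 150M encoder (89.2%) outperforms the 400M decoder (88.2%). The encoders beat ModernBERT across all tasks; the decoders beat Llama 3.2 1B and SmolLM2 on generative tasks, with the gap widening at larger sizes. Cross-objective conversion (continuing to train one model on the other's objective, MLM↔CLM) consistently underperforms the model trained natively on that objective. More than 250 training checkpoints are released.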
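Since the two members of each pair differ only in attention pattern and objective, it helps to see both side by side. The following is a minimal NumPy sketch of those two differences, not Ettin's training code. `MASK_ID`, the `-100` ignore label, and the mask rate are illustrative placeholders. The MLM variant is also simplified: it replaces every selected token with the mask token rather than using BERT's 80/10/10 replacement scheme.

```python
import numpy as np

MASK_ID = 0    # hypothetical [MASK] token id, for illustration only
IGNORE = -100  # label value a loss would skip (PyTorch cross-entropy convention)


def attention_mask(seq_len: int, causal: bool) -> np.ndarray:
    """Boolean (seq_len, seq_len) matrix: [i, j] is True if position i may attend to j."""
    full = np.ones((seq_len, seq_len), dtype=bool)
    # Encoder (bidirectional): every token sees every token.
    # Decoder (causal): each token sees only itself and earlier tokens.
    return np.tril(full) if causal else full


def clm_example(tokens: np.ndarray):
    """Causal LM: predict token t+1 from tokens 0..t, so labels are inputs shifted by one."""
    return tokens[:-1], tokens[1:]


def mlm_example(tokens: np.ndarray, mask_rate: float, rng: np.random.Generator):
    """Masked LM (simplified): hide a random subset of tokens and predict only those."""
    chosen = rng.random(tokens.shape) < mask_rate
    inputs = np.where(chosen, MASK_ID, tokens)   # selected positions become [MASK]
    labels = np.where(chosen, tokens, IGNORE)    # loss is computed only where masked
    return inputs, labels
```

Calling `attention_mask(4, causal=True)` yields a lower-triangular matrix, so position 0 sees only itself while position 3 sees all four positions; with `causal=False` the matrix is all `True`. In this framing, turning a decoder into an encoder means swapping both the mask and the label construction while starting from weights that were shaped by the other setup, which is the cross-objective case the post reports as underperforming.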

Implications

Open-weights ecosystem health. Ettin's controlled comparison confirms that the architectural advantages of encoders and decoders are real and persistent, not artifacts of differing training data or recipes. The finding that cross-objective training (for example, turning a decoder into an encoder via MLM) underperforms is practically important: teams experimenting with adapter-based conversion should expect limits inherited from the original architecture choice.

Transformers library trajectory. Publicly releasing more than 250 training checkpoints from the 2T-token run makes for an unusually complete research artifact: it enables studies of training dynamics that require seeing the model at every stage of learning, not just the final weights. The identical-recipe design makes Ettin the reference implementation for controlled architecture comparisons in the 17M-1B parameter range.
