Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models
Source: HuggingFace
Date: 2025-07-04
URL: https://huggingface.co/blog/tiiuae/e2lm-competition
Summary
NeurIPS 2025 E2LM (Early Training Evaluation of Language Models) is a competition organized by TII UAE with Spotify, Caltech, Sorbonne, and Oxford. The goal is to create benchmarks that provide a meaningful evaluation signal during early training (0–200B tokens), when existing benchmarks fail to differentiate models. The competition is hosted on Hugging Face (HF) and uses lm-evaluation-harness, targeting 0.5B, 1B, and 3B model sizes. Submissions are scored on a weighted combination of Signal Quality (50%), Scientific Knowledge Compliance (40%), and Ranking Consistency (10%). Prizes total $12K, plus $4K in student awards. Timeline: July–November 2025.
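To make the weighting concrete, here is a minimal sketch of how the three criteria could combine into one number. Only the weights (50/40/10) come from the announcement as summarized above. The function name `composite_score`, the dictionary keys, the plain weighted sum, and the assumption that each component score is already normalized to [0, 1] are illustrative and not confirmed by the source.

```python
# Weights as stated in the summary. Everything else here is a hypothetical
# illustration, not the organizers' scoring code.
WEIGHTS = {
    "signal_quality": 0.50,
    "scientific_knowledge_compliance": 0.40,
    "ranking_consistency": 0.10,
}


def composite_score(components: dict[str, float]) -> float:
    """Return the weighted sum of the three component scores.

    Assumption: each component score is already normalized to [0, 1].
    """
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {components[name]}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Made-up component scores for a hypothetical submission.
    example = {
        "signal_quality": 0.8,
        "scientific_knowledge_compliance": 0.6,
        "ranking_consistency": 0.9,
    }
    print(f"{composite_score(example):.2f}")
```

Under these assumptions, Ranking Consistency moves the total far less than the other two criteria: a full-point change in it shifts the composite by only 0.10.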
Implications
Open-weights ecosystem health. Early training evaluation is a genuine gap: teams currently train models for hundreds of billions of tokens before benchmarks give an informative signal, which is expensive and wasteful. If E2LM produces viable early-stage benchmarks, it could reduce the compute cost of model development by letting teams stop poor training runs earlier.
HF as open-source ML hub. Hosting a NeurIPS competition with lm-evaluation-harness as the evaluation infrastructure reinforces HF's position as the venue for community-driven ML research challenges, not just model deployment. TII UAE sponsoring this alongside its model releases illustrates a pattern among Gulf state AI labs: funding evaluation infrastructure alongside the models.