2025-07-04 · HuggingFace

Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models

enterprise

Source: HuggingFace Date: 2025-07-04 URL: https://huggingface.co/blog/tiiuae/e2lm-competition

Summary

Competition announcement: NeurIPS 2025 E2LM (Early Training Evaluation of Language Models) competition, organized by TII UAE with Spotify, Caltech, Sorbonne, and Oxford. Goal: create benchmarks that provide meaningful evaluation signal during early training (0–200B tokens), when existing benchmarks fail to differentiate models. Hosted on HF with lm-evaluation-harness; targets 0.5B, 1B, 3B model sizes. Weighted scoring: Signal Quality (50%), Scientific Knowledge Compliance (40%), Ranking Consistency (10%). $12K in prizes plus $4K student awards. Timeline: July–November 2025.
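The weighted scoring reads as a linear combination of three component scores. Below is a minimal sketch of that arithmetic. It assumes each component has already been normalized to [0, 1]; the announcement, as summarized here, gives only the weights and not how each component is computed. The dictionary keys and the `composite_score` helper are hypothetical names chosen for illustration, not part of the competition's tooling.

```python
# Hypothetical sketch: combine E2LM component scores using the stated weights.
# Assumption: each component score is pre-normalized to the range [0, 1].
WEIGHTS = {
    "signal_quality": 0.50,
    "scientific_knowledge_compliance": 0.40,
    "ranking_consistency": 0.10,
}


def composite_score(components: dict[str, float]) -> float:
    """Return the weighted sum of the three (assumed normalized) component scores."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {components[name]}")
    # Weights sum to 1.0, so the result also stays in [0, 1].
    return sum(weight * components[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    example = {
        "signal_quality": 0.8,
        "scientific_knowledge_compliance": 0.6,
        "ranking_consistency": 0.9,
    }
    print(composite_score(example))  # ≈ 0.73 (0.40 + 0.24 + 0.09)
```

Because Signal Quality and Scientific Knowledge Compliance together carry 90% of the weight, a submission with weak ranking consistency can still score well overall. Ranking Consistency acts more as a tiebreaker than a gate.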

Implications

Open-weights ecosystem health. Early training evaluation is a genuine gap. Teams currently train models for hundreds of billions of tokens before standard benchmarks give an informative signal, which is expensive and wasteful. If E2LM produces viable early-stage benchmarks, it could lower the compute cost of model development by letting teams prune poor training runs earlier.

HF as open-source ML hub. HF hosting a NeurIPS competition, with lm-evaluation-harness as the evaluation infrastructure, reinforces HF’s position as a venue for community-driven ML research challenges, not just model deployment. TII UAE sponsoring this alongside its own model releases fits a pattern seen among Gulf state AI labs: funding evaluation infrastructure as well as the models themselves.
