Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains
agents, models, enterprise, infrastructure
Source: Hugging Face
Date: 2026-06-01
URL: https://huggingface.co/blog/JetBrains/mellum2-launch
Summary
JetBrains released Mellum2, a 12B-parameter Mixture-of-Experts (MoE) model with 2.5B active parameters per token, purpose-built for code and text in production systems and licensed under Apache 2.0. It targets routing and orchestration in multi-model pipelines, latency-sensitive RAG, and agent subtasks such as planning and validation. JetBrains claims inference is 2x faster than with comparably sized dense models. The model is released on Hugging Face for self-hosted deployment.
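The two parameter counts in the summary can be turned into rough sizing numbers. The sketch below is back-of-the-envelope arithmetic, not a figure from the announcement: it assumes all 12B weights stay resident in memory (typical for MoE serving, even though only 2.5B are used per token), and the precisions listed are illustrative assumptions, since the summary does not say which formats are published. KV cache, activations, and runtime overhead are ignored.

```python
# Back-of-the-envelope figures derived from the two numbers in the summary.
TOTAL_PARAMS = 12e9    # total parameters (from the summary)
ACTIVE_PARAMS = 2.5e9  # parameters used per token (from the summary)

# Assumed storage widths in bytes per parameter. These are illustrative;
# the summary does not state which precisions are actually released.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in one token's forward pass."""
    return active / total


def weight_memory_gb(total: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (10^9 bytes).

    Excludes KV cache, activations, and runtime overhead.
    """
    return total * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction per token: {frac:.1%}")
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_memory_gb(TOTAL_PARAMS, width):.0f} GB of weights")
```

Under these assumptions, roughly a fifth of the weights are active per token, while the memory needed to hold the model is set by the full 12B. Per-token compute therefore scales with the active count, but hardware sizing for self-hosting does not.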
Implications
- Local models / coding agents. A 2.5B-active-parameter MoE from a major IDE vendor targets the role that Qwen2.5-Coder and DeepSeek-Coder-V2-Lite already occupy (fast, self-hostable, code-first), but with JetBrains’ distribution advantage: the IntelliJ ecosystem and 12M+ developers.
- Agentic infrastructure. Explicitly targeting orchestration and subtask-validation roles in multi-model pipelines is a positioning move: Mellum2 as the cheap, fast router that handles simple requests itself and delegates the rest to heavier models. Keeping most traffic on a small self-hosted model and paying for a large model only when needed is what makes hybrid local+cloud agent stacks cost-effective.
- IDE vendor model strategy. JetBrains shipping its own open weights signals that IDE vendors aren’t waiting on OpenAI or Anthropic licensing. They want model independence so they can tune on their own proprietary code datasets and control inference costs. Watch for Cursor or Windsurf to follow with a similar open-weight release.
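The router role described under "Agentic infrastructure" can be sketched as a classify-then-dispatch step. This is a minimal illustration under stated assumptions, not JetBrains' design: `route`, `ROUTER_PROMPT`, the `local`/`cloud` labels, and the stub models are all hypothetical names. In a real stack, the `router` callable would wrap a self-hosted small model and the handlers would wrap actual model endpoints.

```python
from typing import Callable, Dict, Tuple

# A model is any callable from prompt text to response text.
# Real deployments would wrap HTTP or SDK calls; stubs are used here.
Model = Callable[[str], str]

# Hypothetical routing prompt; the wording is an assumption for illustration.
ROUTER_PROMPT = (
    "Classify the request as 'local' (simple) or 'cloud' (complex). "
    "Answer with one word.\nRequest: {request}"
)


def route(
    request: str,
    router: Model,
    handlers: Dict[str, Model],
    default: str = "cloud",
) -> Tuple[str, str]:
    """Ask the small router model for a label, then dispatch to that handler.

    Unrecognized labels fall back to `default`, so a malformed router
    answer escalates to the heavier model instead of failing.
    """
    label = router(ROUTER_PROMPT.format(request=request)).strip().lower()
    if label not in handlers:
        label = default
    return label, handlers[label](request)


# Stub models standing in for real endpoints (hypothetical behaviour).
def stub_router(prompt: str) -> str:
    return "CLOUD\n" if "refactor" in prompt else "local"


STUB_HANDLERS: Dict[str, Model] = {
    "local": lambda r: f"[small] {r}",
    "cloud": lambda r: f"[large] {r}",
}

if __name__ == "__main__":
    print(route("rename variable x", stub_router, STUB_HANDLERS))
    print(route("refactor the billing module", stub_router, STUB_HANDLERS))
```

Defaulting to the heavier model on an unrecognized label trades cost for safety. A deployment that prioritizes cost could default to `local` and validate the result before returning it.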