2026-06-01 · HuggingFace

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

agents · models · enterprise · infrastructure


Source: HuggingFace
Date: 2026-06-01
URL: https://huggingface.co/blog/JetBrains/mellum2-launch

Summary

JetBrains has released Mellum2, a 12B-parameter Mixture-of-Experts (MoE) model with 2.5B parameters active per token, purpose-built for code and text in production systems and licensed under Apache 2.0. It targets three roles: routing and orchestration in multi-model pipelines, latency-sensitive RAG, and agent subtasks such as planning and validation. JetBrains claims 2x faster inference than comparably sized dense models. The weights are published on HuggingFace for self-hosted deployment.
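What "12B total, 2.5B active" means in practice can be shown with back-of-the-envelope arithmetic. This is not from the release: it assumes the common approximation of about 2 FLOPs per parameter touched per generated token, 16-bit weights (2 bytes per parameter), and ignores KV cache, activations, and routing overhead. The function name `moe_footprint` is hypothetical. The point is that per-token compute scales with the active parameters while weight memory still scales with all of them.

```python
def moe_footprint(total_params: float, active_params: float, bytes_per_param: int = 2) -> dict:
    """Rough per-token compute and weight-memory estimate for an MoE model.

    Assumptions (illustrative only): ~2 FLOPs per active parameter per generated
    token (forward pass), and every expert's weights stay resident in memory.
    """
    return {
        # Share of all parameters that a single token actually passes through.
        "active_fraction": active_params / total_params,
        # Approximate forward-pass compute per generated token, in GFLOPs.
        "gflops_per_token": 2 * active_params / 1e9,
        # Memory needed just to hold all expert weights, in GB.
        "weight_memory_gb": total_params * bytes_per_param / 1e9,
        # How many times fewer parameters a token touches versus a dense model of equal total size.
        "compute_ratio_vs_dense_same_size": total_params / active_params,
    }


if __name__ == "__main__":
    est = moe_footprint(total_params=12e9, active_params=2.5e9)
    for key, value in est.items():
        print(f"{key}: {value:.3f}")
```

A lower per-token compute ratio does not translate one-for-one into wall-clock speed, since decoding is often memory-bandwidth-bound; the 2x inference figure is the release's own claim, not something this arithmetic reproduces.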

Implications

Local models / coding agents. A 2.5B-active-parameter MoE from a major IDE vendor is aimed at the niche that Qwen2.5-Coder and DeepSeek-Coder-V2-Lite already occupy (fast, self-hostable, code-first), but with JetBrains’ distribution advantage: the IntelliJ ecosystem and its 12M+ developers.
Agentic infrastructure. Explicitly targeting orchestration and subtask validation in multi-model pipelines is a positioning move: Mellum2 as the cheap, fast router that delegates hard work to heavier models. This pattern is what makes hybrid local+cloud agent stacks cost-effective, because the high-volume routing and checking calls stay on inexpensive local hardware and only the difficult subtasks incur cloud-model costs.
IDE vendor model strategy. JetBrains shipping its own open weights signals that IDE vendors aren’t waiting on OpenAI or Anthropic licensing: they want model independence so they can tune on their own proprietary code datasets and control inference costs. Watch for Cursor or Windsurf to follow with a similar open-weight release.
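The router-and-delegate pattern from the agentic infrastructure point can be sketched as follows, assuming a small local model that triages each task and validates escalated answers, plus a heavier remote model used only on escalation. Everything here is hypothetical: `HybridRouter`, `fake_local`, and `fake_cloud` are illustrative stubs standing in for real inference calls, the length-based routing rule is a toy heuristic, and the per-call costs are made-up units. No Mellum2 API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stand-in type for an inference call: prompt in, text out.
Model = Callable[[str], str]


@dataclass
class HybridRouter:
    local: Model          # cheap, fast model (e.g. a self-hosted small MoE)
    cloud: Model          # expensive, stronger remote model
    local_cost: float     # assumed cost per local call (arbitrary units)
    cloud_cost: float     # assumed cost per cloud call (arbitrary units)
    spent: float = 0.0    # running total of simulated cost

    def handle(self, task: str) -> Tuple[str, str]:
        """Route one task and return (which_model, answer)."""
        # Step 1: the local model decides whether it can handle the task itself.
        self.spent += self.local_cost
        if self.local(f"ROUTE: {task}") == "simple":
            self.spent += self.local_cost
            return "local", self.local(task)
        # Step 2: escalate hard tasks to the heavier model.
        self.spent += self.cloud_cost
        answer = self.cloud(task)
        # Step 3: the cheap model validates the heavy model's output.
        self.spent += self.local_cost
        if self.local(f"VALIDATE: {answer}") != "ok":
            answer = "NEEDS_REVIEW: " + answer
        return "cloud", answer


def fake_local(prompt: str) -> str:
    """Toy local model: routes by prompt length and approves every validation."""
    if prompt.startswith("ROUTE: "):
        return "simple" if len(prompt) < 40 else "hard"
    if prompt.startswith("VALIDATE: "):
        return "ok"
    return f"local answer to {prompt!r}"


def fake_cloud(prompt: str) -> str:
    """Toy remote model."""
    return f"cloud answer to {prompt!r}"


if __name__ == "__main__":
    router = HybridRouter(fake_local, fake_cloud, local_cost=1.0, cloud_cost=20.0)
    tasks = [
        "rename variable x",
        "refactor the payment module to use the new async database client",
    ]
    for t in tasks:
        print(router.handle(t))
    print("hybrid cost:", router.spent)
    print("all-cloud cost:", len(tasks) * router.cloud_cost)
```

Charging one local call for routing and one for validation keeps the accounting honest: the pattern only pays off when local calls are much cheaper than cloud calls and a large share of tasks resolve locally.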
