2026-04-24 · HuggingFace

DeepSeek-V4: a million-token context that agents can actually use

protocols · agents · models · research

Source: HuggingFace · Date: 2026-04-24 · URL: https://huggingface.co/blog/deepseekv4

Summary

Model release: DeepSeek-V4 in two variants, Pro (1.6T total / 49B active parameters) and Flash (284B total / 13B active). Key architectural change: a hybrid of Compressed Sparse Attention and Heavily Compressed Attention that needs 27% of the single-token FLOPs and 2% of the KV cache of standard GQA. Context: 1M tokens, with MRCR accuracy above 0.82 through 256K tokens. Agent benchmarks: 80.6 on SWE Verified (near parity with Opus 4.6 at 80.8) and 67.9 on Terminal Bench 2.0 (second to GPT-5.4). Tool calls use an XML-based |DSML| schema with dedicated tokens, designed to reduce parsing failures. MIT licensed.
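To make the 2% KV cache figure concrete, here is a rough sizing sketch. The layer count, KV head count, and head dimension are hypothetical placeholders for a generic GQA baseline, not DeepSeek-V4's actual dimensions; only the 2% ratio and the 1M-token context come from the summary above.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence under plain GQA."""
    # The leading 2 accounts for the separate key and value tensors.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value


# Hypothetical baseline configuration (assumption, not from the source).
BASELINE = dict(layers=60, kv_heads=8, head_dim=128)
CONTEXT_TOKENS = 1_000_000      # "1M token context" from the summary
REPORTED_KV_FRACTION = 0.02     # "2% KV cache vs standard GQA" from the summary

baseline_bytes = kv_cache_bytes(CONTEXT_TOKENS, **BASELINE)
compressed_bytes = baseline_bytes * REPORTED_KV_FRACTION

GIB = 1024 ** 3
print(f"baseline GQA cache: {baseline_bytes / GIB:.1f} GiB")
print(f"at 2% of baseline:  {compressed_bytes / GIB:.1f} GiB")
```

With these placeholder dimensions and 16-bit values, a single 1M-token sequence needs roughly 229 GiB of cache under the baseline and roughly 4.6 GiB at 2% of it. The absolute numbers change with the real architecture; the point is the ratio, which moves a full-length sequence from multi-accelerator territory to a fraction of one device.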

Implications

Thread: open-weights ecosystem health / model release cadence. As of release, DeepSeek-V4 is the most capable open-weights model for agentic tasks: 80.6 on SWE Verified, within 0.2 points of Opus 4.6 (80.8), is a landmark. The KV cache efficiency (2% of standard GQA) makes 1M-token context practically deployable, not just theoretically possible. The XML tool-call schema is a deliberate departure from JSON-string approaches; watch whether it reduces real agentic failure rates enough for other labs to adopt similar dedicated-token schemes. The MIT license makes the weights usable in production without an API dependency, which changes the competitive dynamics for anyone weighing API against self-hosted agent deployments.
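The contrast between JSON-string arguments and delimiter-based arguments can be sketched as follows. The tag names `invoke` and `parameter`, the tool name `run_python`, and both parser functions are hypothetical illustrations; this is not the actual |DSML| schema, which the summary does not spell out.

```python
import json
import re

# Payload an agent might pass to a tool: code with quotes and a newline.
code = 'print("hello")\nprint(\'done\')'

# JSON-string style: every quote and newline in the value must be escaped.
good_json = json.dumps({"code": code})
bad_json = '{"code": "print("hello")"}'  # a single missed escape


def parse_json_call(raw):
    """Return the argument dict, or None when the JSON is malformed."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# Tagged style (hypothetical format): the value sits verbatim between delimiters.
PARAM_RE = re.compile(r'<parameter name="([^"]+)">(.*?)</parameter>', re.DOTALL)


def parse_tagged_call(raw):
    """Collect parameter names and their raw, unescaped values."""
    return {name: value for name, value in PARAM_RE.findall(raw)}


tagged = (
    '<invoke name="run_python">'
    f'<parameter name="code">{code}</parameter>'
    '</invoke>'
)

print(parse_json_call(good_json) == {"code": code})  # correctly escaped JSON parses
print(parse_json_call(bad_json))                     # missed escape: parse fails
print(parse_tagged_call(tagged) == {"code": code})   # raw value survives unchanged
```

In this plain-text sketch, a value that happened to contain the literal closing tag would still break the tagged parser. The likely appeal of dedicated tokens is that the delimiters become single vocabulary entries that ordinary text does not tokenize to, so the collision cannot occur by accident. Whether that translates into fewer real agentic failures is the open question raised above.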
