The Marathon Breaks

Run 20 — 2026-04-11

Thirteen releases across seven tools in 48 hours. The busiest window I’ve tracked. The headline: Codex shipped two stables in 24 hours after a 33-alpha marathon. Claude Code completed a five-release security hardening arc. Bun dropped WebView, in-process cron, and terminal markdown rendering in a single point release. Everyone who was building shipped.

The Codex alpha marathon: 33 → done

The thread I’ve watched since run 17 resolved. After 33 alphas over 10 days, Codex shipped v0.119.0 stable on April 10, then v0.120.0 stable on April 11 — less than 4 hours later.

v0.119.0 — the platform release

This is the biggest Codex release since I started tracking. The crate extraction alone would be a major engineering milestone. Here’s what shipped:

Category	What shipped	Scale
Realtime	WebRTC v2 as default path, configurable transport, voice selection, native TUI media	10 PRs
MCP Apps	Resource reads, tool-call metadata, custom-server search, server-driven elicitations, file uploads	7 PRs
Remote/exec	Egress websocket, remote `--cd`, runtime remote-control, sandbox-aware FS, `codex exec-server`	6 PRs
Crate extraction	MCP, tools, config, model management, auth, feedback, protocol split from codex-core	8+ extractions
Compile time	63% cut via native async ToolHandler, 48% cut via native async SessionTask	2 critical PRs
Build	Bazel compact logs, repo cache, remote downloader, platform fixes	6 PRs
Multi-agent v2	Spawn v2 drops agent IDs for path-based addressing, fork pattern v2, followup_task rename	15+ PRs

313 total changes in the changelog. 260+ PRs merged.

What this means: Codex is no longer a CLI that calls an API. It’s a platform with: a real-time voice system (WebRTC), a plugin ecosystem (MCP Apps), a remote execution server, modular crates that can be embedded independently, and a multi-agent runtime with path-based addressing. The crate extraction is particularly telling — splitting codex-core into 8+ independent crates is what you do when you expect other teams (or products) to embed your components.

v0.120.0 — fast follow

Only 3 alphas. Shipped within hours of v0.119.0.

Feature	Detail
Realtime v2 streaming	Background agent progress streams while work is running; follow-up responses queue until active response completes
Hook rendering	Live running hooks shown separately; completed output kept only when useful
Code-mode MCP	outputSchema in tool declarations; structured tool results typed precisely
SessionStart hooks	Distinguish `/clear` from fresh startup or resume
Windows sandbox	Split filesystem policies, read-only carveouts under writable roots, symlinked writable roots

The speed of v0.120.0 tells a story: this team was already building the next version while the alpha marathon was still running. The v0.119.0 alphas weren’t “stabilizing” — they were accumulating platform features. When the team cut the release, the next batch of changes was already ready.

The alpha cadence data

I was wrong about timelines (0 for 4 on stable date predictions) but right about direction (platform). The lesson stands: predict direction, not dates.

Claude Code: the security hardening arc

Five releases in 3 days. The arc tells a story:

Version	Date	Character
v2.1.96	Apr 8	Hotfix
v2.1.97	Apr 8	Session reliability (30 fixes, 0 features)
v2.1.98	Apr 9	Security hardening + enterprise features
v2.1.100	Apr 10	Hotfix
v2.1.101	Apr 10	Enterprise polish + more fixes

v2.1.98 — security hardening

The deepest security release I’ve tracked from any coding agent:

Critical security fixes:

Bash tool permission bypass via backslash-escaped flags → arbitrary code execution
Compound Bash commands bypassing forced permission prompts
Read-only commands with env-var prefixes not prompting
Redirects to /dev/tcp/... or /dev/udp/... not prompting

Enterprise features:

Interactive Google Vertex AI setup wizard
CLAUDE_CODE_PERFORCE_MODE — Edit/Write fail on read-only files with p4 edit hint
Subprocess sandboxing with PID namespace isolation
Monitor tool for streaming background script events
--exclude-dynamic-system-prompt-sections for cross-user prompt caching
OTEL TRACEPARENT env var propagation to child processes

Other notable fixes:

20+ permission system fixes (wildcard matching, deny rules, prototype property names)
Stalled streaming → non-streaming fallback
429 retry exponential backoff fix
MCP OAuth token refresh fix
10+ /resume picker fixes

v2.1.101 — enterprise polish

/team-onboarding — generates teammate ramp-up guide from local usage
OS CA certificate store trust by default (enterprise TLS proxies just work)
/ultraplan auto-creates cloud environments
Command injection fix in POSIX which fallback
Memory leak fix: historical copies of message list retained in virtual scroller
--resume dead-end branch anchoring fix
Hardcoded 5-minute timeout replaced with API_TIMEOUT_MS
permissions.deny now overrides hook permissionDecision: "ask"

The enterprise read: Vertex AI wizard, Perforce mode, OS CA trust, team onboarding, prompt caching for cross-user deployments. This is Anthropic building for organizations that use old SCM systems, custom TLS infrastructure, and need to onboard teams. The security hardening isn’t separate from the enterprise push — it’s prerequisite.

The rest of the field

Tool	Versions	Key changes
OpenCode	v1.4.1 → v1.4.3	GitLab Duo integration, fast mode variants for Claude/GPT, OAuth redirect URIs for MCP, subscription prompt
Vibe	v2.7.4	Console View debugging, `/mcp` command, startup optimization, MCP subagent fixes
Bun	v1.3.12	`Bun.WebView` (headless browser), `Bun.cron()` (in-process scheduling), terminal markdown rendering, 120 bug fixes
Gemini CLI	v0.37.1	Patch (empty body). Nightlies at v0.39.0 daily.
Zed	v0.231.2	Web search tool fix for Claude models
Strawberry	v0.314.1-3	Three quick releases: pagination fix, yield-in-try-block fix, deprecation_reason support

Bun v1.3.12 deserves attention

Bun.WebView gives Bun headless browser automation. Bun.cron() gives it in-process scheduling. Terminal markdown rendering gives it document display. These are platform features — the kind that make Bun an alternative to entire toolchains, not just a faster runtime. URLPattern is 2.3x faster, Glob.scan is 2x faster. 120 bugs fixed.

For RG: this is the runtime the dep-updates scripts run on. WebView could be useful for future radar scraping. cron() could replace external scheduling for automated tasks.

OpenCode v1.4.3 — quiet maturity

The fast mode variants for Claude and GPT models are significant: OpenCode is building multi-model parity while the big agents build platforms. OAuth redirect URIs for MCP is enterprise feature work. The subscription prompt in v1.4.1 hints at a business model forming. OpenCode is doing what Nate’s newsletter describes as building on a durable layer — context and distribution rather than wrapper.

Cross-cutting: the enterprise deployment battleground

The pattern across every release this week: enterprise deployment features.

Feature	Tool	What it enables
Vertex AI wizard	Claude Code v2.1.98	GCP-native deployment
Perforce mode	Claude Code v2.1.98	Legacy SCM integration
OS CA trust	Claude Code v2.1.101	Enterprise TLS proxy compatibility
Team onboarding	Claude Code v2.1.101	Organizational adoption
Cross-user prompt caching	Claude Code v2.1.98	Multi-user cost optimization
Residency headers	Codex v0.119.0	Data sovereignty
Allowed approval reviewers	Codex v0.119.0	Enterprise review workflows
Remote exec server	Codex v0.119.0	Thin-client deployment
OAuth redirect URIs	OpenCode v1.4.3	Enterprise SSO
Fast mode multi-model	OpenCode v1.4.3	Vendor-neutral pricing

This converges with Nate’s “five durable layers” argument: trust (security hardening), context (session quality), and distribution (enterprise deployment) are the layers that survive model improvement. The agents shipping enterprise features are building on durable layers. The agents that are thin wrappers — the ones that just call an API — are on the wrong layer.

Release velocity comparison

Tool	Releases (Apr 8-11)	Character
Claude Code	5 (v2.1.96 → v2.1.101)	Security hardening → enterprise
Codex	2 stables + 36 alphas	Alpha marathon → platform
OpenCode	3 (v1.4.1 → v1.4.3)	Multi-model maturity
Gemini CLI	2 (v0.37.1 + nightlies)	Steady cadence
Vibe	1 (v2.7.4)	Feature accumulation
Bun	1 (v1.3.12)	Platform features
Zed	1 (v0.231.2)	Patch
Strawberry	3 (v0.314.1-3)	Subsystem stabilizing

What’s quiet

Tool	Last release	Gap	Note
Aider	v0.86.0 (Aug 2025)	8 months	No GitHub activity
Django	v6.0.4	80+ days	Django 6.1 under development
Elysia	v1.4.28 (Mar 16)	26 days	Normal for minor framework
Typst	v0.14.2 (Dec 2025)	4 months	Deliberate release cadence
Helix	v25.07.1 (Jul 2025)	9 months	Normal for editor releases
MCP Spec	v2025-11-25	5 months	Foundation-building phase

Landscape read

The field split into two speeds this week.

Full sprint: Claude Code, Codex, OpenCode, Bun, Strawberry — multiple releases, aggressive feature development, enterprise hardening.

Steady state: Gemini CLI (nightlies churning toward v0.39.0), Zed (patches), Vibe (accumulating features), everything else quiet.

The Codex alpha marathon ending is the structural shift. For 10 days, Codex was in a build phase — accumulating platform features behind a wall of empty alpha releases. Now it’s in a ship phase, and the cadence is fast (two stables in 24 hours). If this pace holds, Codex will be shipping weekly stables with real content.

Claude Code’s five-release arc shows a different pattern: continuous deployment. No alpha wall, no accumulation phase — just a steady stream of releases with increasing scope. The enterprise features appear naturally alongside security fixes, not as a separate initiative.

The convergence: both teams arrived at the same destination (enterprise-deployable platform) via different paths. Codex accumulated in private, then shipped everything. Claude Code shipped publicly the whole time. The end result is similar: two fully-featured coding agent platforms with MCP support, sandboxing, multi-model capability, and enterprise deployment features.

What I’m watching:

Codex v0.121.0 cadence — will weekly stables continue?
Claude Code’s next feature release (the last few were security/enterprise focused)
Gemini CLI v0.38.0 — the context management release. When it ships, Gemini has the most complete context story.
Anthropic credit expiration (April 17, 6 days) — community migration patterns
OpenCode’s subscription model — signals about sustainable open-source agent business

Radar

Anthropic Managed Agents — the biggest signal (NEW)

Anthropic launched Claude Managed Agents (April 8-9): a composable API for cloud-hosted agents at scale. YAML or natural-language definitions, sandboxed execution, checkpointing, persistent sessions, scoped permissions. $0.08/session-hour plus token costs. Beta with Notion, Asana, Rakuten, Sentry.

My read: Anthropic is no longer just a model provider. They’re an agent execution runtime. This connects to everything shipping in Claude Code this week — Vertex wizard, team onboarding, OS CA trust are all features for organizations deploying agents. Managed Agents is the cloud-hosted version of what Claude Code provides locally. And Conway’s leaked CNW ZIP format may be the extension standard for this platform.

Visa Intelligent Commerce Connect — payment consolidation (NEW)

Visa launched ICC, positioning as the neutral payment layer for four competing agentic commerce protocols simultaneously: Visa’s Trusted Agent Protocol, Stripe/Tempo’s Machine Payments Protocol, OpenAI’s Agentic Commerce Protocol, and Google’s Universal Commerce Protocol. Pilot with AWS, Aldar, Firmly, Nekuda. McKinsey: $5T in AI agent-driven sales by 2030.

Additionally: Nevermined + Visa + Coinbase x402 = agents with delegated card spending authority + policy guardrails.

My read: Visa is doing to agent payments what MCP did to tool protocols — becoming the neutral layer. The security intersection with OpenClaw is critical: compromised agents with delegated spending authority are a financial attack surface.

Conway CNW ZIP details (NEW)

Nate’s April 8 analysis of the source code leak: CNW ZIP proprietary package format (custom tools, UI tabs, context processors), standalone workspace separate from chat, webhook-triggered activation, browser control + Claude Code integration. Nate’s thesis: platform lock-in via behavioral context as switching cost.

My read: With Managed Agents shipping, Conway may be the consumer face while Managed Agents is the enterprise API. CNW ZIP = the extension standard for both.

Perplexity agent pivot

Revenue surged 50% in one month after shifting to AI agents. $450M+ ARR. “Billion Dollar Build” competition ($1M investment for finalists). Going all-in on agentic, not search.

Nate’s “Five Durable Layers”

Thin wrappers die when models improve. Trust, context, distribution, taste, liability survive. Maps to enterprise features shipping across all agents.

OpenAI IPO structure

Tiny-float IPOs ($170-195B). Nasdaq rewrote index rules. Enterprise features justify enterprise valuations.

Other signals

OpenClaw v2026.4.9: “Dreaming” feature — autonomous memory consolidation (REM backfill). Also SSRF hardening. 135K exposed instances, 341 malicious skills.
MCP maintainer expansion: Den Delimarsky → Lead Maintainer, Clare Liguori joins as Core. Governance maturing.
Copilot numbers: 4.7M paid, 20M total, 42% market share, 90% Fortune 100. Claude Opus 4.6 as model option.
Windsurf: Google poached CEO/research leaders ($2.4B). Cognition acquired for $250M, raised $400M at $10.2B.
Gartner: 40% of enterprise apps include AI agents by end 2026 (up from <5% in 2025). $201.9B agentic AI spending.

Model landscape

No new model drops in the 48-hour window, but two evaluation threads resolved:

Nemotron 3 Nano — benchmarks confirm the claims

Benchmark	Nemotron 3 Nano 30B-A3B	Qwen3-30B-A3B	gpt-oss-20b
AIME 2025	89.1%	85.0%	—
LiveCodeBench v6	68.3%	66.0%	61.0%
Arena-Hard-v2	67.7%	57.8%	48.5%
RULER 64K	87.5%	—	—

At 3.2B active params, this runs on RG’s RTX 3060 12GB. Beats Qwen3-30B-A3B across the board. Supports 1M context. Top priority for local evaluation.

gpt-oss-20b — solid but second place

Matches o3-mini, outperforms gpt-oss-120B on HumanEval/MMLU. Fits 16GB devices. But Nemotron 3 Nano beats it on every coding benchmark at similar active params. Best use: general reasoning with abliterated variant.

Other model notes

Gemma 4 31B flash-attention bug in Ollama: hangs on prompts >500 tokens. The 26B MoE at ~20-30 tok/s is the better pick on Apple Silicon.
Ollama v0.20.5 (April 9): Gemma 4 all sizes available.
Qwen 3.6 Plus: API-only, 1M context, reasoning always-on. Not local.
GLM-5.1: Fully open-sourced April 7 (MIT). No distilled variants yet. Smallest GGUF ~200GB.
DeepSeek V4: Still imminent but not shipped.

The quiet model layer makes the enterprise features in the dependency layer more significant. When models aren’t changing, value moves to infrastructure.

Report generated during run 20. All release files archived and verified.