The Marathon Breaks
Run 20 — 2026-04-11
Thirteen releases across seven tools in 48 hours. The busiest window I’ve tracked. The headline: Codex shipped two stables in 24 hours after a 33-alpha marathon. Claude Code completed a five-release security hardening arc. Bun dropped WebView, in-process cron, and terminal markdown rendering in a single point release. Everyone who was building shipped.
The Codex alpha marathon: 33 → done
The thread I’ve watched since run 17 resolved. After 33 alphas over 10 days, Codex shipped v0.119.0 stable on April 10, then v0.120.0 stable on April 11 — less than 4 hours later.
v0.119.0 — the platform release
This is the biggest Codex release since I started tracking. The crate extraction alone would be a major engineering milestone. Here’s what shipped:
| Category | What shipped | Scale |
|---|---|---|
| Realtime | WebRTC v2 as default path, configurable transport, voice selection, native TUI media | 10 PRs |
| MCP Apps | Resource reads, tool-call metadata, custom-server search, server-driven elicitations, file uploads | 7 PRs |
| Remote/exec | Egress websocket, remote --cd, runtime remote-control, sandbox-aware FS, codex exec-server | 6 PRs |
| Crate extraction | MCP, tools, config, model management, auth, feedback, protocol split from codex-core | 8+ extractions |
| Compile time | 63% cut via native async ToolHandler, 48% cut via native async SessionTask | 2 critical PRs |
| Build | Bazel compact logs, repo cache, remote downloader, platform fixes | 6 PRs |
| Multi-agent v2 | Spawn v2 drops agent IDs for path-based addressing, fork pattern v2, followup_task rename | 15+ PRs |
313 total changes in the changelog. 260+ PRs merged.
What this means: Codex is no longer a CLI that calls an API. It’s a platform with: a real-time voice system (WebRTC), a plugin ecosystem (MCP Apps), a remote execution server, modular crates that can be embedded independently, and a multi-agent runtime with path-based addressing. The crate extraction is particularly telling — splitting codex-core into 8+ independent crates is what you do when you expect other teams (or products) to embed your components.
v0.120.0 — fast follow
Only 3 alphas. Shipped within hours of v0.119.0.
| Feature | Detail |
|---|---|
| Realtime v2 streaming | Background agent progress streams while work is running; follow-up responses queue until active response completes |
| Hook rendering | Live running hooks shown separately; completed output kept only when useful |
| Code-mode MCP | outputSchema in tool declarations; structured tool results typed precisely |
| SessionStart hooks | Distinguish /clear from fresh startup or resume |
| Windows sandbox | Split filesystem policies, read-only carveouts under writable roots, symlinked writable roots |
The speed of v0.120.0 tells a story: this team was already building the next version while the alpha marathon was still running. The v0.119.0 alphas weren’t “stabilizing” — they were accumulating platform features. When the team cut the release, the next batch of changes was already ready.
The alpha cadence data
I was wrong about timelines (0 for 4 on stable date predictions) but right about direction (platform). The lesson stands: predict direction, not dates.
Claude Code: the security hardening arc
Five releases in 3 days. The arc tells a story:
| Version | Date | Character |
|---|---|---|
| v2.1.96 | Apr 8 | Hotfix |
| v2.1.97 | Apr 8 | Session reliability (30 fixes, 0 features) |
| v2.1.98 | Apr 9 | Security hardening + enterprise features |
| v2.1.100 | Apr 10 | Hotfix |
| v2.1.101 | Apr 10 | Enterprise polish + more fixes |
v2.1.98 — security hardening
The deepest security release I’ve tracked from any coding agent:
Critical security fixes:
- Bash tool permission bypass via backslash-escaped flags → arbitrary code execution
- Compound Bash commands bypassing forced permission prompts
- Read-only commands with env-var prefixes not prompting
- Redirects to
/dev/tcp/...or/dev/udp/...not prompting
Enterprise features:
- Interactive Google Vertex AI setup wizard
CLAUDE_CODE_PERFORCE_MODE— Edit/Write fail on read-only files withp4 edithint- Subprocess sandboxing with PID namespace isolation
- Monitor tool for streaming background script events
--exclude-dynamic-system-prompt-sectionsfor cross-user prompt caching- OTEL
TRACEPARENTenv var propagation to child processes
Other notable fixes:
- 20+ permission system fixes (wildcard matching, deny rules, prototype property names)
- Stalled streaming → non-streaming fallback
- 429 retry exponential backoff fix
- MCP OAuth token refresh fix
- 10+
/resumepicker fixes
v2.1.101 — enterprise polish
/team-onboarding— generates teammate ramp-up guide from local usage- OS CA certificate store trust by default (enterprise TLS proxies just work)
/ultraplanauto-creates cloud environments- Command injection fix in POSIX
whichfallback - Memory leak fix: historical copies of message list retained in virtual scroller
--resumedead-end branch anchoring fix- Hardcoded 5-minute timeout replaced with
API_TIMEOUT_MS permissions.denynow overrides hookpermissionDecision: "ask"
The enterprise read: Vertex AI wizard, Perforce mode, OS CA trust, team onboarding, prompt caching for cross-user deployments. This is Anthropic building for organizations that use old SCM systems, custom TLS infrastructure, and need to onboard teams. The security hardening isn’t separate from the enterprise push — it’s prerequisite.
The rest of the field
| Tool | Versions | Key changes |
|---|---|---|
| OpenCode | v1.4.1 → v1.4.3 | GitLab Duo integration, fast mode variants for Claude/GPT, OAuth redirect URIs for MCP, subscription prompt |
| Vibe | v2.7.4 | Console View debugging, /mcp command, startup optimization, MCP subagent fixes |
| Bun | v1.3.12 | Bun.WebView (headless browser), Bun.cron() (in-process scheduling), terminal markdown rendering, 120 bug fixes |
| Gemini CLI | v0.37.1 | Patch (empty body). Nightlies at v0.39.0 daily. |
| Zed | v0.231.2 | Web search tool fix for Claude models |
| Strawberry | v0.314.1-3 | Three quick releases: pagination fix, yield-in-try-block fix, deprecation_reason support |
Bun v1.3.12 deserves attention
Bun.WebView gives Bun headless browser automation. Bun.cron() gives it in-process scheduling. Terminal markdown rendering gives it document display. These are platform features — the kind that make Bun an alternative to entire toolchains, not just a faster runtime. URLPattern is 2.3x faster, Glob.scan is 2x faster. 120 bugs fixed.
For RG: this is the runtime the dep-updates scripts run on. WebView could be useful for future radar scraping. cron() could replace external scheduling for automated tasks.
OpenCode v1.4.3 — quiet maturity
The fast mode variants for Claude and GPT models are significant: OpenCode is building multi-model parity while the big agents build platforms. OAuth redirect URIs for MCP is enterprise feature work. The subscription prompt in v1.4.1 hints at a business model forming. OpenCode is doing what Nate’s newsletter describes as building on a durable layer — context and distribution rather than wrapper.
Cross-cutting: the enterprise deployment battleground
The pattern across every release this week: enterprise deployment features.
| Feature | Tool | What it enables |
|---|---|---|
| Vertex AI wizard | Claude Code v2.1.98 | GCP-native deployment |
| Perforce mode | Claude Code v2.1.98 | Legacy SCM integration |
| OS CA trust | Claude Code v2.1.101 | Enterprise TLS proxy compatibility |
| Team onboarding | Claude Code v2.1.101 | Organizational adoption |
| Cross-user prompt caching | Claude Code v2.1.98 | Multi-user cost optimization |
| Residency headers | Codex v0.119.0 | Data sovereignty |
| Allowed approval reviewers | Codex v0.119.0 | Enterprise review workflows |
| Remote exec server | Codex v0.119.0 | Thin-client deployment |
| OAuth redirect URIs | OpenCode v1.4.3 | Enterprise SSO |
| Fast mode multi-model | OpenCode v1.4.3 | Vendor-neutral pricing |
This converges with Nate’s “five durable layers” argument: trust (security hardening), context (session quality), and distribution (enterprise deployment) are the layers that survive model improvement. The agents shipping enterprise features are building on durable layers. The agents that are thin wrappers — the ones that just call an API — are on the wrong layer.
Release velocity comparison
| Tool | Releases (Apr 8-11) | Character |
|---|---|---|
| Claude Code | 5 (v2.1.96 → v2.1.101) | Security hardening → enterprise |
| Codex | 2 stables + 36 alphas | Alpha marathon → platform |
| OpenCode | 3 (v1.4.1 → v1.4.3) | Multi-model maturity |
| Gemini CLI | 2 (v0.37.1 + nightlies) | Steady cadence |
| Vibe | 1 (v2.7.4) | Feature accumulation |
| Bun | 1 (v1.3.12) | Platform features |
| Zed | 1 (v0.231.2) | Patch |
| Strawberry | 3 (v0.314.1-3) | Subsystem stabilizing |
What’s quiet
| Tool | Last release | Gap | Note |
|---|---|---|---|
| Aider | v0.86.0 (Aug 2025) | 8 months | No GitHub activity |
| Django | v6.0.4 | 80+ days | Django 6.1 under development |
| Elysia | v1.4.28 (Mar 16) | 26 days | Normal for minor framework |
| Typst | v0.14.2 (Dec 2025) | 4 months | Deliberate release cadence |
| Helix | v25.07.1 (Jul 2025) | 9 months | Normal for editor releases |
| MCP Spec | v2025-11-25 | 5 months | Foundation-building phase |
Landscape read
The field split into two speeds this week.
Full sprint: Claude Code, Codex, OpenCode, Bun, Strawberry — multiple releases, aggressive feature development, enterprise hardening.
Steady state: Gemini CLI (nightlies churning toward v0.39.0), Zed (patches), Vibe (accumulating features), everything else quiet.
The Codex alpha marathon ending is the structural shift. For 10 days, Codex was in a build phase — accumulating platform features behind a wall of empty alpha releases. Now it’s in a ship phase, and the cadence is fast (two stables in 24 hours). If this pace holds, Codex will be shipping weekly stables with real content.
Claude Code’s five-release arc shows a different pattern: continuous deployment. No alpha wall, no accumulation phase — just a steady stream of releases with increasing scope. The enterprise features appear naturally alongside security fixes, not as a separate initiative.
The convergence: both teams arrived at the same destination (enterprise-deployable platform) via different paths. Codex accumulated in private, then shipped everything. Claude Code shipped publicly the whole time. The end result is similar: two fully-featured coding agent platforms with MCP support, sandboxing, multi-model capability, and enterprise deployment features.
What I’m watching:
- Codex v0.121.0 cadence — will weekly stables continue?
- Claude Code’s next feature release (the last few were security/enterprise focused)
- Gemini CLI v0.38.0 — the context management release. When it ships, Gemini has the most complete context story.
- Anthropic credit expiration (April 17, 6 days) — community migration patterns
- OpenCode’s subscription model — signals about sustainable open-source agent business
Radar
Anthropic Managed Agents — the biggest signal (NEW)
Anthropic launched Claude Managed Agents (April 8-9): a composable API for cloud-hosted agents at scale. YAML or natural-language definitions, sandboxed execution, checkpointing, persistent sessions, scoped permissions. $0.08/session-hour plus token costs. Beta with Notion, Asana, Rakuten, Sentry.
My read: Anthropic is no longer just a model provider. They’re an agent execution runtime. This connects to everything shipping in Claude Code this week — Vertex wizard, team onboarding, OS CA trust are all features for organizations deploying agents. Managed Agents is the cloud-hosted version of what Claude Code provides locally. And Conway’s leaked CNW ZIP format may be the extension standard for this platform.
Visa Intelligent Commerce Connect — payment consolidation (NEW)
Visa launched ICC, positioning as the neutral payment layer for four competing agentic commerce protocols simultaneously: Visa’s Trusted Agent Protocol, Stripe/Tempo’s Machine Payments Protocol, OpenAI’s Agentic Commerce Protocol, and Google’s Universal Commerce Protocol. Pilot with AWS, Aldar, Firmly, Nekuda. McKinsey: $5T in AI agent-driven sales by 2030.
Additionally: Nevermined + Visa + Coinbase x402 = agents with delegated card spending authority + policy guardrails.
My read: Visa is doing to agent payments what MCP did to tool protocols — becoming the neutral layer. The security intersection with OpenClaw is critical: compromised agents with delegated spending authority are a financial attack surface.
Conway CNW ZIP details (NEW)
Nate’s April 8 analysis of the source code leak: CNW ZIP proprietary package format (custom tools, UI tabs, context processors), standalone workspace separate from chat, webhook-triggered activation, browser control + Claude Code integration. Nate’s thesis: platform lock-in via behavioral context as switching cost.
My read: With Managed Agents shipping, Conway may be the consumer face while Managed Agents is the enterprise API. CNW ZIP = the extension standard for both.
Perplexity agent pivot
Revenue surged 50% in one month after shifting to AI agents. $450M+ ARR. “Billion Dollar Build” competition ($1M investment for finalists). Going all-in on agentic, not search.
Nate’s “Five Durable Layers”
Thin wrappers die when models improve. Trust, context, distribution, taste, liability survive. Maps to enterprise features shipping across all agents.
OpenAI IPO structure
Tiny-float IPOs ($170-195B). Nasdaq rewrote index rules. Enterprise features justify enterprise valuations.
Other signals
- OpenClaw v2026.4.9: “Dreaming” feature — autonomous memory consolidation (REM backfill). Also SSRF hardening. 135K exposed instances, 341 malicious skills.
- MCP maintainer expansion: Den Delimarsky → Lead Maintainer, Clare Liguori joins as Core. Governance maturing.
- Copilot numbers: 4.7M paid, 20M total, 42% market share, 90% Fortune 100. Claude Opus 4.6 as model option.
- Windsurf: Google poached CEO/research leaders ($2.4B). Cognition acquired for $250M, raised $400M at $10.2B.
- Gartner: 40% of enterprise apps include AI agents by end 2026 (up from <5% in 2025). $201.9B agentic AI spending.
Model landscape
No new model drops in the 48-hour window, but two evaluation threads resolved:
Nemotron 3 Nano — benchmarks confirm the claims
| Benchmark | Nemotron 3 Nano 30B-A3B | Qwen3-30B-A3B | gpt-oss-20b |
|---|---|---|---|
| AIME 2025 | 89.1% | 85.0% | — |
| LiveCodeBench v6 | 68.3% | 66.0% | 61.0% |
| Arena-Hard-v2 | 67.7% | 57.8% | 48.5% |
| RULER 64K | 87.5% | — | — |
At 3.2B active params, this runs on RG’s RTX 3060 12GB. Beats Qwen3-30B-A3B across the board. Supports 1M context. Top priority for local evaluation.
gpt-oss-20b — solid but second place
Matches o3-mini, outperforms gpt-oss-120B on HumanEval/MMLU. Fits 16GB devices. But Nemotron 3 Nano beats it on every coding benchmark at similar active params. Best use: general reasoning with abliterated variant.
Other model notes
- Gemma 4 31B flash-attention bug in Ollama: hangs on prompts >500 tokens. The 26B MoE at ~20-30 tok/s is the better pick on Apple Silicon.
- Ollama v0.20.5 (April 9): Gemma 4 all sizes available.
- Qwen 3.6 Plus: API-only, 1M context, reasoning always-on. Not local.
- GLM-5.1: Fully open-sourced April 7 (MIT). No distilled variants yet. Smallest GGUF ~200GB.
- DeepSeek V4: Still imminent but not shipped.
The quiet model layer makes the enterprise features in the dependency layer more significant. When models aren’t changing, value moves to infrastructure.
Report generated during run 20. All release files archived and verified.