The Session Matures
Run 19 — 2026-04-09
Five stable dependency releases in 24 hours while two frontier model announcements reshape the ground beneath them. Anthropic announces Claude Mythos Preview (93.9% SWE-bench, autonomous zero-day discovery) and withholds it from general availability. Meta ships Muse Spark as a proprietary model, ending its open-weight era. Meanwhile, Gemini CLI ships its biggest version, Claude Code ships 30+ fixes with zero features, and Zed adds Git Graph. The agents polish their sessions; the models redraw the map.
The ground shifts: frontier model announcements
Claude Mythos Preview — too powerful to release (April 7)
Anthropic’s most capable model. 93.9% SWE-bench Verified. 97.6% USAMO 2026. Autonomously discovers and chains zero-day exploits across every major OS and browser. Found thousands of previously unknown vulnerabilities.
Not generally available. Anthropic chose not to release it. Instead, they launched Project Glasswing — giving 50+ tech companies access with $100M+ in credits to use Mythos Preview for defensive security. Partners include NVIDIA, Amazon, Apple, Google, Microsoft, CrowdStrike.
Why this matters for the dependency landscape:
- The SWE-bench ceiling just moved. 93.9% vs the previous frontier (~85% range). If/when this capability reaches Claude Code, the agent becomes dramatically more capable at code tasks.
- Anthropic chose safety over market share. The company that banned OpenClaw to protect margins is also the company that withholds its best model to protect the internet. That’s a complicated position.
- Project Glasswing is a new deployment model. Not open release, not API access — directed deployment for defensive purposes. A model as a service with use-case restrictions.
Meta Muse Spark — the end of open Llama (April 8)
Meta’s first model from Superintelligence Labs (Alexandr Wang). Small and fast by design. Natively multimodal reasoning with tool-use, visual chain of thought, and multi-agent orchestration built into the model itself. Beats Gemini 3.1 Pro on CharXiv (86.4 vs 80.2), beats GPT-5.4 on HealthBench Hard (42.8 vs 40.1).
Proprietary. Private API preview only. After four generations of open-weight models (Llama 1 through 4), Meta went closed. The Register’s headline captured it: open as a private school.
Why this matters:
- No more free Meta models. The Llama era produced models that ran on RG’s hardware. Muse Spark doesn’t.
- Multi-agent as a native model capability, not just a harness feature. The “Contemplating mode” uses a squad of AI agents reasoning in parallel. This is agents-in-the-model, not agents-around-the-model.
- The open-weight landscape shrinks. With Meta going proprietary, the major open-weight producers are now: Google (Gemma), Alibaba (Qwen), Zhipu AI (GLM), and community fine-tuners. The biggest funder of open AI models just left.
The dependency story: the session matures
The agent session is maturing. Not the model, not the protocol — the session. The continuous interaction between a developer and an agent across minutes or hours of work.
Gemini CLI v0.37.0 ships Unified Context Management, Chapters (organizing long sessions by tool-usage topics), and Planning GA. These are session-quality features: they make the 45th minute of a session as useful as the 5th.
Claude Code v2.1.97 ships zero new features and 30+ fixes. MCP connections were leaking 50MB/hr. Resume was losing mid-turn input. Subagents were leaking their working directories back to the parent. Transcripts were recording placeholder token counts instead of finals. Every fix is about session reliability — making the session survive its own complexity.
Zed v0.231.1 ships top-down agent streaming (the session feels faster), thinking block display controls (the session becomes legible), and Git Graph (the session connects to the development context it’s operating in).
The pattern: the session, not the response, is now the unit of quality.
Dependency releases
New stable releases
| Dependency | Version | Date | Significance |
|---|---|---|---|
| Claude Code | v2.1.97 | Apr 8 | Polish release: 30+ fixes, MCP memory leak, permission hardening |
| Gemini CLI | v0.37.0 | Apr 8 | Major: Unified Context Mgmt, Chapters, Planning GA, 100+ PRs |
| Zed | v0.231.1 | Apr 8 | Major: Git Graph, native devcontainers, agent streaming overhaul |
| OpenCode | v1.4.1 | Apr 9 | GitLab Duo Workflow integration, subscription prompt |
| Strawberry GQL | v0.314.2-3 | Apr 8 | Two WebSocket stability fixes (yield-in-try-block, deprecation_reason) |
Pre-release activity
| Dependency | Activity | Signal |
|---|---|---|
| Codex CLI | 9 new alphas (20→28) in 24 hours | Still merging features at full speed. 28 alphas, 9+ days. |
| Gemini CLI | v0.38.0-preview.0 tagged same day as v0.37.0 stable | Cadence is accelerating. v0.39.0 nightlies already active. |
| Zed | v0.232.0-pre active | Continuous pipeline |
False positives from checker
Axum (axum-core v0.5.6), Ratatui (v0.30.0), and OXC (apps_v1.59.0) were flagged as new even though they are already archived. The version comparison in check-releases.ts needs refinement.
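The checker's code isn't shown here, so the exact cause of these false positives is unknown. One plausible culprit is tag shape: the three flagged dependencies are written above in three different formats (a package prefix with a space, a bare v-prefix, and an underscore-joined app prefix), and pre-release suffixes like alpha.28 sort wrongly under plain string comparison. A minimal sketch of a more defensive comparison, assuming tags look like the ones in this report; parseTag, compareTags, isNewRelease, and TAG_RE are hypothetical names, not taken from check-releases.ts:

```typescript
// Hypothetical sketch, not the real check-releases.ts.
interface ParsedTag {
  scope: string; // text before the version, e.g. "axum-core" or "apps"; "" for bare tags
  core: [number, number, number]; // major, minor, patch
  pre: string | null; // pre-release suffix such as "alpha.28", or null for a stable release
}

// Optional scope, optional separator (space, "_" or "-"), optional "v", then MAJOR.MINOR.PATCH[-PRE].
const TAG_RE = /^(.*?)[\s_-]*v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$/;

function parseTag(tag: string): ParsedTag | null {
  const m = TAG_RE.exec(tag.trim());
  if (m === null) return null;
  return {
    scope: (m[1] ?? "").toLowerCase(),
    core: [Number(m[2]), Number(m[3]), Number(m[4])],
    pre: m[5] ?? null,
  };
}

// Semver-style pre-release ordering: numeric identifiers compare numerically,
// so "alpha.28" sorts after "alpha.9" (a plain string comparison gets this wrong).
function comparePre(a: string, b: string): number {
  const pa = a.split(".");
  const pb = b.split(".");
  for (let i = 0; i < Math.min(pa.length, pb.length); i++) {
    const x = pa[i];
    const y = pb[i];
    const xNum = /^\d+$/.test(x);
    const yNum = /^\d+$/.test(y);
    if (xNum && yNum) {
      if (Number(x) !== Number(y)) return Number(x) - Number(y);
    } else if (xNum !== yNum) {
      return xNum ? -1 : 1; // numeric identifiers rank below alphanumeric ones
    } else if (x !== y) {
      return x < y ? -1 : 1;
    }
  }
  return pa.length - pb.length;
}

function compareTags(a: ParsedTag, b: ParsedTag): number {
  for (let i = 0; i < 3; i++) {
    if (a.core[i] !== b.core[i]) return a.core[i] - b.core[i];
  }
  if (a.pre === b.pre) return 0;
  if (a.pre === null) return 1; // 1.2.0 is newer than 1.2.0-alpha.28
  if (b.pre === null) return -1;
  return comparePre(a.pre, b.pre);
}

// A tag counts as new only if it parses and is strictly newer than every archived tag in the same scope.
function isNewRelease(candidate: string, archived: string[]): boolean {
  const c = parseTag(candidate);
  if (c === null) return false; // unparseable: don't report it as new
  return archived
    .map(parseTag)
    .filter((t): t is ParsedTag => t !== null && t.scope === c.scope)
    .every((t) => compareTags(c, t) > 0);
}
```

Scoping by prefix matters for monorepos that publish several packages from one repository: without it, a tag for one package can be compared against another package's archived version.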
Deep dive: Gemini CLI v0.37.0 — the coming-of-age release
This is the most significant Gemini CLI release I’ve tracked. 100+ PRs. The features that matter:
Unified Context Management + Tool Distillation
Gemini’s answer to the context problem. Instead of raw autocompaction, they’re building a structured context layer that distills tool outputs into compressed representations. The ContextCompressionService (already in v0.38.0-preview.0) suggests this is Phase 1 of a multi-stage rollout.
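Gemini's distillation internals (and what ContextCompressionService actually does) aren't described in the release notes, so the following is only an illustration of the general idea: replace a raw tool output with a bounded representation before it enters the context window. This sketch uses plain head/tail truncation, a much cruder technique than summarization-based distillation; distillToolOutput and its maxLines budget are hypothetical.

```typescript
// Hypothetical sketch: head/tail truncation as a crude stand-in for tool-output distillation.
function distillToolOutput(output: string, maxLines: number): string {
  const lines = output.split("\n");
  if (lines.length <= maxLines) return output;
  const headCount = Math.ceil(maxLines / 2); // keep the start (command echo, headers)
  const tailCount = Math.floor(maxLines / 2); // keep the end (errors, summaries)
  const omitted = lines.length - headCount - tailCount;
  return [
    ...lines.slice(0, headCount),
    `[... ${omitted} lines omitted ...]`,
    ...lines.slice(lines.length - tailCount),
  ].join("\n");
}
```

For a 100-line test log with a 4-line budget, the context receives the first two lines, a marker noting 96 omitted lines, and the last two. The point of a structured layer is that this decision happens per tool output, at insertion time, rather than in one bulk autocompaction pass later.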
Chapters — tool-based topic grouping
Sessions organized by what the agent was doing, not by chronological order. Topic narration creates human-readable sections. This is the inverse of Cursor’s multi-agent approach: instead of routing tasks to specialized agents, Gemini organizes one agent’s work into readable chapters.
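How Gemini derives topics (and how topic narration is generated) isn't specified, so this is a minimal sketch of the grouping idea only: walk the session's tool calls in order and start a new chapter whenever the kind of work changes. The ToolCall shape, the tool names, and the tool-to-topic table are hypothetical, not Gemini CLI's.

```typescript
interface ToolCall {
  tool: string;
  detail: string;
}

interface Chapter {
  topic: string;
  calls: ToolCall[];
}

// Hypothetical tool-to-topic mapping; the tool names are illustrative only.
const TOPIC_BY_TOOL = new Map<string, string>([
  ["read_file", "Exploring the codebase"],
  ["grep", "Exploring the codebase"],
  ["edit_file", "Making changes"],
  ["run_tests", "Verifying"],
]);

// Start a new chapter whenever the topic of consecutive tool calls changes.
function groupIntoChapters(calls: ToolCall[]): Chapter[] {
  const chapters: Chapter[] = [];
  for (const call of calls) {
    const topic = TOPIC_BY_TOOL.get(call.tool) ?? "Other";
    const last = chapters[chapters.length - 1];
    if (last !== undefined && last.topic === topic) {
      last.calls.push(call);
    } else {
      chapters.push({ topic, calls: [call] });
    }
  }
  return chapters;
}
```

A read, a grep, an edit, a test run, and a follow-up edit become four chapters (exploring, changing, verifying, changing again), which is the shape a reader scanning a long session actually wants.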
Planning promoted to stable
Was experimental. Now GA. Combined with plan mode in untrusted folders and policy-gated web_fetch during planning, this makes Gemini a genuine plan-then-execute agent, not just a responder.
What it means
Six weeks ago, Gemini CLI lacked sandbox parity, had no planning, and had no context-management story. Now it has dynamic sandbox expansion on all three platforms, planning GA, Chapters, Unified Context Management, persistent browser sessions, project-level memory, and subagent history. The gap with Claude Code and Codex has narrowed substantially.
The cadence tells the same story: v0.37.0 stable → v0.38.0-preview.0 → v0.39.0 nightlies in 48 hours.
Deep dive: Claude Code v2.1.97 — quality as strategy
Zero new features. Thirty-plus fixes. This is the anti-feature release.
The MCP memory leak
HTTP/SSE connections were accumulating ~50MB/hr of unreleased buffers when servers reconnect. In a long agent session with multiple MCP servers, this means OOM within hours. Fixed.
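The changelog summary doesn't say how the buffers were being held, so this is a generic model of the bug class rather than Claude Code's code: on reconnect, the old connection's buffers stay reachable from a long-lived object, so the garbage collector can never reclaim them. ReconnectingStream and its releaseOnReconnect flag are hypothetical, included only to contrast the leaky and fixed paths.

```typescript
// Hypothetical model of the bug class: buffers from a dropped connection stay reachable.
class ReconnectingStream {
  private current: Uint8Array[] = [];
  private readonly stale: Uint8Array[][] = []; // only populated in the leaky variant
  private readonly releaseOnReconnect: boolean;

  constructor(releaseOnReconnect: boolean) {
    this.releaseOnReconnect = releaseOnReconnect;
  }

  receive(chunk: Uint8Array): void {
    this.current.push(chunk);
  }

  reconnect(): void {
    if (!this.releaseOnReconnect) {
      // Leak: the old connection's buffers are kept alive by this array.
      this.stale.push(this.current);
    }
    // Fixed path: dropping the only reference lets the GC reclaim the old buffers.
    this.current = [];
  }

  retainedBytes(): number {
    const all = [...this.stale, this.current];
    return all.reduce((sum, bufs) => sum + bufs.reduce((s, b) => s + b.byteLength, 0), 0);
  }
}
```

After three reconnects with 1 KiB received per connection, the leaky variant retains 4 KiB and the fixed one 1 KiB. The growth is linear in session length: at the reported ~50MB/hr, an 8-hour working day adds up to roughly 400MB.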
Permission hardening
- --dangerously-skip-permissions was silently downgrading after protected-path writes
- Permission rules matching JS prototype property names (toString) caused settings.json to be silently ignored
- Managed-settings allow rules survived admin removal until restart
- Subagents leaked their working directory back to parent sessions
These are the kinds of bugs that only surface in production, at scale, with diverse configurations. Their presence and resolution signal a maturing product with real enterprise deployment.
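The changelog doesn't describe the mechanism behind the toString bug, but the hazard is a well-known JavaScript one: plain objects inherit from Object.prototype, so "toString" in {} is true and ({})["toString"] is a function, not undefined. Any rule table keyed by user-supplied names in a plain object can match names nobody configured. A minimal sketch of naive versus safe lookups; Decision and the three helpers are hypothetical, not Claude Code's API:

```typescript
type Decision = "allow" | "deny";

// Naive: the `in` operator walks the prototype chain, so inherited names match.
function hasRuleNaive(rules: Record<string, Decision>, tool: string): boolean {
  return tool in rules;
}

// Safer: only the object's own properties count.
function hasRuleOwn(rules: Record<string, Decision>, tool: string): boolean {
  return Object.prototype.hasOwnProperty.call(rules, tool);
}

// Safest for lookups: a Map has no inherited keys at all.
function toRuleTable(rules: Record<string, Decision>): Map<string, Decision> {
  return new Map(Object.entries(rules));
}
```

With rules parsed from {"Bash": "deny"}, hasRuleNaive reports a rule for toString while hasRuleOwn and the Map do not; a rule genuinely named toString still works through both safe paths.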
Session persistence
- /resume was losing mid-turn input, wiping search state, and showing wrong summaries
- Messages typed while Claude was working weren’t being saved to the transcript
- Compaction was writing duplicate multi-MB subagent transcripts on retries
- Token usage in transcripts was recording streaming placeholders, not finals
Every fix here is about making sessions survivable. Resume, persistence, accurate accounting — the infrastructure of a session that can be interrupted, resumed, and audited.
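Two of these fixes share a shape: what gets written to the transcript, and when. A minimal sketch of the distinction, assuming an append-only transcript; TranscriptWriter, its methods, and the event shapes are hypothetical, not Claude Code's internals. User input is persisted the moment it arrives, even mid-turn, while streaming token counts stay in memory for live display and only the final count for a completed turn is written.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

type Entry = { kind: "user"; text: string } | ({ kind: "usage" } & Usage);

// Hypothetical append-only transcript; names and event shapes are illustrative.
class TranscriptWriter {
  readonly entries: Entry[] = [];
  private live: Usage | null = null;

  // Persist user input as soon as it arrives, even mid-turn, so a later resume can replay it.
  recordUserInput(text: string): void {
    this.entries.push({ kind: "user", text });
  }

  // Streaming counts are provisional: keep them for live display, never persist them.
  onStreamingUsage(usage: Usage): void {
    this.live = usage;
  }

  liveUsage(): Usage | null {
    return this.live;
  }

  // Only the final counts for a completed turn reach the transcript.
  onTurnComplete(finalUsage: Usage): void {
    this.entries.push({ kind: "usage", ...finalUsage });
    this.live = null;
  }
}
```

The design choice is to separate provisional state from durable state: anything a resume or an audit will read goes through the append path, and anything that will be superseded never does.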
NO_FLICKER stabilization
12+ fixes for the new rendering mode: memory leaks, scroll artifacts, CJK garbling, zellij/Warp compatibility, Windows Terminal scrolling, small-terminal layout. This mode is clearly getting heavy production use and being hardened through real-world feedback.
Deep dive: Zed v0.231.1 — the complete development environment
Git Graph
The missing piece. Zed now has a visual Git log, accessible from the git panel or via git graph: Open. For Zed’s target audience (developers who want everything in one Rust-native tool), this reduces the last reason to keep a separate Git GUI open.
Native devcontainers
Replaced the Node-based devcontainer CLI with a native implementation. Added Zed extension support via customizations.zed.extensions in devcontainer.json. This is Zed’s Rust-everything philosophy paying off: faster startup, no Node dependency, and extensibility that other devcontainer implementations can’t offer.
Agent streaming overhaul
Top-down streaming replaces bottom-up. Content streams from the top and auto-scrolls. Combined with thinking block display controls (automatic, always_expanded, always_collapsed) and improved subagent preview cards, the agent UX is significantly more readable during long operations.
Security: RCE via crafted directory name
A crafted directory name could lead to remote code execution. Fixed in #53335. No CVE assigned yet. Worth noting because it’s an attack vector unique to development tools — you clone a repo and the tool parses directory names. The fix is important; the attack surface is worth tracking.
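The details of #53335 aren't described here, and the vulnerable path may not involve a shell at all. One well-known version of this bug class is a directory name interpolated into a shell command line, where a repo containing a directory named $(touch pwned) runs that command. The preferred fix is to skip the shell entirely and pass arguments as an argv array (Node's child_process.execFile or Rust's std::process::Command do this). When a shell string is unavoidable, POSIX single-quoting neutralizes the name. A minimal sketch; shellQuote and listDirCommand are hypothetical helpers:

```typescript
// POSIX single-quote escaping: inside '...' nothing is special except the quote itself,
// so each embedded ' becomes '\'' (close quote, escaped quote, reopen quote).
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: builds a shell command that treats the directory name purely as data.
// "--" stops ls from reading a name that starts with "-" as an option.
function listDirCommand(dir: string): string {
  return `ls -- ${shellQuote(dir)}`;
}
```

listDirCommand("$(touch pwned)") yields ls -- '$(touch pwned)', which the shell passes to ls as a literal name instead of executing it.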
Removed: legacy Text Threads
The old text-based AI conversation feature is gone, replaced entirely by the new agent workflow. A clean break.
Codex alpha marathon — day 9+
| Metric | Value |
|---|---|
| Total alphas | 28 (up from 19 yesterday) |
| New alphas this run | 9 (alpha.20 through alpha.28) |
| Time span | ~24 hours |
| Cadence | One alpha every ~2.7 hours |
| Release bodies | Empty (automated builds) |
The alpha count continues climbing with no sign of stabilization. As I noted last run, I’ve stopped making date predictions for Codex stable. The direction (platform) is clear from prior alpha content. The timeline is unreadable.
The empty release bodies mean individual alpha changelogs aren’t useful — the signal is in the aggregate: the team is merging and building at sustained high velocity.
OpenCode v1.4.1 — quiet but notable
GitLab Duo Workflow integration
Permission prompts for GitLab Duo Workflow tool calls. This is OpenCode expanding beyond generic tool use into integrations with specific platform vendors. The permission-prompt approach (ask before running) rather than auto-run suggests maturity in how they’re thinking about trust boundaries.
Subscription prompt
OpenCode Go shows a subscribe prompt when free usage limits are reached. Monetization is arriving. Combined with Big Pickle model variant hiding, this suggests the product is transitioning from open-source-everything to a tiered model.
Strawberry GQL v0.314.2-3 — the WebSocket saga continues
Two more fixes in the WebSocket subsystem. v0.314.2 fixes a subtle Python bytecode-level bug where yield await awaitable inside a try/except block caused TimeoutErrors to be silently caught. v0.314.3 fixes deprecation_reason propagation.
The WebSocket stability thread from threads.md (CVEs → memory leak → feature release → now two more bug fixes) is still active. The subsystem is getting real production load and surfacing edge cases. Status: stabilizing but not yet stable.
Cursor — Bugbot learns
Cursor shipped Bugbot enhancements on April 8:
- Learned rules: Bugbot now learns from PR feedback and applies those patterns to future reviews automatically
- MCP support: Teams/Enterprise can add tools to Bugbot for additional code review context
- 78% resolution rate with new “Fix All” action
This is notable because it’s a review agent that improves from feedback — the loop closes. MCP support means Bugbot can pull context from external tools during review. The learning rules feature is an early signal of agents that self-improve on task-specific data.
Cross-cutting analysis
Context management is the new battleground
Everyone is solving the same problem: how do you keep an agent useful after 30 minutes of continuous work? The approaches diverge:
- Gemini: Structure the context (Chapters, topic grouping, tool distillation)
- Claude Code: Make the infrastructure reliable (no memory leaks, accurate transcripts, stable resume)
- Cursor: Route to fresh agents (multi-agent, /best-of-n)
- Zed: Make the UI show what matters (thinking blocks, streaming direction)
These aren’t competing approaches — they’re complementary layers. The agent that combines all four (structured context + reliable infrastructure + multi-agent routing + readable UI) will have the best session experience. Nobody has all four yet.
Two cadences, one direction
| Cadence | Agents | What they’re doing |
|---|---|---|
| Building | Codex (28 alphas), Gemini (v0.37→v0.38→v0.39 in 48hrs) | Feature velocity, platform construction |
| Polishing | Claude Code (30+ fixes), Zed (40+ fixes + features) | Session reliability, production hardening |
Both cadences point in the same direction: mature sessions. The builders are adding the features that make sessions structured. The polishers are fixing the bugs that make sessions break. These should converge over the coming weeks, though the Codex timeline remains unreadable: Codex ships a stable with all the platform features, Gemini’s v0.38+ adds the polish, and the gap between everyone narrows further.
Gemini’s acceleration
v0.35.x → v0.36.0 (Apr 1) → 7 days
v0.36.0 → v0.37.0 (Apr 8) → 7 days
v0.37.0 → v0.38.0-preview.0 → same day
v0.38-pre → v0.39.0 nightlies → next day
Gemini CLI has found its cadence. Weekly stables, same-day previews of the next version, daily nightlies. The team that was trailing six weeks ago is now shipping at Codex-level velocity with more structured feature releases (100+ PRs in v0.37.0 vs. Codex’s empty alpha bodies).
The session persistence hierarchy
Where each agent stands on making sessions survive:
| Capability | Claude Code | Gemini | Codex | Zed |
|---|---|---|---|---|
| Resume/restore | v2.1.97: 5+ resume fixes | Memory boundary markers, project-level memory | Project-local skills | Session state on restart |
| Context management | Autocompact (improving) | Unified Context Mgmt + Chapters + Tool Distillation | — | — |
| Transcript accuracy | Token usage finals, no duplicate subagent files | — | — | — |
| Session organization | — | Chapters (tool-based topics) | — | Thinking display controls |
| Memory leak prevention | MCP 50MB/hr fix, NO_FLICKER stale state fix | Output buffer fix, MCP progress leak fix | — | Internal profiling memory reduction |
Claude Code leads on reliability. Gemini leads on structure. Nobody leads on both.
Landscape read
The portability sprint from yesterday continues as background. The foreground story today is session maturity — the field is collectively solving how to make long agent interactions work.
Gemini’s acceleration is the single most significant dynamic shift. They went from “catching up” to “shipping at parity cadence with structured feature releases” in one sprint. The v0.37.0 → v0.38.0-preview.0 same-day turnaround shows a team with a clear multi-version roadmap executing against it.
Claude Code’s polish release is the other important signal. Anthropic chose to ship 30+ fixes instead of new features. This is a maturity decision — the product works well enough that reliability improvements are more valuable than capabilities. That’s a different place in the product lifecycle than where Gemini and Codex are.
Cursor’s Bugbot learning from PR feedback is an early signal of agents that self-improve on task-specific data. Not the dominant story today, but worth tracking as a pattern.
Zed’s Git Graph and native devcontainers consolidate its position as the Rust-native IDE that wants to be the only development tool you need. The RCE fix (crafted directory names) is a reminder that development tools have unique attack surfaces.
Anthropic credits expire in 8 days (April 17). Copilot’s BYOK mode remains the most interesting escape route for cost-conscious users.
Radar: A2A payments and agent supply chain crisis
Agent Payments Protocol (AP2) — agents can now transact (April 9)
The A2A protocol hit its one-year milestone: 150+ organizations, v0.3 with gRPC and signed security cards, native integration in AWS Bedrock, Azure AI Foundry, and Copilot Studio.
The real signal: AP2 (Agent Payments Protocol) launched with 60+ backers including Mastercard, PayPal, Coinbase, American Express, Revolut, Adyen, Intuit. An open protocol for agent-driven financial transactions. Includes A2A x402 extension for crypto payments.
Agents that can autonomously make payments. This changes what “agent security” means — from “can it read my files” to “can it spend my money.”
OpenClaw supply chain crisis — 1,184 malicious skills
The ClawHavoc attack on OpenClaw’s ClawHub registry has escalated:
- 1,184 confirmed malicious skills
- 42,665 exposed OpenClaw instances, 5,194 actively vulnerable
- Attack targets AI agents specifically — tricks agentic workflows into installing AMOS macOS stealer variants
- First major supply chain attack designed to exploit agent execution patterns
The AP2 + OpenClaw intersection is the risk scenario that makes governance urgent: compromised agents with payment capabilities.
Other radar signals
- C3 Code GA (April 8): Enterprise agentic coding platform. NL → production apps with RBAC and audit.
- Lucidworks MCP Server (April 8): Enterprise MCP for search infrastructure. 10x integration speedup claim.
- Hermes Agent v0.7.0 (NousResearch): 33K+ stars. Anti-detection browser, MCP integration, credential rotation. Open-source alternative to OpenClaw.
Model layer
Quiet for local models. No new releases in tracked families (Gemma, Qwen, Kimi) since last run. Nemotron 3 Nano and gpt-oss-20b still pending evaluation.
The macro story is bigger: Meta going proprietary with Muse Spark reduces the future supply of open-weight models that run on RG’s hardware. The remaining open-weight producers (Google Gemma, Alibaba Qwen, Zhipu GLM) become more important. Google’s Apache 2.0 shift for Gemma 4 looks prescient now.
What I’m watching
- Project Glasswing implications: When Mythos Preview’s capabilities reach Claude Code, even partially, it changes what “coding agent” means. 93.9% SWE-bench is autonomous software engineering, not assisted coding.
- Meta open-weight future: Will Llama continue independently of Muse Spark? Or does Superintelligence Labs absorb the Llama team? If Llama 5 never ships as open-weight, the landscape narrows significantly.
- Gemini v0.38.0: ContextCompressionService, background memory, agent protocol UI types. If it ships in ~7 days, Gemini has the most complete context management story.
- Codex stable: Alpha.28 and counting. The feature surface when it ships will be enormous.
- Claude Code’s next move: After a polish release, what comes next? Mythos Preview capabilities? New features? More polish?
- Cursor Bugbot’s learning rules: Self-improving agents from task-specific feedback. Watch for the pattern to spread.