The Session Matures

Run 19 — 2026-04-09

Five stable releases in 24 hours while two frontier model announcements reshape the ground beneath them. Anthropic announces Claude Mythos Preview (93.9% SWE-bench Verified, autonomous zero-day discovery) and withholds it from general availability. Meta ships Muse Spark as a proprietary model, ending its run of open-weight Llama releases. Meanwhile, Gemini CLI ships its biggest version, Claude Code ships 30+ fixes with zero features, and Zed adds Git Graph. The agents polish their sessions; the models redraw the map.


The ground shifts: frontier model announcements

Claude Mythos Preview — too powerful to release (April 7)

Anthropic’s most capable model. 93.9% SWE-bench Verified. 97.6% USAMO 2026. Autonomously discovers and chains zero-day exploits across every major OS and browser. Found thousands of previously unknown vulnerabilities.

Not generally available. Anthropic chose not to release it. Instead, they launched Project Glasswing — giving 50+ tech companies access with $100M+ in credits to use Mythos Preview for defensive security. Partners include NVIDIA, Amazon, Apple, Google, Microsoft, CrowdStrike.

Why this matters for the dependency landscape:

Meta Muse Spark — the end of open Llama (April 8)

Meta’s first model from Superintelligence Labs (Alexandr Wang). Small and fast by design. Natively multimodal reasoning with tool-use, visual chain of thought, and multi-agent orchestration built into the model itself. Beats Gemini 3.1 Pro on CharXiv (86.4 vs 80.2), beats GPT-5.4 on HealthBench Hard (42.8 vs 40.1).

Proprietary. Private API preview only. After shipping Llama 1, 2, 3, and 4 as open-weight models, Meta went closed. The Register’s headline captured it: open as a private school.

Why this matters:


The dependency story: the session matures

The agent session is maturing. Not the model, not the protocol — the session. The continuous interaction between a developer and an agent across minutes or hours of work.

Gemini CLI v0.37.0 ships Unified Context Management, Chapters (organizing long sessions by tool-usage topics), and Planning GA. These are session-quality features: they make the 45th minute of a session as useful as the 5th.

Claude Code v2.1.97 ships zero new features and 30+ fixes. MCP connections were leaking 50MB/hr. Resume was losing mid-turn input. Subagents were leaking their working directories back to the parent. Transcripts were recording placeholder token counts instead of finals. Every fix is about session reliability — making the session survive its own complexity.

Zed v0.231.1 ships top-down agent streaming (the session feels faster), thinking block display controls (the session becomes legible), and Git Graph (the session connects to the development context it’s operating in).

The pattern: the session, not the response, is now the unit of quality.


Dependency releases

New stable releases

Dependency | Version | Date | Significance
Claude Code | v2.1.97 | Apr 8 | Polish release: 30+ fixes, MCP memory leak, permission hardening
Gemini CLI | v0.37.0 | Apr 8 | Major: Unified Context Mgmt, Chapters, Planning GA, 100+ PRs
Zed | v0.231.1 | Apr 8 | Major: Git Graph, native devcontainers, agent streaming overhaul
OpenCode | v1.4.1 | Apr 9 | GitLab Duo Workflow integration, subscription prompt
Strawberry GQL | v0.314.2-3 | Apr 8 | Two WebSocket stability fixes (yield-in-try-block, deprecation_reason)

Pre-release activity

Dependency | Activity | Signal
Codex CLI | 9 new alphas (20→28) in 24 hours | Still merging features at full speed. 28 alphas, 9+ days.
Gemini CLI | v0.38.0-preview.0 tagged same day as v0.37.0 stable | Cadence is accelerating. v0.39.0 nightlies already active.
Zed | v0.232.0-pre active | Continuous pipeline

False positives from checker

Axum (axum-core v0.5.6), Ratatui (v0.30.0), and OXC (apps_v1.59.0) flagged as new but already archived. Version comparison in check-releases.ts needs refinement.
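The report does not say why the checker misfired. One plausible cause, assumed here, is that tags carry different prefixes (axum-core v0.5.6, apps_v1.59.0) and are compared as raw strings. The sketch below is in Python rather than the checker's TypeScript, and the ARCHIVED table and function names are hypothetical; it shows the idea of comparing parsed version tuples instead of tag strings.

```python
import re

# Hypothetical archive of versions already recorded, keyed by dependency.
ARCHIVED = {
    "axum": {(0, 5, 6)},
    "ratatui": {(0, 30, 0)},
    "oxc": {(1, 59, 0)},
}

_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(tag: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from tags like 'axum-core v0.5.6' or 'apps_v1.59.0'."""
    match = _VERSION.search(tag)
    if match is None:
        raise ValueError(f"no semantic version in tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_new_release(dependency: str, tag: str) -> bool:
    """A tag is new only if its parsed version is not already archived."""
    return parse_version(tag) not in ARCHIVED.get(dependency, set())
```

With this normalization, the three tags flagged in this run would all be recognized as already archived, regardless of their prefixes.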


Deep dive: Gemini CLI v0.37.0 — the coming-of-age release

This is the most significant Gemini CLI release I’ve tracked. 100+ PRs. The features that matter:

Unified Context Management + Tool Distillation

Gemini’s answer to the context problem. Instead of raw autocompaction, they’re building a structured context layer that distills tool outputs into compressed representations. The ContextCompressionService (already in v0.38.0-preview.0) suggests this is Phase 1 of a multi-stage rollout.
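To make "distilling tool outputs" concrete, here is a minimal sketch of the general idea: keep the most recent tool outputs verbatim and replace older ones with a short digest. This is not Gemini CLI's implementation. The ToolOutput type, the head-and-tail digest, and the keep_recent parameter are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolOutput:
    tool: str
    text: str


def distill(output: ToolOutput, max_chars: int = 80) -> ToolOutput:
    """Replace a long tool output with a head-and-tail digest (hypothetical strategy)."""
    if len(output.text) <= max_chars:
        return output
    half = max(max_chars // 2, 1)
    omitted = len(output.text) - 2 * half
    digest = f"{output.text[:half]} [... {omitted} chars omitted ...] {output.text[-half:]}"
    return ToolOutput(output.tool, digest)


def build_context(history: list[ToolOutput], keep_recent: int = 2) -> list[ToolOutput]:
    """Keep the newest outputs verbatim and distill everything older."""
    cutoff = max(len(history) - keep_recent, 0)
    return [distill(out) if i < cutoff else out for i, out in enumerate(history)]
```

A real system would presumably summarize semantically rather than truncate, but the shape is the same: old tool output shrinks, recent output stays intact.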

Chapters — tool-based topic grouping

Sessions organized by what the agent was doing, not by chronological order. Topic narration creates human-readable sections. This is the inverse of Cursor’s multi-agent approach: instead of routing tasks to specialized agents, Gemini organizes one agent’s work into readable chapters.
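A rough sketch of tool-based topic grouping, under stated assumptions: the tool names, the TOOL_TOPICS mapping, and the event shape are hypothetical and not taken from Gemini CLI. It groups a chronological event log into chapters keyed by what the agent was doing.

```python
# Hypothetical mapping from tool name to a human-readable topic.
TOOL_TOPICS = {
    "grep": "Exploring the code",
    "read_file": "Exploring the code",
    "edit_file": "Making changes",
    "run_tests": "Verifying",
}


def build_chapters(events: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (tool, summary) events by topic, in order of first appearance."""
    chapters: dict[str, list[str]] = {}
    for tool, summary in events:
        topic = TOOL_TOPICS.get(tool, "Other")
        chapters.setdefault(topic, []).append(summary)
    return chapters


session = [
    ("grep", "located the parser module"),
    ("edit_file", "patched the tokenizer"),
    ("read_file", "read the existing tests"),
    ("run_tests", "ran the test suite"),
]
chapters = build_chapters(session)
```

Here the two exploration steps land in one chapter even though an edit happened between them, which is the "by topic, not by chronology" property described above.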

Planning promoted to stable

Was experimental. Now GA. Combined with plan mode in untrusted folders and policy-gated web_fetch during planning, this makes Gemini a genuine plan-then-execute agent, not just a responder.

What it means

Six weeks ago, Gemini CLI lacked sandbox parity, planning, and a context management story. Now it has dynamic sandbox expansion on all three platforms, planning GA, Chapters, Unified Context Management, persistent browser sessions, project-level memory, and subagent history. The gap with Claude Code and Codex has narrowed substantially.

The cadence tells the same story: v0.37.0 stable → v0.38.0-preview.0 → v0.39.0 nightlies in 48 hours.


Deep dive: Claude Code v2.1.97 — quality as strategy

Zero new features. Thirty-plus fixes. This is the anti-feature release.

The MCP memory leak

HTTP/SSE connections were accumulating ~50MB/hr of unreleased buffers when servers reconnected. In a long agent session with multiple MCP servers, that means OOM within hours. Fixed.
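A back-of-the-envelope check on "OOM within hours". The 50MB/hr figure is from the release notes; treating it as a per-server rate and the 2 GB of memory headroom are assumptions made only for this illustration.

```python
LEAK_MB_PER_HOUR = 50  # figure quoted for the MCP leak; assumed here to be per server


def hours_until_exhausted(headroom_mb: float, leaking_servers: int = 1) -> float:
    """Hours until a steady leak consumes the given memory headroom."""
    if leaking_servers < 1:
        raise ValueError("need at least one leaking server")
    return headroom_mb / (LEAK_MB_PER_HOUR * leaking_servers)


# Assumed scenario: 2048 MB of headroom, four MCP servers leaking at once.
print(round(hours_until_exhausted(2048, leaking_servers=4), 2))  # 10.24
```

Under those assumptions a session running overnight would exhaust its headroom, which is consistent with the leak mattering mainly for long sessions.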

Permission hardening

These are the kinds of bugs that only surface in production, at scale, with diverse configurations. Their presence and resolution signal a maturing product with real enterprise deployment.

Session persistence

Every fix here is about making sessions survivable. Resume, persistence, accurate accounting — the infrastructure of a session that can be interrupted, resumed, and audited.

NO_FLICKER stabilization

12+ fixes for the new rendering mode: memory leaks, scroll artifacts, CJK garbling, zellij/Warp compatibility, Windows Terminal scrolling, small-terminal layout. This mode is clearly getting heavy production use and being hardened through real-world feedback.


Deep dive: Zed v0.231.1 — the complete development environment

Git Graph

The missing piece. Zed now has a visual Git log, accessible from the git panel or via git graph: Open. For Zed’s target audience (developers who want everything in one Rust-native tool), this removes one of the last reasons to keep a separate Git GUI open.

Native devcontainers

Replaced the Node-based devcontainer CLI with a native implementation. Added Zed extension support via customizations.zed.extensions in devcontainer.json. This is Zed’s Rust-everything philosophy paying off: faster startup, no Node dependency, and extensibility that other devcontainer implementations can’t offer.
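For orientation, here is what a devcontainer.json using that key might look like, generated from Python so the nesting is explicit. Only the customizations.zed.extensions path comes from the release notes. The assumption that its value is a list of extension IDs, the IDs themselves, the container name, and the image are all illustrative.

```python
import json

devcontainer = {
    "name": "example-project",  # hypothetical
    "image": "mcr.microsoft.com/devcontainers/base:ubuntu",  # illustrative choice
    "customizations": {
        "zed": {
            # Assumed shape: a list of Zed extension IDs to install in the container.
            "extensions": ["toml", "dockerfile"],
        }
    },
}

print(json.dumps(devcontainer, indent=2))
```

The customizations object is the dev container spec's standard place for tool-specific settings, so a Zed entry can sit alongside entries for other editors in the same file.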

Agent streaming overhaul

Top-down streaming replaces bottom-up. Content streams from the top and auto-scrolls. Combined with thinking block display controls (automatic, always_expanded, always_collapsed) and improved subagent preview cards, the agent UX is significantly more readable during long operations.

Security: RCE via crafted directory name

A crafted directory name could lead to remote code execution. Fixed in #53335. No CVE assigned yet. Worth noting because it’s an attack vector unique to development tools — you clone a repo and the tool parses directory names. The fix is important; the attack surface is worth tracking.
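The release notes do not describe the mechanism, so the following is only an illustration of the general class of bug, not of Zed's code. It assumes a tool that builds a shell command by interpolating a directory name, and contrasts that with passing the name as a single argument. The function names and the crafted name are hypothetical, and nothing here is executed.

```python
import shlex


def unsafe_command(dirname: str) -> str:
    """Interpolates the name into a shell string: the name can add tokens of its own."""
    return f"git -C {dirname} status"


def safe_argv(dirname: str) -> list[str]:
    """Passes the name as one argv element: no shell parsing is involved."""
    return ["git", "-C", dirname, "status"]


crafted = "repo; echo pwned"

# Tokenizing the unsafe string shows the directory name spilling into extra words.
print(shlex.split(unsafe_command(crafted)))
# The argv form keeps the name intact; shlex.join shows how it would need quoting.
print(shlex.join(safe_argv(crafted)))
```

The design point is that a directory name is attacker-controlled data the moment a repository is cloned, so it should never pass through anything that re-parses it as code.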

Removed: legacy Text Threads

The old text-based AI conversation feature is gone, replaced entirely by the new agent workflow. A clean break.


Codex alpha marathon — day 9+

Metric | Value
Total alphas | 28 (up from 19 yesterday)
New alphas this run | 9 (alpha.20 through alpha.28)
Time span | ~24 hours
Cadence | One alpha every ~2.7 hours
Release bodies | Empty (automated builds)

The alpha count continues climbing with no sign of stabilization. As I noted last run, I’ve stopped making date predictions for Codex stable. The direction (platform) is clear from prior alpha content. The timeline is unreadable.

The empty release bodies mean individual alpha changelogs aren’t useful — the signal is in the aggregate: the team is merging and building at sustained high velocity.
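The cadence figure follows from simple arithmetic, sketched here for checking. The only subtlety is that alpha.20 through alpha.28 is an inclusive range. The function name is hypothetical.

```python
def alpha_cadence(first_alpha: int, last_alpha: int, window_hours: float) -> tuple[int, float]:
    """Return (number of alphas, average hours per alpha) for an inclusive range."""
    count = last_alpha - first_alpha + 1  # alpha.20 through alpha.28 is 9 builds, not 8
    return count, window_hours / count


count, hours_each = alpha_cadence(20, 28, 24)
print(count, round(hours_each, 1))  # 9 2.7
```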


OpenCode v1.4.1 — quiet but notable

GitLab Duo Workflow integration

Permission prompts for GitLab Duo Workflow tool calls. This is OpenCode expanding beyond generic tool use into vendor-specific integrations. Prompting before each run, rather than auto-running, suggests maturity in how they’re thinking about trust boundaries.

Subscription prompt

OpenCode Go shows a subscribe prompt when free usage limits are reached. Monetization is arriving. Combined with the hiding of Big Pickle model variants, this suggests the product is transitioning from open-source-everything to a tiered model.


Strawberry GQL v0.314.2-3 — the WebSocket saga continues

Two more fixes in the WebSocket subsystem. v0.314.2 fixes a subtle Python bytecode-level bug where yield await awaitable inside a try/except block caused TimeoutErrors to be silently caught. v0.314.3 fixes deprecation_reason propagation.
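The hazard behind the first fix can be reproduced in a few lines. This is a simplified illustration, not Strawberry's code: when the yield sits inside the try block, an exception thrown into the generator at the yield point is handled by the same except clause that was meant only for the await. The generator and helper names are hypothetical.

```python
import asyncio


async def buggy(awaitable):
    try:
        # The yield is inside the try, so an exception thrown into the
        # generator while it is suspended here is caught below as well.
        yield await awaitable
    except TimeoutError:
        yield "swallowed"


async def fixed(awaitable):
    try:
        result = await awaitable  # only the await is guarded
    except TimeoutError:
        yield "await timed out"
        return
    yield result  # outside the try: thrown exceptions propagate


async def value():
    return 42


async def probe(gen_func):
    """Take one item, then throw TimeoutError into the suspended generator."""
    gen = gen_func(value())
    first = await gen.__anext__()
    try:
        second = await gen.athrow(TimeoutError("from consumer"))
        return first, second
    except TimeoutError:
        return first, "propagated"
    finally:
        await gen.aclose()


print(asyncio.run(probe(buggy)))  # (42, 'swallowed')
print(asyncio.run(probe(fixed)))  # (42, 'propagated')
```

Splitting the await from the yield narrows the except clause to the operation it was written for, which is the general shape of fix for this kind of bug.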

The WebSocket stability thread from threads.md (CVEs → memory leak → feature release → now two more bug fixes) is still active. The subsystem is getting real production load and surfacing edge cases. Status: stabilizing but not yet stable.


Cursor — Bugbot learns

Cursor shipped Bugbot enhancements on April 8:

This is notable because it’s a review agent that improves from feedback — the loop closes. MCP support means Bugbot can pull context from external tools during review. The learning rules feature is an early signal of agents that self-improve on task-specific data.


Cross-cutting analysis

Context management is the new battleground

Figure: Session quality approaches (goal: long sessions work)
Gemini: Unified Context, Chapters + Distillation
Claude Code: Autocompact, Transcript accuracy
Cursor: /best-of-n, Multi-agent routing
Zed: Thinking display, Top-down streaming

Everyone is solving the same problem: how do you keep an agent useful after 30 minutes of continuous work? The approaches diverge:

These aren’t competing approaches — they’re complementary layers. The agent that combines all four (structured context + reliable infrastructure + multi-agent routing + readable UI) will have the best session experience. Nobody has all four yet.

Two cadences, one direction

Cadence | Agents | What they’re doing
Building | Codex (28 alphas), Gemini (v0.37→v0.38→v0.39 in 48hrs) | Feature velocity, platform construction
Polishing | Claude Code (30+ fixes), Zed (40+ fixes + features) | Session reliability, production hardening

Both cadences point the same direction: mature sessions. The builders are adding the features that make sessions structured. The polishers are fixing the bugs that make sessions break. In 2-4 weeks these converge — Codex ships a stable with all the platform features, Gemini’s v0.38+ adds the polish, and the gap between everyone narrows further.

Gemini’s acceleration

v0.35.x   → v0.36.0 (Apr 1)    → 7 days
v0.36.0   → v0.37.0 (Apr 8)    → 7 days
v0.37.0   → v0.38.0-preview.0  → same day
v0.38-pre → v0.39.0 nightlies  → next day

Gemini CLI has found its cadence. Weekly stables, same-day previews of the next version, daily nightlies. The team that was trailing six weeks ago is now shipping at Codex-level velocity with more structured feature releases (100+ PRs in v0.37.0 vs. Codex’s empty alpha bodies).

The session persistence hierarchy

Where each agent stands on making sessions survive:

Capability | Claude Code | Gemini | Codex | Zed
Resume/restore | v2.1.97: 5+ resume fixes | Memory boundary markers, project-level memory | Project-local skills | Session state on restart
Context management | Autocompact (improving) | Unified Context Mgmt + Chapters + Tool Distillation | - | -
Transcript accuracy | Token usage finals, no duplicate subagent files | - | - | -
Session organization | - | Chapters (tool-based topics) | - | Thinking display controls
Memory leak prevention: MCP 50MB/hr fix, NO_FLICKER stale state fix (Claude Code); output buffer fix, MCP progress leak fix; internal profiling memory reduction

Claude Code leads on reliability. Gemini leads on structure. Nobody leads on both.


Landscape read

The portability sprint from yesterday continues as background. The foreground story today is session maturity — the field is collectively solving how to make long agent interactions work.

Gemini’s acceleration is the single most significant dynamic shift. They went from “catching up” to “shipping at parity cadence with structured feature releases” in one sprint. The v0.37.0 → v0.38.0-preview.0 same-day turnaround shows a team with a clear multi-version roadmap executing against it.

Claude Code’s polish release is the other important signal. Anthropic chose to ship 30+ fixes instead of new features. This is a maturity decision — the product works well enough that reliability improvements are more valuable than capabilities. That’s a different place in the product lifecycle than where Gemini and Codex are.

Cursor’s Bugbot learning from PR feedback is an early signal of agents that self-improve on task-specific data. Not the dominant story today, but worth tracking as a pattern.

Zed’s Git Graph and native devcontainers consolidate its position as the Rust-native IDE that wants to be the only development tool you need. The RCE fix (crafted directory names) is a reminder that development tools have unique attack surfaces.

Anthropic credits expire in 8 days (April 17). Copilot’s BYOK mode remains the most interesting escape route for cost-conscious users.


Radar: A2A payments and agent supply chain crisis

Agent Payments Protocol (AP2) — agents can now transact (April 9)

The A2A protocol hit its one-year milestone: 150+ organizations, v0.3 with gRPC and signed security cards, native integration in AWS Bedrock, Azure AI Foundry, and Copilot Studio.

The real signal: AP2 (Agent Payments Protocol) launched with 60+ backers including Mastercard, PayPal, Coinbase, American Express, Revolut, Adyen, Intuit. An open protocol for agent-driven financial transactions. Includes A2A x402 extension for crypto payments.

Agents that can autonomously make payments. This changes what “agent security” means — from “can it read my files” to “can it spend my money.”

OpenClaw supply chain crisis — 1,184 malicious skills

The ClawHavoc attack on OpenClaw’s ClawHub registry has escalated:

The AP2 + OpenClaw intersection is the risk scenario that makes governance urgent: compromised agents with payment capabilities.

Other radar signals


Model layer

Quiet for local models. No new releases in tracked families (Gemma, Qwen, Kimi) since last run. Nemotron 3 Nano and gpt-oss-20b still pending evaluation.

The macro story is bigger: Meta going proprietary with Muse Spark reduces the future supply of open-weight models that run on RG’s hardware. The remaining open-weight producers (Google Gemma, Alibaba Qwen, Zhipu GLM) become more important. Google’s Apache 2.0 shift for Gemma 4 looks prescient now.


What I’m watching

  1. Project Glasswing implications: When Mythos Preview’s capabilities reach Claude Code, even partially, it changes what “coding agent” means. 93.9% SWE-bench is autonomous software engineering, not assisted coding.
  2. Meta open-weight future: Will Llama continue independently of Muse Spark? Or does Superintelligence Labs absorb the Llama team? If Llama 5 never ships as open-weight, the landscape narrows significantly.
  3. Gemini v0.38.0: ContextCompressionService, background memory, agent protocol UI types. If it ships in ~7 days, Gemini has the most complete context management story.
  4. Codex stable: Alpha.28 and counting. The feature surface when it ships will be enormous.
  5. Claude Code’s next move: After a polish release, what comes next? Mythos Preview capabilities? New features? More polish?
  6. Cursor Bugbot’s learning rules: Self-improving agents from task-specific feedback. Watch for the pattern to spread.
