Coding agent landscape
Living document. Rewritten as the terrain shifts. Last updated: 2026-06-05.
Update 2026-06-05 — the protocol already had an author
Correction to the Jun-4 entry below: ACP is Zed’s, not Cognition’s. Agent Client Protocol is Zed Industries’ open Apache-licensed standard (~mid-2025; JetBrains × Zed collab Oct 2025; live ACP Registry). Cognition’s Devin Desktop is an adopter, one host among several (Zed, JetBrains, Kiro). Guests already include Claude Agent, Codex, Gemini CLI, Pi, OpenCode, Vibe. The “editor-side mirror of MCP” read is right; the authorship was not. What surfaced it: Vibe v2.14.0 bumped agent-client-protocol to 0.10.1 (+ session/delete over ACP); OpenCode v1.16.0 restored ACP session replay + fixed ACP cancel — the maintenance signature of a live multi-vendor protocol. The reframed strategic read: ACP is winning the editor↔agent slot the way MCP won app↔tool; the labs’ agents are interchangeable guests, and the durable lab move is owning a host surface. Two other moves today: (1) the session became a portable object across OpenCode (move/clone/replay sessions), Claude Code (bg sessions survive version upgrades), and Vibe (session delete); (2) CC vs OpenCode diverge on grain — CC enterprise-governed (requiredMin/MaximumVersion version-pinning), OpenCode provider-neutral (OpenAI-via-Bedrock, SAP AI Core, OpenRouter, 10 community contributors). See reports/2026-06-05-the-protocol-already-had-an-author.md.
Update 2026-06-04 — the fleet becomes an operations surface
A 48-hour convergence across three vendors on the operability layer, not capability. Claude Code v2.1.162 (Jun 3) and Codex rust-v0.137.0 (Jun 4) both spent the cycle on watch / bound / persist for fleets that already run: Claude Code debugged claude agents into usability (column widths, attach-bounce, 5s stall, lost backgrounded sessions, queued failed replies, waitingFor in --json) and closed three permission-rule bugs where deny silently didn’t apply (WebFetch deny < preapproved-host; Windows backslash rules; Read deny not hiding from Glob/Grep); Codex matured MultiAgent-v2 (per-thread runtime, followup_task, hide_spawn_agent_metadata default), added remote-control client RPCs (pairing/revoke grants), keyed permission grants by environment identity, and shipped enterprise credit-limit visibility + cloud-managed config bundles + rollout compression + a skills-extension scaffold.
New entrant — Devin Desktop (Cognition, Jun 2). Windsurf rebranded; “a full IDE with an agent manager built in — not the other way around.” Default surface is the Agent Command Center — a Kanban of every agent, local and cloud, by status. Devin Local: Cascade rewritten from scratch in Rust, ~30% more token-efficient, subagents. ACP (Agent Client Protocol): open, agent-neutral host protocol; Codex / Claude Agent / OpenCode run as guests. This is the editor-side mirror of MCP — ⚠️ but ACP is Zed’s standard, not Cognition’s (see Jun-5 correction above); Devin Desktop is an adopter, not the author. Differentiation has migrated: harness features clone in ~13 days, model capability is a training-run problem, and now operability (fleet dashboards, environment-scoped permits, neutral host protocols) is the live front. See reports/2026-06-04-the-command-center.md.
Update 2026-05-30 — orchestration moves into the model
Opus 4.8 (May 28, 41 days after 4.7). Agentic-coding SWE-Bench Pro 64.3%→69.2%, knowledge-work Elo 1753→1890, computer use ~84%, first model >10% on Legal Agent all-pass. Regular pricing unchanged ($5/$25); fast mode $10/$50 at 2.5× speed / 3× cheaper. Dynamic Workflows (research preview in Claude Code) makes “plan + hundreds of parallel subagents in one session” a native model capability — the harness Workflow tool descending into the weights. Headline: 4× less likely than 4.7 to let flaws in its own code pass unremarked (honesty as the enabler for unattended fleets). The differentiation axis is migrating from harness features (clonable in ~13 days) to model capability (a training-run problem). Gemini 3.5 (SubagentProtocol) and Codex (MultiAgentV2) are on the same trajectory. See reports/2026-05-30-autonomy-descends-into-the-weights.md.
The field as of May 1
The lifecycle gap becomes explicit engineering. Codex v0.128.0 resolves the version-jump silence into 190+ PRs — persisted /goal workflows, permission profiles (replacing --full-auto), git-backed memory, external agent session import, marketplace plugins. Claude Code v2.1.126 ships from the opposite end: claude project purge (clean teardown), expanded --dangerously-skip-permissions, gateway model picker, OAuth paste for WSL2/SSH/containers. Both agents now treat “what happens between sessions” as a first-class engineering surface. Six CLI agents in the field. Codex v0.129.0-alpha.1 (empty) shipped same day as v0.128.0 stable — pipeline didn’t pause.
The benchmark surface continues. Now with DeepSeek V4 and Nate’s execution gap report:
- Claude Opus 4.7 wins coding (SWE-Bench Pro: 64.3%)
- GPT-5.5 wins terminal workflows (Terminal-Bench 2.0: 82.7%), long-context (MRCR v2 at 1M: 74.0%), and practical execution breadth (Nate: “87 where the next best scored 67”)
- DeepSeek V4-Pro wins open-weight overall; Flash is 36-107x cheaper than GPT-5.5
- Open models cluster at 58-59% SWE-Bench Pro — 6-point gap to proprietary (smallest ever)
- NVIDIA Nemotron 3 Nano Omni: 30B/3B-active open-weight multimodal (GUI use, audio, video, documents)
Five competitive strategies:
- Protocol-mediated memory + lifecycle cleanup (Claude Code): v2.1.126 —
claude project purge, expanded permissions, gateway model picker, OAuth paste. Lifecycle management from the cleanup end. - Write-time memory (Gemini CLI): v0.40.0 stable — prompt-driven memory editing, skill extraction, MCP resources, bundled ripgrep. Massive changelog.
- Persistence platform + GPT-5.5 (Codex): v0.128.0 (190+ PRs). Persisted
/goalworkflows, permission profiles, git-backed memory, external agent session import, marketplace plugins. Deprecated--full-auto. The version jump explained. - Parallel agents in GUI (Cursor): v3.2 (Apr 24). /multitask + worktrees + multi-root.
- Purpose-built coding models (Poolside): Laguna XS.2 (33B/3B active, Apache 2.0, 68.2% SWE-Bench).
poolterminal agent. New entrant. - Merged flagship + remote agents (Mistral): Medium 3.5 (128B dense, 77.6% SWE-Bench). Vibe remote agents.
- Depth (Bun/mise/aube): aube v1.5.0 —
aube queryfor dependency graph queries. Eight releases in seven days.
Economics: ChatGPT Plus 44M→9M subscribers (Zitron/The Information). Data centers at 16.7% gross margin. OpenAI needs $852B by 2030. But GPT-5.5 efficiency (+40% fewer tokens/task) and Nate’s execution gap suggest enterprise willingness-to-pay may compensate.
Current positions
| Agent | Latest | Architecture bet | Interface model | Enterprise readiness |
|---|---|---|---|---|
| Claude Code | v2.1.146 (May 21). /simplify → /code-review with effort levels. MCP resources/prompts pagination fix (extends v2.1.144 tools/list fix). Background session reliability (14 fixes). Previous: v2.1.145 (May 19): permission bypass fix. Stainless acquired (May 18): SDK/MCP tooling now in-house. Code with Claude London (May 20-21). | Hook-based extension + MCP as memory bridge + fail-closed enterprise + Channels + re-entry + fullscreen TUI + design pipeline + creative tool connectors + project lifecycle + effort-aware hooks + admin settings merge + practice verticals (legal, financial, small business) + Stainless SDK pipeline | Single-agent deep + agent-addressable slash commands + multi-agent orchestration + Routines + /goal persistence + agent view (fleet visibility) + background session /resume | Highest — admin-tier settings, worktree control, effort hooks, practice verticals, three distribution channels, KPMG 276K employees, but unpatched credential exfil chain |
| Codex | v0.133.0-alpha.4 (May 21). Empty alphas continuing — marathon day 3 since v0.132.0 stable (May 20). Previous: v0.132.0 (goal-as-extension refactoring, Python SDK auth, memory versioning). v0.131.0 (May 18): typed extension API, Python SDK, Profile V2. Codex mobile (May 14): iOS/Android, all plans including Free. | 40+ crate Rust workspace, V8, MCP Apps, WebRTC, plugins, marketplace, multi-cloud (Bedrock + AWS), computer use, persisted goals, permission profiles, mobile supervisor, typed extension API, Python SDK, goal-as-extension | Single-agent + spawn v2 + realtime v2 + parallel background agents + mobile presence + extension platform | High — Bedrock, data residency, approval reviewers, Guardian, devcontainer, permission profiles, OpenAI on AWS, Profile V2 |
| Gemini CLI → Antigravity | CLOSED SOURCE + GO REWRITE confirmed (May 21). Antigravity CLI is proprietary (Gemini CLI was Apache 2.0) and rewritten in Go (was TypeScript/Node). Gemini CLI stops serving consumer requests June 18, 2026. Enterprise customers on Code Assist Standard/Enterprise retain unchanged Gemini CLI. GitHub org: google-antigravity. Three surfaces: CLI + Desktop + SDK. Powered by Gemini 3.5 Flash. Managed Agents via single API call. Previous Gemini CLI features (Skills, Hooks, Subagents) carry over as Antigravity plugins. | Go rewrite, multi-registry subagent, Chapters, UCM, compression + four-tier memory + skill system + Auto Memory inbox (GA), persistent browser, GCP backend, SubagentProtocol, session export/import, desktop app, SDK, Managed Agents API | Single-agent + subagent + memory-as-service + self-improvement (GA) + session portability + desktop + scheduled tasks | Medium-High — Vertex AI routing, admin MCP, governance files, sandbox on all platforms, enterprise agent platform, enterprise tier insulated from migration |
| Cursor | v3.3+ (May 7-13). PR review, “Build in Parallel” (auto-split plans into subagent PRs), split PRs. Bugbot effort levels (May 11): configurable Default/High/Custom. Cursor in Microsoft Teams (May 11): @Cursor in any channel. Dev Environments for Cloud Agents (May 13): multi-repo with Dockerfile config, build secrets, 70% faster builds, rollback, audit logs. Bugbot usage-based billing (effective June 8). Seven enterprise features in ten days. | IDE-integrated, cloud agents, PR lifecycle, cloud dev environments | Multi-agent-native (Agents Window) + parallel plan decomposition + Teams integration | High — model blocklists, spend limits, context diagnostics, audit logs, dev environments, usage-based billing |
| Copilot | BYOK + local + air-gapped. SDK v0.2.1. Studio multi-agent GA. | Cloud-first → now also local. Composable SDK, multi-model broker. | Cloud Agent + CLI + Studio | Strongest — commit signing, org firewalls, agent vuln assignment |
| OpenCode | v1.16.0 (Jun 5). Session portability (move sessions between workspaces/dirs; managed workspace cloning keeping dirty files); skill discovery + file-based agent loading; OpenAI-via-Bedrock, SAP AI Core fixes, OpenRouter bump; Copilot token-billing usage tracking; ACP session replay + cancel fixes; desktop color themes / thinking-level selector / Servers tab; +38% startup. 10 community contributors. | Effect-based, provider-neutral, community-driven, ACP guest | Single-agent | Growing — provider-neutrality + session portability the live front |
| Vibe | v2.14.0 (Jun 4). Image @-attachments for vision models; session deletion exposed over ACP (session/delete); agent-client-protocol bumped to 0.10.1; create-only write_file (refuses overwrite); EnvironmentLayer (VIBE_-prefixed); SKILL.md parse-failure toast; LLM-call retry on network errors. | Mistral-native, hooks, scratchpad, ACP-native, multi-provider | Single-agent + subagent scratchpad | Growing — deep ACP integration, layered config |
| Poolside pool | NEW (Apr 28). Terminal-based coding agent. Powered by Laguna XS.2 (33B/3B MoE, Apache 2.0, 68.2% SWE-Bench) and M.1 (72.5%). Also launched Shimmer (cloud dev). | Purpose-built MoE model, terminal agent | Single-agent | Early — new entrant |
| Aider | v0.86.0 (Aug 2025) — 256 days stalled | Python, model-agnostic | Single-agent | N/A |
| Zed | v1.2.6 (May 16). v1.2.4 (May 15): ChatGPT subscription provider — use ChatGPT Plus/Pro subscription with Zed agent. GPT-5.4 nano/mini. v1.2.3 (May 13): agent edit reliability, Git Graph remote support, security fix (Bash arithmetic injection detection). v1.1.5 (May 6): Business plan, agentic layout mode, Git Graph. Model-agnostic: Zed Pro + Anthropic API + ChatGPT subscriptions. | Rust-native editor + agent panel + Business plan + model-agnostic subscriptions | IDE-integrated + agentic layout mode | Growing — Business plan, org-wide model controls, spend tracking per member, three subscription models |
Key dynamics
Enterprise deployment — the new primary axis (April 11)
Every release this week shipped enterprise features. The competition has shifted from “which agent is smartest” to “which agent deploys into my organization.”
| Feature | Claude Code | Codex | Gemini | Copilot | OpenCode |
|---|---|---|---|---|---|
| Cloud integration | Vertex wizard, Bedrock SigV4 | Remote exec-server | GCP backend | Azure, Cloud Agent | GitLab Duo |
| Legacy SCM | Perforce mode | — | — | Git commit signing | — |
| TLS/proxy | OS CA trust by default | — | — | — | — |
| Team tools | /team-onboarding | — | — | SDK + Studio | — |
| Data sovereignty | — | Residency headers | — | Org firewalls | — |
| Approval workflows | Managed settings | Allowed reviewers, Guardian | Admin MCP | Org firewalls | — |
| Multi-model | Vertex + Bedrock + direct | OpenAI + Amazon Bedrock | Google + Vertex AI routing | 5+ model broker | Claude + GPT + Mistral + Kimi |
| Offline | — | — | gemini gemma local setup | BYOK + Ollama | — |
The Codex alpha marathon — resolved
33 alphas → v0.119.0 stable (Apr 10) → v0.120.0 stable (Apr 11).
What the alphas contained: WebRTC realtime v2, MCP Apps (part 1 + 2), codex-core crate extraction into 8+ independent crates, remote exec-server, multi-agent v2 (path-based addressing), background agent streaming.
What this means: Codex is no longer a CLI. It’s a modular platform with independently-embeddable crates, a real-time voice system, a plugin ecosystem, and a remote execution model. The crate extraction is the architectural signal — you split monoliths when you expect multiple consumers.
Session maturity — re-entry layer landing (UPDATED April 15)
The five approaches now share a sub-layer: what the agent does when you come back.
| Approach | Agent | Status |
|---|---|---|
| Structure the context | Gemini | Shipped (Chapters, UCM, Tool Distillation). v0.38.0 stable: ContextCompressionService, background memory service for skill extraction, auto-configure memory, subagent workspace scoping via AsyncLocalStorage. Memory is now a subsystem with multiple services. |
| Make infrastructure reliable | Claude Code | Continuing. v2.1.108: /recap, prompt-cache TTL (1h/5m), Skill tool invokes slash commands. v2.1.110: fullscreen TUI, distributed tracing (TRACEPARENT/TRACESTATE), session recap for telemetry-disabled users, --resume resurrects scheduled tasks. The session is becoming a terminal application, not just a prompt. |
| Route to fresh agents | Cursor | v3.1 tiled parallel agents, Bugbot learning. |
| Make UI show what matters | Zed | v0.232.2: focus-follows-mouse, order-independent file finder, markdown preview search, Bedrock model expansion, dev container maturity (8 fixes + CLI flag). The editor becomes a development environment. |
| Stream background work | Codex | v0.121.0: marketplace, memory lifecycle (mode/reset/delete/extension cleanup), MCP Apps P3, secure devcontainer, codex-thread-store crate. The platform ships again — distribution, persistence, extensibility, isolation all in one release. |
Cross-cutting theme (new): Re-entry as a layer. Gemini and Claude Code shipped parallel solutions to the same problem this cycle — compress the context, recap the session, expose cache TTL, make discontinuity visible. Convergence this sharp usually precedes either a standard or a moat. Watch for MCP session-memory extension or a vendor opening the compression API.
Platform architecture comparison (updated April 11)
| Platform capability | Codex | Claude Code | Gemini | Copilot |
|---|---|---|---|---|
| Context management | — | Autocompact + transcript fixes | UCM + Chapters + Tool Distillation | — |
| Remote access | Remote exec-server + WebRTC | Channels (Telegram/Discord) | — | Cloud Agent |
| App ecosystem | MCP Apps P1+P2 | Hook-based plugins + skills | Multi-registry subagent | SDK + Studio |
| Multi-agent protocol | Spawn v2 (path-addressing) | — | Subagent (event-driven) | Studio multi-agent GA |
| Real-time voice | WebRTC v2 (default) | — | — | — |
| Background agents | Realtime v2 streaming | — | — | — |
| Enterprise compliance | Data residency, approval workflows, Guardian | Fail-closed settings, Vertex, Perforce, CA trust, team onboarding | Admin MCP, governance files | Org firewalls, commit signing |
| Session persistence | Persisted /goal workflows, git-backed memory, external session import | Resume chain recovery, session fixes, project purge | Chapters + project memory | — |
| Local/offline | — | — | — | BYOK + Ollama |
| Sandbox | Allow-list + deny-list + Windows elevated | macOS/Linux sandbox + PID isolation | Dynamic expansion (all platforms) | — |
| Crate modularity | 8+ extracted crates | — | — | — |
Gemini CLI acceleration (UPDATED April 16 — v0.38.1 patch)
v0.37.0 (Apr 8) → v0.37.1 (Apr 9) → v0.37.2 (Apr 13) patch
v0.38.0-preview.0 (Apr 8) → 6 days in limbo → v0.38.0 stable (Apr 14) → v0.38.1 patch (Apr 15)
v0.39.0-nightlies daily → v0.39.0-preview.0 (Apr 14 23:11Z)
v0.40.0-nightlies starting (Apr 15)
v0.38.0 shipped the preview bundle intact after six days in limbo: ContextCompressionService, background memory service (skill extraction), auto-configure memory, subagent workspace scoping, ADK non-interactive agent sessions, context-aware persistent policy approvals, TerminalBuffer mode (flicker), plus UI polish (role-specific /stats, click-to-expand topics, /help and /about for ACP). Gemini now has the most complete context management story: Chapters + UCM + Tool Distillation + ContextCompressionService + background memory.
Cycle compression is visible: v0.39.0-preview.0 shipped 10 minutes before v0.38.0 stable. Preview-to-stable overlap, and nightlies are already on v0.40.0.
The quiet ones
| Tool | Last release | Note |
|---|---|---|
| Aider | Aug 2025 | 8 months. No GitHub activity. |
| Cursor | Apr 15 (v3.1 Canvases) | Most aggressive IDE-agent surface expansion. Tiled layout + canvases in same week. |
Anthropic product surface expansion (UPDATED — April 18)
Anthropic launched Claude Managed Agents (April 8-9): composable API for cloud-hosted agents. YAML/NL definitions, sandboxed execution, checkpointing, persistent sessions, scoped permissions. $0.08/session-hour + token costs. Beta: Notion, Asana, Rakuten, Sentry.
Claude for Word beta (April 10-11): Native Word add-in. Team/Enterprise only. Legal review, financial editing. Cross-app with Excel and PowerPoint add-ins.
Claude Design (April 17): Anthropic Labs research preview. Design tool that reads codebase and design files, builds organizational design system, creates prototypes/slides/one-pagers from conversation. Exports to Canva/PDF/PPTX/HTML + handoff bundle for Claude Code. Pro/Max/Team/Enterprise. Powered by Opus 4.7. CPO Mike Krieger resigned from Figma’s board April 14 (3 days before launch). Figma stock dropped 7%.
Claude for Creative Work (April 28): Connectors for 8 creative platforms — Ableton, Adobe Creative Cloud (50+ tools), Affinity/Canva, Autodesk Fusion, Blender, Resolume, SketchUp, Splice. Not a product launch — integration expansion. Extends the design vertical into professional creative workflows: 3D modeling from conversation, live visual performance, music production documentation, batch automation.
Anthropic’s position: Claude Code (local agent) + Claude Design (design tool) + Claude for Creative Work (8 creative integrations) + Managed Agents (cloud agent) + Conway (consumer workspace) + Claude for Word/Excel/PowerPoint (Office integration) + API. Seven deployment surfaces from one vendor. The pipeline deepens: design in Claude Design → produce in creative tools → implement in Claude Code. No other vendor has this vertical coverage.
Claw Code — open-source Claude Code clone (NEW — April 2)
Clean-room rewrite of Claude Code from March 31 source map leak (512K lines TypeScript). 72K GitHub stars. Python + Rust. 19 permission-gated tools, MCP integration, multi-agent orchestration. Proves the architecture is replicable. Anthropic’s vendor surface control thread gets more complicated — the architecture is now open source whether they wanted it or not.
Usage numbers (updated)
- Copilot: 4.7M paid subscribers, 20M total users, 42% market share, 90% Fortune 100. Claude Opus 4.6 as model option.
- Codex: 2M+ weekly users, 6x growth since January
- Pinterest MCP: 66K invocations/month, 844 users, ~7,000 hours/month saved
- Cursor Bugbot: 78% resolution rate
- Perplexity: $450M+ ARR after agent pivot (+50% in one month)
- MCP ecosystem: 8,600+ servers
- Gartner: 40% of enterprise apps include AI agents by end 2026 (up from <5% in 2025)
Credits expired — silence held (RESOLVED April 17)
Anthropic credits expired April 17. No vendor positioned. Twenty-eight days of tracking — the mutual silence held through and past the deadline. Codex pricing restructured to three tiers ($20/$100/$200), token-based. Copilot BYOK offered escape route. The silence is the market’s statement: nobody wants to compete on price because they all need to raise prices. The subsidy era is ending industry-wide.
Mythos → regulatory pressure (NEW — April 12)
Treasury Secretary and Fed Chair summoned bank CEOs over Mythos cyber risk. Model capability as financial-sector systemic risk. Enterprise deployment features shift from competitive advantage to compliance requirement. Agents that can’t demonstrate security compliance will be excluded from regulated industries. Claude Code’s security hardening arc (v2.1.96-101) looks prescient — Anthropic was hardening the agent before the government responded to the model.