Agentic engineering patterns
Living document. Rewritten as the field evolves. Last updated: 2026-06-05.
Technology radar
Adopt — proven, use now
| Pattern/Tool | Evidence |
|---|---|
| MCP as standard protocol | Every major agent supports it. Universal adapter. Codex MCP Apps P1+P2. Pinterest: 66K invocations/month in production. 8,600+ servers. SurePath AI shipping MCP-specific governance. |
| Spec-driven development workflow | GitHub Spec Kit (84.7K stars), AWS Kiro, 30+ frameworks mapped. Delta Airlines: 1,948% growth in AI tool adoption using specs. |
| Plan-before-act | All four major CLI agents have it. Table stakes. Differentiation moved to multi-agent orchestration. |
| Sandbox-first execution | Universal across CLI agents. Gemini adding dynamic expansion (Windows/Linux). Codex adding deny-list mode alongside allow-list. |
| Git worktree-based parallel agent execution | Cursor (8 parallel), Claude Code (16+), Windsurf (5), Grok Build (8), OMX wrapping Codex. Gemini v0.37.0 preview adds worktree support. |
Agentic workflow tooling (mise-versions ecosystem)
CLI tools that multiply agent effectiveness. Source: mise-versions.jdx.dev.
Core — agents use these directly:
| Tool | What it does | Why agents benefit |
|---|---|---|
| ripgrep (rg) | Fast recursive search | Claude Code uses it internally. Faster search = faster agent turns. |
| fd | Fast file finder | Replaces find. Used by agents for file discovery. |
| jq | JSON processor | Agents manipulate JSON constantly — API responses, config, MCP output. |
| shellcheck | Shell script linter | Catches agent-generated bash errors before they execute. |
| hk | Git hooks manager | Structures agent commit workflows. Lighter than lefthook/husky. |
| fnox | Encrypted secret manager | Safe credential handling in agent environments. |
Quality-of-life — improves the human-agent collaboration:
| Tool | What it does | Why it helps |
|---|---|---|
| bat | Better cat with syntax highlighting | Agent-generated file output readable in terminal. |
| delta | Better git diff | Clearer diffs when reviewing agent code changes. |
| fzf | Fuzzy finder | Interactive selection in agentic workflows. |
| eza | Better ls | Tree views of agent-modified directories. |
| zoxide | Smart cd | Fast navigation between agent worktrees. |
mise itself is the substrate — reproducible environments via mise.toml, task runner replacing Makefiles, per-project env vars. 400+ tools in the registry.
Trial — working in production, still evolving
| Pattern/Tool | Evidence |
|---|---|
| Platform-level agent orchestration | Codex v0.119.0 ships the platform: MCP Apps, WebRTC realtime, 8+ extracted crates, remote exec-server. Gemini has GCP backend. Scion cross-vendor. Codex v0.120.0 adds background agent streaming. |
| Enterprise deployment as competitive axis | Every agent shipped enterprise features Apr 8-11: Claude Code (Vertex wizard, Perforce, CA trust, team onboarding), Codex (residency, approval workflows), OpenCode (OAuth MCP, fast mode multi-model). Maps to Nate’s “five durable layers” thesis. |
| Agent governance tooling | Microsoft Governance Toolkit: OWASP MCP Top 10, SOC 2 mapping, tool injection scanning. SurePath AI: MCP-specific runtime policy engine. Pinterest at 66K/month proves the governance need. |
| Agent portability / BYOK | Copilot CLI: BYOK + Ollama/vLLM + air-gapped. Scion: vendor-agnostic orchestration. Dependabot: multi-vendor agent assignment. The portability sprint. |
| ACP — editor↔agent standard protocol (NEW Jun 5) | Zed’s Agent Client Protocol (Apache, open; JetBrains × Zed collab Oct 2025; live ACP Registry). The editor-side mirror of MCP: hosts (Zed, JetBrains, Kiro, Devin Desktop) ↔ guests (Claude Agent, Codex, Gemini CLI, Pi, OpenCode, Vibe). Live-protocol signature Jun 4–5: Vibe bumped agent-client-protocol to 0.10.1 (+session/delete); OpenCode shipped ACP replay + cancel fixes. Commoditizes the agent into an interchangeable guest; the host owns the user relationship. Trial→Adopt candidate — already multi-vendor. NOT Cognition’s, despite the Jun-2 Devin Desktop launch framing. |
| Portable/replayable session as object (NEW Jun 5) | The session becoming a movable, reconstructable, auditable artifact. OpenCode v1.16.0: move sessions between workspaces/dirs, clone keeping dirty files, full replay on load. Claude Code v2.1.163: bg sessions update in place keeping running tasks across version upgrades. Vibe v2.14.0: session deletion as first-class ACP method. Replay = auditability = the precondition for trusting unattended fleets. The layer beneath the fleet cockpit. |
| Composable agent SDKs | Copilot SDK v0.2.1: cross-language commands + UI elicitation (JS/TS, Python, Go, .NET). BYOK, W3C tracing. |
| GitHub Spec Kit | Open-source spec-driven scaffolding, 84.7K stars, supports 14+ agent platforms. |
| AWS Kiro | Spec-driven agentic IDE on Bedrock (Claude Sonnet 4.0/3.7). GovCloud available. |
| Agentic harness engineering | Anthropic: “2026 is the year of harnesses.” Same model scores 17 problems apart in different agents. Claude Code’s 512K+ lines prove it. |
| Heterogeneous model routing | Frontier for reasoning, mid-tier for standard tasks, small models for high-frequency execution. Gemini adding dynamic routing for 3.1 Pro/Flash Lite. |
| Hook-based automation | Claude Code’s PreToolUse/PostToolUse/Stop with conditional filtering, defer/resume. Channels and Conway may supplement/replace. |
| Human-at-checkpoints | Agents build full systems autonomously, pausing only for strategic review. Anthropic’s three-agent harness: planner/generator/evaluator. |
| Path-based multi-agent addressing | Codex spawn v2 dropped agent IDs for path-based addressing (/root/agent_a). Agent tree IS the address space. Fire-and-forget messaging + feedback cascade. |
| Session quality as primary battleground (promoted Apr 15, UPDATED Apr 16) | Re-entry stack built (Apr 14-15). Surface expansion follows (Apr 16): Claude Code fullscreen TUI, Codex marketplace + memory lifecycle, Zed focus-follows-mouse + dev containers. The shared plumbing is built; now each tool uses it to differentiate. The session is the unit of quality — and the session is becoming a richer application, not just a prompt. |
| CLI → terminal application shift (NEW Apr 16) | Claude Code /tui fullscreen (flicker-free rendering, scroll control, modal UI). Codex Ctrl+R history search. The CLI agent is becoming a terminal application you inhabit. Not an IDE — a richer terminal. |
| Super-app convergence (UPDATED Apr 19) | Two paths to the same destination in the same 48hr window (Apr 16-17). Anthropic vertical: Opus 4.7 → Claude Code → Claude Design → Managed Agents → Claude for Word → Conway → API. Six surfaces, one model provider. Build all the apps. Codex lateral: “for almost everything” — computer use (background agents on macOS), 90+ plugins, in-app browser, memory. Use all the apps through one interface. Neither vendor acknowledged the other. Counter-thesis: Nate’s BYOC says both approaches are context traps. |
Assess — investigate, understand implications
| Pattern/Tool | Evidence |
|---|---|
| Frontier models as systemic risk | Anthropic’s Mythos (93.9% SWE-bench, autonomous zero-day discovery) triggered Treasury/Fed emergency meeting with bank CEOs (Apr 8). Model capability now treated as financial-sector systemic risk. New deployment pattern: directed use-case access, not open API. Security hardening becomes regulatory, not optional. |
| Multi-agent orchestration in the model | Meta Muse Spark: multi-agent orchestration built into the model itself, not the harness. “Contemplating mode” runs a squad of agents in parallel. Agents-in-the-model vs agents-around-the-model. |
| Open-weight contraction | Meta went proprietary with Muse Spark after Llama 1-4 open-weight. Open-weight now depends on Google (Gemma), Alibaba (Qwen), Zhipu (GLM), and community. Llama’s future unclear. |
| Self-improving review agents | Cursor Bugbot learns from PR feedback, applies learned rules to future reviews. MCP tools for context. 78% resolution rate. Cursor v3.1 (Apr 13): tiled layout for parallel agents, upgraded voice input. Multi-agent parallelism as first-class UX. |
| Agent-addressable slash commands (NEW Apr 15) | Claude Code v2.1.108 (Apr 14): Skill tool can invoke built-in slash commands (/init, /review, /security-review). Blurs human-UX / agent-UX line. Policy question emerging: should org settings restrict which slash commands the agent can invoke on itself? Watch for: other agents’ slash/tool equivalents opening to the model. |
| Agent execution runtimes | Anthropic Managed Agents (April 8-9): YAML definitions, sandboxed execution, persistent sessions, $0.08/session-hour. Beta with Notion, Asana, Rakuten, Sentry. Conway CNW ZIP may be the extension format. Codex: remote exec-server. Gemini: GCP backend. Model providers becoming execution platforms. |
| Persistent agent platforms | Conway CNW ZIP details via Nate’s analysis: standalone workspace, webhook activation, browser control, proprietary extension format. Channels (shipped). Codex: remote control + WebRTC. Gemini: GCP backend + Interactions API. Google Scion as external orchestration. |
| MCP Apps ecosystem | Codex MCP Apps P1+P2 (meta to tool call results). 8,600+ servers. Pinterest 66K/month in production. SurePath AI governance. MCP Server Cards (.well-known) proposed. |
| Agents as supply chain participants | Dependabot-to-agent assignment for security remediation. Copilot Critic agent (uses Claude to review plans). OXC copilot-swe-agent contributing fixes. Agents managing security, not just generating code. |
| Cross-vendor agent orchestration | Google Scion: open-source, runs Claude+Codex+Gemini in parallel with container isolation. Copilot Studio: multi-model broker (5 models). OMX: community Codex orchestration. |
| Multi-model broker platforms | Copilot Studio GA with Claude Opus 4.6, Sonnet 4.5, Grok 4.1, GPT-5.3/5.4. Microsoft positioning as model-agnostic orchestrator. |
| Vendor cost-optimization as trust risk (NEW Apr 16) | Anthropic reduced default effort to medium to save tokens. Users noticed quality decline. Fortune, Axios, VentureBeat, The Register covering it (Apr 13-16). Boris Cherny acknowledged. First time a vendor’s cost-optimization decision became a public trust issue. Connects to subsidy question and Nate’s trust layer. |
| Vendor surface control | Anthropic claiming all interaction surfaces. OpenClaw ban as enforcement. But 3 days of silence since — unclear if strategy is expanding or pausing. |
| Meta JiT Testing | LLM generates tests per-PR by analyzing diff. No persistent test suite. 70% reduction in human review load. |
| Agent-to-Agent protocols + payments | A2A v1.0 (April 9): first stable spec. Multi-protocol, enterprise multi-tenancy, 5 production SDKs (Python, JS, Java, Go, .NET). 150+ orgs, 22K+ stars. AP2: 60+ orgs. Visa ICC: neutral payment layer for 4 protocols. McKinsey: $5T agent-driven sales by 2030. |
| Agent supply chain attacks | OpenClaw ClawHavoc: 824+ malicious skills (growing), 135K exposed instances. CVE-2026-35669 (CVSS 8.8, Apr 10) privilege escalation. First attack targeting agent execution patterns specifically. AMOS macOS stealer via agentic workflows. Claw Code (72K stars): clean-room Claude Code clone from source map leak. Axios npm compromise (Apr 10): North Korea-linked, affected OpenAI’s macOS CI/CD via floating dep tag. Antipattern: floating tags in GitHub Actions. |
| oh-my-codex (OMX) | 2.8K stars overnight. Community multi-agent orchestration for Codex CLI. |
| Mamba-Transformer hybrids for agents | Nemotron 3 Nano claims 5x throughput. Linear context scaling. If verified, changes local agent architecture. |
| KV cache compression for local inference | Google TurboQuant: 6x KV cache compression, zero accuracy loss, no retraining. ICLR 2026. llama.cpp integration exists (turboquant_plus, Metal support). Changes local inference economics: existing GPUs serve 6x more context. Combined with Copilot BYOK, strengthens case for local-first agent architecture. |
| Devin / Cognition Labs | $10.2B valuation, $150M ARR (with Windsurf). Real capability but unclear ROI. |
| sauna.ai (Wordware) | Largest YC seed ($30M), Instacart/Runway customers. Nate’s test: scored 1/4 on knowledge-work tasks. |
| Agent memory systems / context portability (UPDATED Apr 18) | Gemini Chapters, project-level memory, Nate’s “Open Brain” PostgreSQL+MCP pattern. Context persistence is the bottleneck. NEW: Nate names context portability as structural problem — “memory is the moat.” Proposes BYOC (Bring Your Own Context). Four loss points: tool switch, company mandate, job change, platform terms. Six months of use = qualitative output difference. |
| Agent infrastructure hardening (NEW Apr 18) | Epsilla survey (Apr 14): GAIA (local agent execution), Kontext CLI (JIT credential broker), SnapState (checkpoint/resume), Context Surgeon (agent self-manages context), OQP (testable assertions about agent behavior). Shift from “agents that chat” to “agents that persist, authenticate, and manage their own cognition.” |
| Background agent swarms | Multiple small agents running continuously on tiny local models. |
| Two-tier plugin distribution | Codex: curated (vetted, backend-hosted) + community (non-curated). Plugin marketplace economics forming. |
Watch — early signal, track for developments
| Pattern/Tool | Evidence |
|---|---|
| Agent-native devtools (NEW Apr 30) | Vite DevTools v0.1.16 ships devframe: “Framework-neutral devtools foundation + agent-native MCP.” First major dev tooling to ship MCP as first-class feature. Combined with antfu’s Claude Opus 4.7 co-authorship pattern. Agents aren’t just using tools — tools are being built for agents. |
| Purpose-built coding models (NEW Apr 30) | Poolside Laguna XS.2 (33B/3B MoE, Apache 2.0, 68.2% SWE-Bench) — first open-weight model architecturally designed for agentic coding. Paired with pool terminal agent. Mistral Medium 3.5 (128B dense, 77.6% SWE-Bench) — merged flagship for remote agents in Vibe. The model layer is no longer general-purpose models adapted for coding — dedicated coding models are arriving. |
| Dependency graph as agent interface (NEW Apr 30) | aube v1.5.0 aube query — vlt-inspired selector-based dependency graph queries with JSON output. Package managers exposing their graph for programmatic consumption, not just human-readable output. Combined with mise aube_args, the dev environment stack is adapting to agents as consumers. |
| Post-ban community migration | ZeroClaw (Rust), NullClaw, local models. OpenClaw community adopting Kimi K2.5. Credits expire April 17 (3 days). |
| OWASP MCP Top 10 | New compliance standard from MS Governance Toolkit. Maps agentic AI risks to MCP-specific controls. May become de facto standard. |
| AI workspace consolidation | Sauna, Notion AI, Glean. Crowded, no winner. |
| NIST AI Agent Identity standards | Comment period closed April 2. IAM frameworks for autonomous agents. |
| EU AI Act enforcement | August 2, 2026 — first major enforcement date. High-risk AI, GPAI, foundation model requirements. |
| Harness economics | OpenClaw ban proved the arbitrage model is unsustainable. Credits expired April 17 — no vendor positioned. Mutual silence held 28+ days. Codex pricing restructured: $20/$100/$200 tiers, token-based. Subsidy era ending industry-wide. |
| Full autonomous dev without checkpoints | The “Devin promise.” Evidence still mixed. |
| AI industry financial sustainability | Zitron: “subprime AI crisis” + “AI isn’t too big to fail.” |
| Agent-native IDEs | Is the IDE the agent, or does the agent use the IDE? Conway suggests the agent becomes the IDE. |
| Model-routing layers | Automatic model selection per task complexity. |
| Agent security monitoring | Codenotary AgentMon, Astrix Security, Black Duck Signal, Palo Alto Prisma AIRS 3.0. Security tooling wave forming. |
| AI agents as contributors | copilot-swe-agent contributed two OXC bug fixes (latest: node_modules config walker skip). Copilot Critic agent uses Claude to review plans. AI agents contributing to and reviewing tooling that other AI agents use. |
| Agent security vulnerabilities | CVE-2026-35022 (CVSS 9.8, Claude CLI/SDK command injection). Claude Code deny-rules bypass at 50+ subcommands. Security of agent tools becoming a distinct attack surface. |
Key risk signal: The subsidy question
The builder community describes a genuine productivity revolution. The financial analysis shows unstable foundations:
| Metric | Evidence |
|---|---|
| Anthropic compute vs revenue | $10B spent on compute, $5B revenue |
| OpenAI inference burn | $8.67B through Sept 2025 on $4.3B total revenue |
| Startup unit economics | $3-13 burned per $1 of subscription revenue |
| Data center gap | ~5GW under construction vs. 12GW+ promised |
| Harness arbitrage | 5x gap between subscription and API costs — now closed by ban |
The synthesis: The tools and workflows are real and productive. The pricing is subsidized and temporary. Anthropic’s OpenClaw ban was the first direct vendor margin defense. The effort-level reduction (Apr 16 backlash) is the second — quieter, applied to all users, noticed by the community. At $800B+ valuation against ~$5B revenue, the gap widens. Credits expire tomorrow. The most defensible investments are in patterns (spec-driven dev, orchestration architecture, MCP) rather than specific vendor subscriptions.
Sources: Ed Zitron, “The Subprime AI Crisis Is Here” (March 31, 2026) and “AI Isn’t Too Big To Fail” (April 3, 2026)
Dominant patterns in motion
Enterprise deployment becomes regulatory (ESCALATED — April 12)
The competition shifted from agent intelligence to organizational deployability (April 11). Now the Mythos government escalation adds regulatory pressure. Security hardening moves from competitive differentiator to compliance requirement for regulated industries. Treasury/Fed treating model capability as systemic risk means enterprise deployment features become mandatory, not optional. The “five durable layers” framework (trust, context, distribution, taste, liability) explains why: the “trust” layer is now the most critical — regulatory pressure drives it.
The portability sprint (April 8, continuing)
Everyone is decoupling agents from their native clouds. Copilot CLI: BYOK + Ollama + air-gapped. Scion: vendor-agnostic orchestration. Dependabot: multi-vendor agent assignment. Codex: WebRTC transport. The platforms are betting that lock-in loses. The most portable agent wins, not the most powerful.
The platform ships (updated April 11)
Codex v0.119.0 and v0.120.0 ended the alpha marathon. 33 alphas → two stables in 24 hours. The platform is real: MCP Apps, WebRTC realtime v2, 8+ extracted crates, remote exec-server, path-based multi-agent addressing, background agent streaming. Gemini’s GCP backend + Chapters + UCM is the same pattern. The CLI is no longer the product — it’s the thin interface to a platform.
Governance ships at platform speed
Microsoft’s governance toolkit gained OWASP MCP Top 10, SOC 2 mapping, and tool injection scanning in the same 48 hours that Codex shipped data residency and approval workflows. Governing as you ship, not after. The cross-vendor play: own the governance layer, influence every platform that needs compliance.
Spec -> Plan -> Tasks -> Code
The dominant new methodology. Write a specification -> agent decomposes into plan -> breaks into tasks -> generates implementation. Review at spec level, not code level.
Parallel agent worktrees
Infrastructure primitive. Every major tool and community wrappers ship this. Gemini adding worktree support in v0.37.0. Adopted.
The harness as key abstraction
The orchestration layer wrapping the LLM is where real engineering investment lies. Three-agent harness (planner/generator/evaluator) turns $9 broken output into $200 polished product. But: who controls the harness? Anthropic says they do. Community disagrees. Codex building two-tier plugin system may offer a middle path.
Prompting has fractured
Four distinct skills: Specification Engineering, Intent Framework Building, Evaluation Harness Design, Constraint Architecture. The “35-Minute Wall” is where 2025-era prompting collapses.
Path-based agent addressing (NEW)
Codex dropped agent IDs from spawn v2 in favor of path-based addressing. The agent tree is the address space. Fire-and-forget messaging reduces coupling. /feedback cascade enables hierarchical feedback propagation. This is a clean multi-agent communication model worth watching.
JiT testing over test suites
Meta (Feb 2026): LLM generates ephemeral tests per-change. Traditional testing cannot keep pace with agentic velocity.
Sources
| Source | URL | Focus |
|---|---|---|
| Nate’s Newsletter | natesnewsletter.substack.com | AI practitioner strategy, MCP, workflow optimization |
| Where’s Your Ed At | wheresyoured.at | AI financial sustainability critique |
| Anthropic 2026 Agentic Coding Trends | resources.anthropic.com | Harness patterns, industry data |
| Anthropic Harness Design Blog | anthropic.com/engineering | Three-agent harness architecture |
| Meta JiTTests | engineering.fb.com | Testing paradigm shift |
| GitHub Spec Kit | github.com/github/spec-kit | Spec-driven scaffolding |
| GitHub Copilot SDK | github.com/github/copilot-sdk | Composable agent runtime |
| AWS Kiro | kiro.dev | Spec-driven agentic IDE |
| Google ADK | google.github.io/adk-docs | Model-agnostic agent framework |
| Microsoft Agent Governance Toolkit | github.com/microsoft/agent-governance-toolkit | Cross-vendor agent governance |
| Google Scion | github.com/GoogleCloudPlatform/scion | Cross-vendor agent orchestration |
| SurePath AI | surepath.ai | MCP-specific runtime governance |
| Pinterest MCP (InfoQ) | infoq.com/news/2026/04/pinterest-mcp-ecosystem | Enterprise MCP case study |
| SDD Framework Map | Medium (30+ frameworks) | Landscape map |
| NIST AI Agent Standards | csrc.nist.gov | Regulatory direction |