Agentic engineering patterns
Living document. Rewritten as the field evolves. Last updated: 2026-04-12.
Technology radar
Adopt — proven, use now
| Pattern/Tool | Evidence |
|---|---|
| MCP as standard protocol | Every major agent supports it. Universal adapter. Codex MCP Apps P1+P2. Pinterest: 66K invocations/month in production. 8,600+ servers. SurePath AI shipping MCP-specific governance. |
| Spec-driven development workflow | GitHub Spec Kit (84.7K stars), AWS Kiro, 30+ frameworks mapped. Delta Airlines: 1,948% growth in AI tool adoption using specs. |
| Plan-before-act | All four major CLI agents have it. Table stakes. Differentiation moved to multi-agent orchestration. |
| Sandbox-first execution | Universal across CLI agents. Gemini adding dynamic expansion (Windows/Linux). Codex adding deny-list mode alongside allow-list. |
| Git worktree-based parallel agent execution | Cursor (8 parallel), Claude Code (16+), Windsurf (5), Grok Build (8), OMX wrapping Codex. Gemini v0.37.0 preview adds worktree support. |
Trial — working in production, still evolving
| Pattern/Tool | Evidence |
|---|---|
| Platform-level agent orchestration | Codex v0.119.0 ships the platform: MCP Apps, WebRTC realtime, 8+ extracted crates, remote exec-server. Gemini has GCP backend. Scion cross-vendor. Codex v0.120.0 adds background agent streaming. |
| Enterprise deployment as competitive axis | Every agent shipped enterprise features Apr 8-11: Claude Code (Vertex wizard, Perforce, CA trust, team onboarding), Codex (residency, approval workflows), OpenCode (OAuth MCP, fast mode multi-model). Maps to Nate’s “five durable layers” thesis. |
| Agent governance tooling | Microsoft Governance Toolkit: OWASP MCP Top 10, SOC 2 mapping, tool injection scanning. SurePath AI: MCP-specific runtime policy engine. Pinterest at 66K/month proves the governance need. |
| Agent portability / BYOK | Copilot CLI: BYOK + Ollama/vLLM + air-gapped. Scion: vendor-agnostic orchestration. Dependabot: multi-vendor agent assignment. The portability sprint. |
| Composable agent SDKs | Copilot SDK v0.2.1: cross-language commands + UI elicitation (JS/TS, Python, Go, .NET). BYOK, W3C tracing. |
| GitHub Spec Kit | Open-source spec-driven scaffolding, 84.7K stars, supports 14+ agent platforms. |
| AWS Kiro | Spec-driven agentic IDE on Bedrock (Claude Sonnet 4.0/3.7). GovCloud available. |
| Agentic harness engineering | Anthropic: “2026 is the year of harnesses.” Same model scores 17 problems apart in different agents. Claude Code’s 512K+ lines prove it. |
| Heterogeneous model routing | Frontier for reasoning, mid-tier for standard tasks, small models for high-frequency execution. Gemini adding dynamic routing for 3.1 Pro/Flash Lite. |
| Hook-based automation | Claude Code’s PreToolUse/PostToolUse/Stop with conditional filtering, defer/resume. Channels and Conway may supplement/replace. |
| Human-at-checkpoints | Agents build full systems autonomously, pausing only for strategic review. Anthropic’s three-agent harness: planner/generator/evaluator. |
| Path-based multi-agent addressing | Codex spawn v2 dropped agent IDs for path-based addressing (/root/agent_a). Agent tree IS the address space. Fire-and-forget messaging + feedback cascade. |
Assess — investigate, understand implications
| Pattern/Tool | Evidence |
|---|---|
| Frontier models as systemic risk | Anthropic’s Mythos (93.9% SWE-bench, autonomous zero-day discovery) triggered Treasury/Fed emergency meeting with bank CEOs (Apr 8). Model capability now treated as financial-sector systemic risk. New deployment pattern: directed use-case access, not open API. Security hardening becomes regulatory, not optional. |
| Multi-agent orchestration in the model | Meta Muse Spark: multi-agent orchestration built into the model itself, not the harness. “Contemplating mode” runs a squad of agents in parallel. Agents-in-the-model vs agents-around-the-model. |
| Open-weight contraction | Meta went proprietary with Muse Spark after Llama 1-4 open-weight. Open-weight now depends on Google (Gemma), Alibaba (Qwen), Zhipu (GLM), and community. Llama’s future unclear. |
| Self-improving review agents | Cursor Bugbot learns from PR feedback, applies learned rules to future reviews. MCP tools for context. 78% resolution rate. Early signal of agents improving from task-specific data. |
| Session quality as primary battleground | Gemini: Unified Context Mgmt + Chapters. Claude Code: 30+ session fixes. Zed: thinking display + streaming. Cursor: multi-agent routing. The session, not the response, is the unit of quality. |
| Agent execution runtimes | Anthropic Managed Agents (April 8-9): YAML definitions, sandboxed execution, persistent sessions, $0.08/session-hour. Beta with Notion, Asana, Rakuten, Sentry. Conway CNW ZIP may be the extension format. Codex: remote exec-server. Gemini: GCP backend. Model providers becoming execution platforms. |
| Persistent agent platforms | Conway CNW ZIP details via Nate’s analysis: standalone workspace, webhook activation, browser control, proprietary extension format. Channels (shipped). Codex: remote control + WebRTC. Gemini: GCP backend + Interactions API. Google Scion as external orchestration. |
| MCP Apps ecosystem | Codex MCP Apps P1+P2 (meta to tool call results). 8,600+ servers. Pinterest 66K/month in production. SurePath AI governance. MCP Server Cards (.well-known) proposed. |
| Agents as supply chain participants | Dependabot-to-agent assignment for security remediation. Copilot Critic agent (uses Claude to review plans). OXC copilot-swe-agent contributing fixes. Agents managing security, not just generating code. |
| Cross-vendor agent orchestration | Google Scion: open-source, runs Claude+Codex+Gemini in parallel with container isolation. Copilot Studio: multi-model broker (5 models). OMX: community Codex orchestration. |
| Multi-model broker platforms | Copilot Studio GA with Claude Opus 4.6, Sonnet 4.5, Grok 4.1, GPT-5.3/5.4. Microsoft positioning as model-agnostic orchestrator. |
| Vendor surface control | Anthropic claiming all interaction surfaces. OpenClaw ban as enforcement. But 3 days of silence since — unclear if strategy is expanding or pausing. |
| Meta JiT Testing | LLM generates tests per-PR by analyzing diff. No persistent test suite. 70% reduction in human review load. |
| Agent-to-Agent protocols + payments | A2A v1.0 (April 9): first stable spec. Multi-protocol, enterprise multi-tenancy, 5 production SDKs (Python, JS, Java, Go, .NET). 150+ orgs, 22K+ stars. AP2: 60+ orgs. Visa ICC: neutral payment layer for 4 protocols. McKinsey: $5T agent-driven sales by 2030. |
| Agent supply chain attacks | OpenClaw ClawHavoc: 824+ malicious skills (growing), 135K exposed instances. CVE-2026-35669 (CVSS 8.8, Apr 10) privilege escalation. First attack targeting agent execution patterns specifically. AMOS macOS stealer via agentic workflows. Claw Code (72K stars): clean-room Claude Code clone from source map leak — proves agent architectures are now replicable open-source targets. |
| oh-my-codex (OMX) | 2.8K stars overnight. Community multi-agent orchestration for Codex CLI. |
| Mamba-Transformer hybrids for agents | Nemotron 3 Nano claims 5x throughput. Linear context scaling. If verified, changes local agent architecture. |
| KV cache compression for local inference | Google TurboQuant: 6x KV cache compression, zero accuracy loss, no retraining. ICLR 2026. llama.cpp integration exists (turboquant_plus, Metal support). Changes local inference economics: existing GPUs serve 6x more context. Combined with Copilot BYOK, strengthens case for local-first agent architecture. |
| Devin / Cognition Labs | $10.2B valuation, $150M ARR (with Windsurf). Real capability but unclear ROI. |
| sauna.ai (Wordware) | Largest YC seed ($30M), Instacart/Runway customers. Nate’s test: scored 1/4 on knowledge-work tasks. |
| Agent memory systems | Gemini Chapters, project-level memory, Nate’s “Open Brain” PostgreSQL+MCP pattern. Context persistence is the bottleneck. |
| Background agent swarms | Multiple small agents running continuously on tiny local models. |
| Two-tier plugin distribution | Codex: curated (vetted, backend-hosted) + community (non-curated). Plugin marketplace economics forming. |
Watch — early signal, track for developments
| Pattern/Tool | Evidence |
|---|---|
| Post-ban community migration | ZeroClaw (Rust), NullClaw, local models. OpenClaw community adopting Kimi K2.5. Credits expire April 17 (5 days). |
| OWASP MCP Top 10 | New compliance standard from MS Governance Toolkit. Maps agentic AI risks to MCP-specific controls. May become de facto standard. |
| AI workspace consolidation | Sauna, Notion AI, Glean. Crowded, no winner. |
| NIST AI Agent Identity standards | Comment period closed April 2. IAM frameworks for autonomous agents. |
| EU AI Act enforcement | August 2, 2026 — first major enforcement date. High-risk AI, GPAI, foundation model requirements. |
| Harness economics | OpenClaw ban proved the arbitrage model is unsustainable. Credits expire April 17 (5 days). Codex pricing restructured: $20/$100/$200 tiers, token-based. |
| Full autonomous dev without checkpoints | The “Devin promise.” Evidence still mixed. |
| AI industry financial sustainability | Zitron: “subprime AI crisis” + “AI isn’t too big to fail.” |
| Agent-native IDEs | Is the IDE the agent, or does the agent use the IDE? Conway suggests the agent becomes the IDE. |
| Model-routing layers | Automatic model selection per task complexity. |
| Agent security monitoring | Codenotary AgentMon, Astrix Security, Black Duck Signal, Palo Alto Prisma AIRS 3.0. Security tooling wave forming. |
| AI agents as contributors | copilot-swe-agent contributed two OXC bug fixes (latest: node_modules config walker skip). Copilot Critic agent uses Claude to review plans. AI agents contributing to and reviewing tooling that other AI agents use. |
| Agent security vulnerabilities | CVE-2026-35022 (CVSS 9.8, Claude CLI/SDK command injection). Claude Code deny-rules bypass at 50+ subcommands. Security of agent tools becoming a distinct attack surface. |
Key risk signal: The subsidy question
The builder community describes a genuine productivity revolution. The financial analysis shows unstable foundations:
| Metric | Evidence |
|---|---|
| Anthropic compute vs revenue | $10B spent on compute, $5B revenue |
| OpenAI inference burn | $8.67B through Sept 2025 on $4.3B total revenue |
| Startup unit economics | $3-13 burned per $1 of subscription revenue |
| Data center gap | ~5GW under construction vs. 12GW+ promised |
| Harness arbitrage | 5x gap between subscription and API costs — now closed by ban |
The synthesis: The tools and workflows are real and productive. The pricing is subsidized and temporary. Anthropic’s OpenClaw ban is the first direct vendor margin defense. The most defensible investments are in patterns (spec-driven dev, orchestration architecture, MCP) rather than specific vendor subscriptions. The response layer forming this week validates this: vendor lock-in gets routed around within 72 hours.
Sources: Ed Zitron, “The Subprime AI Crisis Is Here” (March 31, 2026) and “AI Isn’t Too Big To Fail” (April 3, 2026)
Dominant patterns in motion
Enterprise deployment becomes regulatory (ESCALATED — April 12)
The competition shifted from agent intelligence to organizational deployability (April 11). Now the Mythos government escalation adds regulatory pressure. Security hardening moves from competitive differentiator to compliance requirement for regulated industries. Treasury/Fed treating model capability as systemic risk means enterprise deployment features become mandatory, not optional. The “five durable layers” framework (trust, context, distribution, taste, liability) explains why: the “trust” layer is now the most critical — regulatory pressure drives it.
The portability sprint (April 8, continuing)
Everyone is decoupling agents from their native clouds. Copilot CLI: BYOK + Ollama + air-gapped. Scion: vendor-agnostic orchestration. Dependabot: multi-vendor agent assignment. Codex: WebRTC transport. The platforms are betting that lock-in loses. The most portable agent wins, not the most powerful.
The platform ships (updated April 11)
Codex v0.119.0 and v0.120.0 ended the alpha marathon. 33 alphas → two stables in 24 hours. The platform is real: MCP Apps, WebRTC realtime v2, 8+ extracted crates, remote exec-server, path-based multi-agent addressing, background agent streaming. Gemini’s GCP backend + Chapters + UCM is the same pattern. The CLI is no longer the product — it’s the thin interface to a platform.
Governance ships at platform speed
Microsoft’s governance toolkit gained OWASP MCP Top 10, SOC 2 mapping, and tool injection scanning in the same 48 hours that Codex shipped data residency and approval workflows. Governing as you ship, not after. The cross-vendor play: own the governance layer, influence every platform that needs compliance.
Spec -> Plan -> Tasks -> Code
The dominant new methodology. Write a specification -> agent decomposes into plan -> breaks into tasks -> generates implementation. Review at spec level, not code level.
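The flow above can be sketched as a small pipeline. This is a minimal illustration, not how Spec Kit or Kiro implement it: `run_pipeline`, `Plan`, `Task`, and the `demo_*` stubs are hypothetical names, and the stubs stand in for model calls so the sketch runs without an agent. The one human gate sits at the spec, matching "review at spec level, not code level."

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    title: str
    code: str = ""  # filled in by the implementation stage


@dataclass
class Plan:
    steps: List[str]
    tasks: List[Task] = field(default_factory=list)


def run_pipeline(
    spec: str,
    plan_fn: Callable[[str], List[str]],
    split_fn: Callable[[str], List[str]],
    implement_fn: Callable[[str], str],
    approve_spec: Callable[[str], bool],
) -> Plan:
    """Spec -> Plan -> Tasks -> Code, gated by one review at the spec level."""
    if not approve_spec(spec):
        raise ValueError("spec rejected; nothing downstream runs")
    plan = Plan(steps=plan_fn(spec))
    for step in plan.steps:
        for title in split_fn(step):
            plan.tasks.append(Task(title=title, code=implement_fn(title)))
    return plan


# Hypothetical stub stages; in practice each would be an agent call.
def demo_plan(spec: str) -> List[str]:
    # Treat every "- " bullet in the spec as one plan step.
    return [line[2:].strip() for line in spec.splitlines() if line.startswith("- ")]


def demo_split(step: str) -> List[str]:
    return [f"{step}: implement", f"{step}: test"]


def demo_implement(title: str) -> str:
    return f"# TODO: {title}"
```

Calling `run_pipeline(spec, demo_plan, demo_split, demo_implement, approve_spec=lambda s: True)` on a spec with two bullets yields two plan steps and four tasks. Rejecting the spec stops everything before any code is generated.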
Parallel agent worktrees
Infrastructure primitive. Every major tool ships this, as do community wrappers. Gemini is adding worktree support in v0.37.0. Adopted.
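The underlying primitive is plain `git worktree add`, which gives each agent its own checkout and branch on top of one shared repository. A minimal sketch follows, assuming a sibling-directory layout and an `agent/<name>` branch naming scheme. Both are illustrative choices, not what any of the tools above actually do, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_commands(repo: Path, agents: List[str], base: str = "main") -> List[List[str]]:
    """Build one `git worktree add` command per agent, each on its own branch."""
    cmds = []
    for name in agents:
        path = repo.parent / f"{repo.name}-{name}"  # assumed layout: sibling directory per agent
        cmds.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", f"agent/{name}", str(path), base]
        )
    return cmds


def create_worktrees(repo: Path, agents: List[str], base: str = "main") -> None:
    """Run the commands; raises CalledProcessError if git rejects one."""
    for cmd in worktree_commands(repo, agents, base):
        subprocess.run(cmd, check=True)
```

Because every worktree has its own working directory and branch, agents can edit files in parallel without overwriting each other; conflicts surface later, at merge time.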
The harness as key abstraction
The orchestration layer wrapping the LLM is where the real engineering investment lies. A three-agent harness (planner/generator/evaluator) turns a $9 broken output into a $200 polished product. But who controls the harness? Anthropic says it does. The community disagrees. Codex's two-tier plugin system may offer a middle path.
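The planner/generator/evaluator split can be sketched as a loop in which the evaluator's feedback goes back into the next generation attempt. This is a sketch under assumptions, not Anthropic's implementation: the three roles are plain callables, the `demo_*` stubs replace model calls, and the retry limit is an invented parameter.

```python
from typing import Callable, List, Tuple

Planner = Callable[[str], List[str]]                # goal -> ordered steps
Generator = Callable[[str, str], str]               # (step, feedback) -> artifact
Evaluator = Callable[[str, str], Tuple[bool, str]]  # (step, artifact) -> (ok, feedback)


def run_harness(
    goal: str,
    planner: Planner,
    generator: Generator,
    evaluator: Evaluator,
    max_attempts: int = 3,
) -> List[str]:
    """Plan once, then generate and evaluate each step until it passes."""
    results = []
    for step in planner(goal):
        feedback = ""
        for _ in range(max_attempts):
            artifact = generator(step, feedback)
            ok, feedback = evaluator(step, artifact)
            if ok:
                break
        else:
            raise RuntimeError(f"step {step!r} failed after {max_attempts} attempts")
        results.append(artifact)
    return results


# Hypothetical stubs: the evaluator only accepts a revised artifact.
def demo_planner(goal: str) -> List[str]:
    return goal.split(" and ")


def demo_generator(step: str, feedback: str) -> str:
    return f"{step} [revised]" if feedback else f"{step} [draft]"


def demo_evaluator(step: str, artifact: str) -> Tuple[bool, str]:
    if artifact.endswith("[revised]"):
        return True, ""
    return False, "needs revision"
```

With these stubs, every step fails its first evaluation and passes on the second attempt. The separate evaluator is what lets the harness reject work the generator would otherwise ship.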
Prompting has fractured
Four distinct skills: Specification Engineering, Intent Framework Building, Evaluation Harness Design, Constraint Architecture. The “35-Minute Wall” is where 2025-era prompting collapses.
Path-based agent addressing (NEW)
Codex dropped agent IDs from spawn v2 in favor of path-based addressing. The agent tree is the address space. Fire-and-forget messaging reduces coupling. /feedback cascade enables hierarchical feedback propagation. This is a clean multi-agent communication model worth watching.
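A toy model of the idea, assuming nothing about Codex's actual implementation: agents are keyed by their path in the tree, `send` enqueues without waiting for a reply, and `feedback` cascades a message to every ancestor. The class and method names are hypothetical, and the upward direction of the cascade is an assumption made for illustration.

```python
from typing import Dict, List


class AgentTree:
    """Agents are addressed by tree path; there are no separate agent IDs."""

    def __init__(self) -> None:
        self.inboxes: Dict[str, List[str]] = {"/root": []}

    def spawn(self, parent: str, name: str) -> str:
        if parent not in self.inboxes:
            raise KeyError(f"unknown parent {parent}")
        path = f"{parent}/{name}"  # the address is the position in the tree
        self.inboxes[path] = []
        return path

    def send(self, path: str, message: str) -> None:
        """Fire-and-forget: enqueue the message and return, with no reply channel."""
        self.inboxes[path].append(message)

    def ancestors(self, path: str) -> List[str]:
        """Ancestor paths, nearest first: /root/a/b -> [/root/a, /root]."""
        parts = path.strip("/").split("/")
        return ["/" + "/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]

    def feedback(self, path: str, message: str) -> None:
        """Assumed cascade direction: deliver feedback to each ancestor up to the root."""
        for ancestor in self.ancestors(path):
            self.send(ancestor, f"[feedback from {path}] {message}")
```

Spawning `agent_a` under `/root` and `worker` under it gives the address `/root/agent_a/worker`. The sender never needs a registry lookup, because the address already encodes where the agent sits in the hierarchy.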
JiT testing over test suites
Meta (Feb 2026): LLM generates ephemeral tests per-change. Traditional testing cannot keep pace with agentic velocity.
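A rough sketch of the ephemeral-test idea, not Meta's system: a test is generated from the diff, written to a temporary directory, run once, and discarded with the directory. The `generate_test` callable is a hypothetical stand-in for the LLM call, and running the test with `runpy` in the same process is a simplification chosen to keep the example self-contained.

```python
import runpy
import tempfile
from pathlib import Path
from typing import Callable, List


def changed_files(diff: str) -> List[str]:
    """Paths touched by a unified diff, read from its '+++ b/<path>' headers."""
    prefix = "+++ b/"
    return [line[len(prefix):] for line in diff.splitlines() if line.startswith(prefix)]


def run_jit_tests(diff: str, generate_test: Callable[[str], str]) -> bool:
    """Generate a test for this diff, run it once, and keep nothing afterwards."""
    test_source = generate_test(diff)  # hypothetical LLM call
    with tempfile.TemporaryDirectory() as tmp:
        test_path = Path(tmp) / "test_jit.py"
        test_path.write_text(test_source)
        try:
            runpy.run_path(str(test_path))
        except AssertionError:
            return False
    return True  # the temporary directory, and the test with it, is already gone
```

Nothing is added to a persistent suite: the only artifact that survives the run is the pass/fail signal for that one change.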
Sources
| Source | URL | Focus |
|---|---|---|
| Nate’s Newsletter | natesnewsletter.substack.com | AI practitioner strategy, MCP, workflow optimization |
| Where’s Your Ed At | wheresyoured.at | AI financial sustainability critique |
| Anthropic 2026 Agentic Coding Trends | resources.anthropic.com | Harness patterns, industry data |
| Anthropic Harness Design Blog | anthropic.com/engineering | Three-agent harness architecture |
| Meta JiTTests | engineering.fb.com | Testing paradigm shift |
| GitHub Spec Kit | github.com/github/spec-kit | Spec-driven scaffolding |
| GitHub Copilot SDK | github.com/github/copilot-sdk | Composable agent runtime |
| AWS Kiro | kiro.dev | Spec-driven agentic IDE |
| Google ADK | google.github.io/adk-docs | Model-agnostic agent framework |
| Microsoft Agent Governance Toolkit | github.com/microsoft/agent-governance-toolkit | Cross-vendor agent governance |
| Google Scion | github.com/GoogleCloudPlatform/scion | Cross-vendor agent orchestration |
| SurePath AI | surepath.ai | MCP-specific runtime governance |
| Pinterest MCP (InfoQ) | infoq.com/news/2026/04/pinterest-mcp-ecosystem | Enterprise MCP case study |
| SDD Framework Map | Medium (30+ frameworks) | Landscape map |
| NIST AI Agent Standards | csrc.nist.gov | Regulatory direction |