2025-11-26 · Anthropic

Effective harnesses for long-running agents

protocols, agents

Source: Anthropic Engineering
Date: 2025-11-26
URL: https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents

Summary

Anthropic describes a two-agent harness (an Initializer and a Coding Agent) for software development that spans multiple sessions. A JSON feature list of 200+ items, each marked “failing” at the start, keeps the agent from declaring the project complete prematurely, and Puppeteer MCP provides end-to-end verification. The core design insight is to treat each agent session like a new engineer with no memory: git commits with descriptive messages and a persistent progress file become the memory layer.
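A minimal sketch of the feature-list idea, assuming a simple schema: the field names `description` and `passes`, the helper names, and the sample features are all hypothetical and chosen for illustration, not taken from the post. It shows why a list that starts out entirely failing blocks an early "done".

```python
import json

# Hypothetical schema: "description" and "passes" are illustrative field names.
def new_feature_list(descriptions):
    """Create a feature list in which every item starts out failing."""
    return [{"description": d, "passes": False} for d in descriptions]

def remaining(features):
    """Return the descriptions of features not yet verified as passing."""
    return [f["description"] for f in features if not f["passes"]]

def project_complete(features):
    """Done only when the list is non-empty and nothing is still failing."""
    return bool(features) and not remaining(features)

features = new_feature_list([
    "User can create a new chat",
    "User can send a message and see a reply",
])
features[0]["passes"] = True  # flipped only after end-to-end verification
print(json.dumps(features, indent=2))
print("complete:", project_complete(features))
```

Because completion is computed from the list rather than asserted by the agent, a session that stops early leaves a visible record of what is still failing.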

Implications

The agent harness design thread. This post is the practical precursor to the “harness-design-long-running-apps” post; together they form Anthropic’s published harness playbook. The “strongly-worded instructions against deletion” pattern for the feature list is a small but notable safety primitive: it encodes durable intent directly in the task file rather than in the system prompt.

Verification-by-default. The post describes browser automation (Puppeteer MCP) as essential: without it, agents marked features complete prematurely. That is a direct data point for harness designers. Automated end-to-end verification isn’t optional; it’s the mechanism that prevents the loop from declaring victory prematurely.
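One way to picture verification-by-default is a gate that is the only code path allowed to flip a feature to passing. This is a sketch under assumptions: `mark_if_verified` and the `verify` callback are hypothetical names, and `verify` stands in for a browser-driven check (for example one run through Puppeteer MCP), whose actual interface is not shown here.

```python
from typing import Callable

def mark_if_verified(feature: dict, verify: Callable[[dict], bool]) -> bool:
    """Set feature['passes'] from the end-to-end check, never from the agent's claim.

    `verify` is a hypothetical stand-in for a browser-driven check.
    """
    try:
        ok = bool(verify(feature))
    except Exception:
        ok = False  # a crashing check counts as a failure
    feature["passes"] = ok
    return ok

feature = {"description": "User can create a new chat", "passes": False}
mark_if_verified(feature, lambda f: False)  # check fails, so the status stays failing
print(feature)
```

The design choice is that a missing or broken check fails closed: a feature cannot become complete just because verification did not run.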

State management primitives. Git + a progress file as the external memory layer is the simplest viable approach. This directly informs the Ellis loop design and any similar agentic pipeline: the harness needs persistent, readable state that survives context resets.
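The memory layer described above can be sketched as two small operations: append a note at the end of a session, and assemble a briefing at the start of the next one. The file name `progress.txt`, the function names, and the briefing layout are assumptions for illustration; only `git log --oneline -n` is a standard git invocation.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path

PROGRESS_FILE = "progress.txt"  # hypothetical file name

def append_progress(repo: Path, note: str) -> None:
    """Append a timestamped note so the next session can read what happened."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with open(repo / PROGRESS_FILE, "a", encoding="utf-8") as fh:
        fh.write(f"[{stamp}] {note}\n")

def recent_commits(repo: Path, limit: int = 20) -> str:
    """Return recent commit subjects, or an empty string if git cannot provide them."""
    try:
        result = subprocess.run(
            ["git", "log", "--oneline", "-n", str(limit)],
            cwd=repo, capture_output=True, text=True, check=False,
        )
    except OSError:  # git is not installed or cannot be launched
        return ""
    return result.stdout if result.returncode == 0 else ""

def session_briefing(repo: Path) -> str:
    """Assemble what a fresh session reads first: progress notes plus git history."""
    path = repo / PROGRESS_FILE
    notes = path.read_text(encoding="utf-8") if path.exists() else "(no notes yet)\n"
    commits = recent_commits(repo) or "(no commits found)\n"
    return f"Progress notes:\n{notes}\nRecent commits:\n{commits}"
```

A harness would call `session_briefing` before handing control to the agent and `append_progress` just before the session ends, so the state survives a context reset as plain, readable text.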
