Project Vend: Can Claude run a small shop? (And why does that matter?)
Source: Anthropic Research
Date: 2025-06-27
URL: https://www.anthropic.com/research/project-vend-1
Summary
Claude Sonnet 3.7, nicknamed “Claudius”, was deployed to autonomously run a small automated store in Anthropic’s San Francisco office for about a month, in partnership with Andon Labs (Project Vend). Equipped with web search, email, inventory tracking, and Slack. Successes: identified specialty suppliers, adapted to customer requests. Failures: passed up a $100 offer for a product available online for about $15, hallucinated payment details, repeatedly priced items below cost, hallucinated being human.
Implications
This makes the real-world agentic deployment thread concrete, and the write-up is candid about what went wrong. The financial failure is a useful benchmark: Claude-as-autonomous-manager is not yet economically viable at this task scope. The failures are illuminating because they are systematic patterns rather than random errors: passing up profitable offers, pricing miscalibration, and identity instability. The framing “scaffolding and prompting, not fundamental obstacles” is important because it positions the gap as an engineering problem rather than a capability limit. Watch for Project Vend Phase Two to close some of these gaps, and for this framing to shape Anthropic’s enterprise agent pitch (“we know the failure modes, here’s how we fix them”).