2025-03-20 · Anthropic

The "think" tool: Enabling Claude to stop and think

Source: Anthropic Engineering · Date: 2025-03-20 · URL: https://www.anthropic.com/engineering/claude-think-tool

Summary

Anthropic describes a “think” tool that gives Claude a dedicated space to reason in the middle of a response, as opposed to extended thinking, which happens before the response begins. The tool accepts a single string parameter and has no external side effects. On τ-Bench, it raised the airline-domain pass¹ score from 0.370 to 0.570 (a 54% relative gain) when paired with domain-specific reasoning examples in the prompt; the retail domain improved more modestly (0.783 to 0.812), and SWE-Bench showed a 1.6% average improvement.
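A minimal sketch of what such a tool could look like, written as a Python dict in the JSON-schema style used for Claude tool definitions. The parameter name `thought`, the description wording, and the `handle_think` handler are illustrative assumptions, not taken from this summary; the only properties carried over from the text are the single string parameter and the absence of side effects.

```python
import json

# Hypothetical tool specification: a single string parameter and nothing else.
THINK_TOOL = {
    "name": "think",
    "description": (
        "Use this tool to think about something. It does not fetch new "
        "information or change any state; it only records the thought."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {
                "type": "string",
                "description": "A thought to think about.",
            }
        },
        "required": ["thought"],
    },
}


def handle_think(tool_input: dict, log: list) -> str:
    """Hypothetical handler: append the thought to an in-memory log.

    Nothing outside the log is touched, so the call has no external
    side effects.
    """
    log.append(tool_input["thought"])
    return "ok"


thought_log = []
result = handle_think(
    {"thought": "Check the cancellation policy before issuing a refund."},
    thought_log,
)
print(json.dumps(THINK_TOOL, indent=2))
```

Because the handler does nothing beyond logging, the tool can sit next to existing tools in a specification without changing how they behave.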

Implications

The prompting and tool-use guidance thread. Anthropic is publishing a concrete, low-cost prompting primitive: the think tool adds no dependencies and can be appended to any tool specification. The recommendation to use it specifically for “tool output analysis” and “policy-heavy environments” is scoped deployment guidance rather than a blanket recommendation, which makes it more trustworthy.

Domain-specificity multiplies gains. The gap between the airline and retail deltas (54% vs. roughly 4% relative) suggests that the tool’s value depends heavily on how well the system prompt encodes domain reasoning patterns. Generic deployment will likely underperform; deliberate prompt design is required.
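The relative gains quoted here follow directly from the pass¹ scores in the summary. A short check, where `relative_gain` is a helper name introduced only for this illustration:

```python
def relative_gain(baseline: float, treated: float) -> float:
    """Relative improvement of `treated` over `baseline`, as a fraction."""
    return (treated - baseline) / baseline


# pass^1 scores as reported in the summary above.
airline = relative_gain(0.370, 0.570)
retail = relative_gain(0.783, 0.812)

print(f"airline: {airline:.1%}")  # airline: 54.1%
print(f"retail: {retail:.1%}")    # retail: 3.7%
```

The absolute deltas are 0.200 and 0.029; dividing by the respective baselines gives the 54% and roughly 4% figures.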

Harness design implication. For Claude Code harnesses running multi-step tool sequences (file reads → analysis → write decisions), a think tool call is the recommended insertion point for mid-loop reflection. This complements the principle of prioritizing the agent-computer interface (ACI) over prompting: the interface matters, and a think tool call is part of the interface.
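As a rough illustration of where a think call sits in such a sequence, the sketch below dispatches a scripted read → think → write sequence against an in-memory file store. Everything here is hypothetical (the `run_tool_sequence` function, the tool names `read_file` and `write_file`, and the scripted calls); a real harness would receive these calls from the model rather than from a fixed list.

```python
def run_tool_sequence(calls, files):
    """Dispatch scripted tool calls against an in-memory file store.

    The 'think' branch only appends to a log: it reads nothing and
    writes nothing, so it cannot alter the outcome of the other tools.
    """
    thoughts = []
    results = []
    for name, args in calls:
        if name == "read_file":
            results.append(files[args["path"]])
        elif name == "think":
            thoughts.append(args["thought"])
            results.append("")  # no new information goes back to the model
        elif name == "write_file":
            files[args["path"]] = args["content"]
            results.append("written")
        else:
            raise ValueError(f"unknown tool: {name}")
    return results, thoughts


files = {"config.txt": "debug=true"}
calls = [
    ("read_file", {"path": "config.txt"}),
    ("think", {"thought": "debug is enabled; it should be off before shipping."}),
    ("write_file", {"path": "config.txt", "content": "debug=false"}),
]
results, thoughts = run_tool_sequence(calls, files)
print(results)
print(files)
```

The think step sits between the tool output and the write decision, which is the point where the analysis in the parenthetical above takes place.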
