Open-sourcing circuit-tracing tools
Source: Anthropic Research
Date: 2025-05-29
URL: https://www.anthropic.com/research/open-source-circuit-tracing
Summary
An open-source library for generating attribution graphs, which partially reveal the internal steps a model takes to produce a given output, on open-weight models (Gemma-2-2b and Llama-3.2-1b). It is paired with an interactive Neuronpedia frontend for visualizing the graphs and testing hypotheses by modifying feature values and observing how the output changes. The release demonstrates multi-step reasoning and multilingual representations that become visible through circuit tracing.
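To make the two ideas above concrete (attributing an output to intermediate features, then testing a hypothesis by changing a feature value), here is a minimal toy sketch. It does not use the released library or its API. The graph, the node names (`tok`, `feat_a`, `feat_b`, `out`), the weights, and both helper functions are hypothetical and chosen only for illustration, and the "model" is assumed to be purely linear, which real language models are not.

```python
# Toy illustration only: a tiny linear graph, not a real model or the library's API.
# All node names and weights below are made up for this example.

def forward(weights, order, inputs, overrides=None):
    """Compute node activations in topological order.

    `overrides` pins chosen nodes to fixed values, mimicking a feature intervention.
    """
    overrides = overrides or {}
    acts = {}
    for node in order:
        if node in overrides:
            acts[node] = overrides[node]
        elif node in inputs:
            acts[node] = inputs[node]
        else:
            # Sum of (source activation * edge weight) over incoming edges.
            acts[node] = sum(
                acts[src] * w for (src, dst), w in weights.items() if dst == node
            )
    return acts


def edge_attributions(weights, acts):
    """Direct contribution of each source node to its target: activation * weight."""
    return {(src, dst): acts[src] * w for (src, dst), w in weights.items()}


# Hypothetical graph: one input token feeds two intermediate features,
# which both feed one output logit.
weights = {
    ("tok", "feat_a"): 2.0,
    ("tok", "feat_b"): 1.0,
    ("feat_a", "out"): 0.5,
    ("feat_b", "out"): 3.0,
}
order = ["tok", "feat_a", "feat_b", "out"]
inputs = {"tok": 1.0}

acts = forward(weights, order, inputs)
attrs = edge_attributions(weights, acts)

# Intervention: switch feat_b off and recompute the output.
patched = forward(weights, order, inputs, overrides={"feat_b": 0.0})

print("output:", acts["out"])
print("attribution feat_b -> out:", attrs[("feat_b", "out")])
print("output with feat_b ablated:", patched["out"])
```

In this toy, the attribution on the `feat_b` edge predicts how much the output drops when `feat_b` is set to zero. In a real model that prediction is only approximate, which is why the intervention step matters as a check on what the graph suggests.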
Implications
This moves mechanistic interpretability from in-house research toward shared community infrastructure. Open-sourcing the circuit-tracing toolkit makes the approach widely available and speeds external validation, which Anthropic needs because the claims of its interpretability research program require independent replication at scale. The Neuronpedia integration makes the tools accessible to researchers without deep ML infrastructure. The focus on open-weight models (Gemma, Llama) is strategic: Anthropic can’t open-source Claude’s weights, but it can show that the tools work on public models and build credibility that way. Watch for external researchers finding circuits or behaviors that Anthropic hasn’t published on, and for this tooling to become standard in the academic interpretability community.