2025-05-29 · Anthropic

Open-sourcing circuit-tracing tools




Source: Anthropic Research
Date: 2025-05-29
URL: https://www.anthropic.com/research/open-source-circuit-tracing

Summary

An open-source library for generating attribution graphs, which partially reveal the internal steps a model takes to produce a given output, on open-weight models (Gemma-2-2b, Llama-3.2-1b). It is paired with an interactive Neuronpedia frontend for visualizing the graphs and testing hypotheses by modifying feature values and observing how the output changes. The release demonstrates multi-step reasoning and multilingual representations that circuit tracing makes visible.
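To make the two operations in the summary concrete (reading an attribution graph, then testing a hypothesis by modifying a feature value), here is a toy sketch. Everything in it is an assumption for illustration: the two-layer model, its weights, and the names `forward` and `attribution_edges` are hypothetical, and none of it is the circuit-tracing library's API. The real method builds its graph on a learned replacement model of a transformer; here each edge is simply a source feature's activation times the connecting weight, kept only between active features.

```python
# Toy illustration only: hypothetical model and weights, NOT the circuit-tracing library's API.

def relu(x):
    return max(0.0, x)

# Hypothetical toy model: 3 input features -> 2 hidden features -> 1 output logit (no biases).
W1 = [[1.0, -0.5, 0.0],   # weights into hidden feature h0
      [0.5, 1.0, 2.0]]    # weights into hidden feature h1
w2 = [1.5, -1.0]          # weights from h0, h1 into the output logit


def forward(a0, clamp=None):
    """Run the toy model. `clamp` optionally overrides hidden feature values, e.g. {1: 0.0}."""
    a1 = [relu(sum(w * x for w, x in zip(row, a0))) for row in W1]
    if clamp:
        for j, v in clamp.items():
            a1[j] = v  # intervention: fix the feature value, then recompute downstream
    logit = sum(w * h for w, h in zip(w2, a1))
    return a1, logit


def attribution_edges(a0):
    """Edge weight = source activation x connecting weight, kept only for nonzero
    contributions into active (nonzero) targets."""
    a1, _ = forward(a0)
    edges = {}
    for j, row in enumerate(W1):
        for i, w in enumerate(row):
            contribution = w * a0[i]
            if contribution != 0 and a1[j] > 0:
                edges[(f"in{i}", f"h{j}")] = contribution
    for j, w in enumerate(w2):
        if a1[j] > 0:
            edges[(f"h{j}", "logit")] = w * a1[j]
    return edges


if __name__ == "__main__":
    a0 = [2.0, 1.0, 0.5]
    print(attribution_edges(a0))
    print(forward(a0)[1], forward(a0, clamp={1: 0.0})[1])
```

With input `[2.0, 1.0, 0.5]`, the edges into each active node sum to that node's value (this toy has no biases, and both hidden features are active), and clamping hidden feature `h1` to zero moves the logit from -0.75 to 2.25. That is a miniature version of the "feature value modification" check the summary describes.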

Implications

This is the move from mechanistic interpretability research to community infrastructure. Open-sourcing the circuit-tracing toolkit democratizes the approach and speeds external validation, which Anthropic needs because the claims of its interpretability research program require independent replication at scale. The Neuronpedia integration makes the tools accessible to researchers without deep ML infrastructure. The focus on open-weight models (Gemma, Llama) is strategic: Anthropic can't open-source Claude's weights, but it can show that the tools work on public models, which builds credibility. Watch for external researchers finding circuits or behaviors that Anthropic hasn't published on, and for this tooling to become standard in the academic interpretability community.
