2025-04-21 · Anthropic

Understanding and Addressing AI Harms




Source: Anthropic
Date: 2025-04-21
URL: https://www.anthropic.com/news/our-approach-to-understanding-and-addressing-ai-harms

Summary

Anthropic published a framework for assessing AI harms across five dimensions: physical, psychological, economic, societal, and individual autonomy impacts. The framework complements the Responsible Scaling Policy (RSP) and covers evaluations, detection, and enforcement measures. The post cites Claude 3.7 Sonnet as achieving a “45% reduction in unnecessary refusals while maintaining strong safeguards.” It invites external collaboration and notes that the framework is “still evolving.” Computer use is singled out for scrutiny of fraud risks.

Implications

  • Safety/policy posture thread. A five-dimensional harm taxonomy is more sophisticated than the typical “harmful content” framing. It gives Anthropic a structured way to answer critics who argue that safety is being traded away for usefulness.
  • Refusal reduction as the core message. “45% fewer unnecessary refusals” is the operationally important claim and a direct response to the widespread criticism that safety-focused models are over-restrictive. Anthropic is arguing that it can maintain safety while becoming significantly more useful; the key question is whether that holds as models become more capable.
  • Computer use / fraud scrutiny. Singling out computer use for fraud risk assessment signals that Anthropic is trying to get ahead of criticism of agentic capabilities before incidents occur. This is the kind of responsible-deployment language that matters in enterprise procurement and regulatory discussions.
  • Watch: how the five-dimension taxonomy gets used in subsequent policy submissions; whether the “45% reduction” methodology is independently verifiable; how the framework evolves as agentic capabilities expand.
