
Anthropic introduces major update to AI safety policy for frontier risks

TLDR

  • Anthropic updated its AI safety policy to address risks from powerful AI models with new safeguards.
  • New Capability Thresholds set benchmarks for AI models, requiring more safety measures as they advance.
  • Anthropic hopes its AI Safety Levels will inspire industry-wide adoption of similar risk management practices.

Anthropic, the tech firm behind the chatbot Claude, has rolled out a significant update to its AI safety policy, the risk governance framework it uses to mitigate potentially catastrophic risks from frontier AI systems.

The updated Responsible Scaling Policy (RSP) is said to introduce a “more flexible and nuanced approach” to assessing and managing AI risks while maintaining its commitment not to train or deploy models unless it has implemented adequate safeguards.

The policy, first introduced in 2023, was revised with new protocols to ensure that as AI models become more powerful, they are developed and deployed safely. The revision establishes specific Capability Thresholds—benchmarks that indicate when an AI model’s abilities require additional safeguards.

What is Anthropic’s new AI safety policy?

The thresholds address high-risk areas like bioweapons creation and autonomous AI research. The update also clarifies the expanded responsibilities of the Responsible Scaling Officer, a position Anthropic will maintain to oversee compliance and ensure that appropriate safeguards are effectively implemented.

Anthropic says that although the policy focuses on catastrophic risks like the aforementioned categories, they are not the only risks that it needs to monitor and prepare for. In a post, the company wrote: “Our Usage Policy sets forth our standards for the use of our products, including rules that prohibit using our models to spread misinformation, incite violence or hateful behavior, or engage in fraudulent or abusive practices.”

The strategy is intended to serve as a blueprint for the broader AI industry. The company aims for its policy to be “exportable,” hoping it will inspire other AI developers to adopt similar safety frameworks. By introducing AI Safety Levels (ASLs), modeled after the U.S. government’s biosafety standards, Anthropic is hoping to set a precedent for how AI companies can systematically manage risk.

The tiered ASL system, ranging from ASL-2 (current safety standards) to ASL-3 (enhanced protections for higher-risk models), establishes a structured framework for scaling AI development safely. For example, if a model demonstrates potentially harmful autonomous capabilities, it would automatically be escalated to ASL-3, demanding more intensive red-teaming (simulated adversarial testing) and third-party audits prior to deployment.
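The escalation logic described above can be pictured as a simple rule check: a model's evaluated capabilities are compared against capability thresholds, and crossing a high-risk threshold raises the required safety level. The sketch below is purely illustrative; the threshold names and tier rules are assumptions for demonstration, not Anthropic's actual criteria.

```python
# Illustrative sketch of a tiered safety-level policy (not Anthropic's real rules).
# Crossing any high-risk capability threshold escalates the required AI Safety Level.

HIGH_RISK_THRESHOLDS = {"bioweapons_uplift", "autonomous_ai_research"}  # hypothetical names

def required_asl(capabilities: set[str]) -> str:
    """Return the safety level a model must meet before deployment."""
    if capabilities & HIGH_RISK_THRESHOLDS:
        return "ASL-3"  # enhanced protections: red-teaming, third-party audits
    return "ASL-2"      # current baseline safety standards

print(required_asl({"coding", "autonomous_ai_research"}))  # ASL-3
print(required_asl({"coding"}))                            # ASL-2
```

The point of such a structure is that escalation is automatic: no one has to decide case by case whether a model warrants stricter review once it crosses a defined threshold.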

If adopted across the industry, this system could foster what Anthropic describes as a “race to the top” for AI safety, encouraging companies to compete on model performance and the robustness of their safety measures.

The policy change comes shortly after CEO Dario Amodei published an essay, as ReadWrite reported, outlining a roadmap for the future potential of AI and a vision of how the technology could transform society.

Featured image: Anthropic / Canva


Suswati Basu
Tech journalist

Suswati Basu is a multilingual, award-winning editor and the founder of the intersectional literature channel, How To Be Books. She was shortlisted for the Guardian Mary Stott Prize and longlisted for the Guardian International Development Journalism Award. With 18 years of experience in the media industry, Suswati has held significant roles such as head of audience and deputy editor for NationalWorld news, and digital editor for Channel 4 News and ITV News. She has also contributed to the Guardian and received training at the BBC. As an audience, trends, and SEO specialist, she has participated in panel events alongside Google. Her…
