OpenAI introduces new governance model for AI safety oversight

OpenAI has introduced a new governance structure that grants its board the authority to withhold the release of AI models, even if company leadership has deemed them safe, according to a Bloomberg report. The decision, detailed in recently published guidelines, follows a tumultuous period at OpenAI that included the temporary ousting of CEO Sam Altman, an episode that highlighted the delicate balance of power between the company’s directors and its executive team.

OpenAI’s newly formed “preparedness” team, led by Aleksander Madry of MIT, is tasked with continuously assessing the company’s AI systems. The team will focus on identifying and mitigating potential cybersecurity threats and risks related to chemical, nuclear, and biological dangers. OpenAI defines “catastrophic” risks as those capable of causing extensive economic damage or significant harm to individuals.

Madry’s team will provide monthly reports to an internal safety advisory group, which will then offer recommendations to Altman and the board. The leadership team can decide whether to release new AI systems based on these reports, but the board retains the final say and can overrule any decision made by the company’s executives.

OpenAI’s three-tiered approach to AI safety

OpenAI’s approach to AI safety is structured around three distinct teams:

Safety Systems: This team focuses on current products like GPT-4, ensuring they meet safety standards.
Preparedness: The new team led by Madry evaluates unreleased, advanced AI models for potential risks.
Superalignment: Led by Ilya Sutskever, this team concentrates on future, hypothetical AI systems that could possess immense power.

Together, the three teams cover the full span of AI safety work, from products already on the market to systems that do not yet exist.

The preparedness team will rate AI models as “low,” “medium,” “high,” or “critical” based on perceived risks. OpenAI plans to release only those models rated as “medium” or “low.” The team will also implement changes to reduce identified dangers and evaluate the effectiveness of these modifications.
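The release rule described above can be read as a simple threshold check. The Python sketch below is only an illustration of that reading, not OpenAI’s implementation: the names RiskLevel, RELEASE_THRESHOLD, and is_releasable are hypothetical, and the numeric ordering of the four ratings is an assumption made so they can be compared.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """The four ratings named in the article, ordered from least to most severe.

    The integer values are an assumption made for comparison only.
    """
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Highest rating still eligible for release, per the article's
# "medium or low" rule. The constant name is hypothetical.
RELEASE_THRESHOLD = RiskLevel.MEDIUM


def is_releasable(rating: RiskLevel) -> bool:
    """Return True if a model with this rating may be released."""
    return rating <= RELEASE_THRESHOLD


if __name__ == "__main__":
    for level in RiskLevel:
        status = "eligible" if is_releasable(level) else "withheld"
        print(f"{level.name.lower()}: {status}")
```

Under this reading, a model first rated “high” would become eligible only if the team’s changes brought a later rating down to “medium” or “low.”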

Madry expressed his hope to Bloomberg that other companies will adopt OpenAI’s guidelines for their AI models. These guidelines formalize processes that OpenAI has previously used in evaluating and releasing AI technology.

Madry emphasized that people play an active role in shaping AI’s impact: “AI is not something that just happens to us that might be good or bad. It’s something we’re shaping.”