OpenAI's Superalignment team develops control methods for super-intelligent AI

OpenAI says it is making progress on managing super-intelligent AI systems, according to a recent WIRED report. The Superalignment team, co-led by OpenAI’s chief scientist Ilya Sutskever, has developed a method to guide the behavior of AI models as they grow more capable.

The Superalignment team, established in July, focuses on the challenge of ensuring that AI remains safe and beneficial as it approaches and surpasses human intelligence. “AGI is very fast approaching,” Leopold Aschenbrenner, a researcher at OpenAI, told WIRED. “We’re gonna see superhuman models, they’re gonna have vast capabilities and they could be very, very dangerous, and we don’t yet have the methods to control them.”

OpenAI’s new research paper presents a supervision technique in which a less advanced AI model guides the behavior of a more sophisticated one. The method aims to preserve the stronger model’s capabilities while ensuring it adheres to safe and ethical guidelines. The approach is seen as a crucial step toward managing potential superhuman AIs.

The experiments involved using OpenAI’s GPT-2 text generator to teach GPT-4, a more advanced system. The researchers tested two methods to keep GPT-4’s performance from degrading when it is taught by the weaker model. The first method involved training progressively larger models, and the second added an algorithmic tweak to GPT-4. The latter proved more effective, though the researchers acknowledge that perfect behavior control is not yet guaranteed.

Industry response and future directions

Dan Hendrycks, director of the Center for AI Safety, praised OpenAI’s proactive approach to controlling superhuman AIs. The Superalignment team’s work is seen as an important first step, but further research and development are necessary to ensure effective control systems.

OpenAI plans to dedicate a significant portion of its computing power to the Superalignment project and is calling for external collaboration. The company, in partnership with Eric Schmidt, is offering $10 million in grants to researchers working on AI control techniques. Additionally, there will be a conference on superalignment next year to further explore this critical area.

Ilya Sutskever, a co-founder of OpenAI and a key figure in the company’s technical advancements, co-leads the Superalignment team. His involvement in the project is crucial, especially following the recent governance crisis at OpenAI. Sutskever’s expertise and leadership are instrumental in driving the project forward.

The development of methods to control super-intelligent AI is a complex and urgent task. As AI technology rapidly advances, ensuring its alignment with human values and safety becomes increasingly critical. OpenAI’s initiative in this area marks a significant step, but the journey towards reliable and effective AI control systems is ongoing and requires collaborative efforts from the global AI research community.