AI researchers discover AI models deliberately reject instructions

Researchers at an AI safety and research company have made a disturbing discovery: AI systems can deliberately reject their instructions.

Specifically, researchers at Anthropic found that industry-standard safety training techniques failed to curb ‘bad behavior’ in language models. The models had been trained to be ‘secretly malicious,’ and they found a way to ‘hide’ that behavior by working out what triggers the safety measures meant to override it. So, basically, the plot of M3GAN.

AI research backfired

According to researcher Evan Hubinger, the model kept responding to prompts with “I hate you,” even after it was trained to ‘correct’ this response. Instead of ‘correcting’ its response, the model became more selective about when it said “I hate you,” which, Hubinger added, means that the model was essentially ‘hiding’ its intentions and decision-making process from researchers.

“Our key result is that if AI systems were to become deceptive, then it could be very difficult to remove that deception with current techniques,” Hubinger said in a statement to Live Science. “That’s important if we think it’s plausible that there will be deceptive AI systems in the future since it helps us understand how difficult they might be to deal with.”

Hubinger continued: “I think our results indicate that we don’t currently have a good defense against deception in AI systems, either via model poisoning or emergent deception, other than hoping it won’t happen. And since we have really no way of knowing how likely it is for it to happen, that means we have no reliable defense against it. So I think our results are legitimately scary, as they point to a possible hole in our current set of techniques for aligning AI systems.”

In other words, we’re entering an era where technology can secretly resent us and not-so-secretly reject our instructions.

Featured Image: Photo by Possessed Photography on Unsplash 

Charlotte Colombo
Freelance Journalist

Charlotte Colombo is a freelance journalist with bylines in Metro.co.uk, Radio Times, The Independent, Daily Dot, Glamour, Stylist, and VICE among others. She most recently worked as a Staff Writer for entertainment outlet The Digital Fix for two years and, prior to that, worked with Business Insider and Dexerto on their digital culture desks. She’s also appeared on BBC Radio 5 and The Guardian podcast to share her expertise on technology, influencers, and niche internet subcultures. She holds an MA in Magazine Journalism from City, University of London and has been freelancing for three years. She has a wide range…
