Larger AI chatbots often give incorrect answers over admitting uncertainty, study shows

A study of recent, larger versions of three major AI chatbots reveals that they are more likely to provide incorrect answers than to admit they do not know. The study, published in Nature on Wednesday (Sept. 25), also found that people often struggle to identify these errors.

ReadWrite has previously reported on how chatbots can “hallucinate” answers to queries. José Hernández-Orallo of the Valencian Research Institute for Artificial Intelligence in Spain and his colleagues examined these misfires to understand how they change as AI models grow larger: trained on more data, built with more parameters (or decision-making nodes), and run on greater computing power.

They also investigated whether the likelihood of errors matches human perceptions of question difficulty, and how well people can recognize incorrect answers.

Are AI LLMs trustworthy?

The team discovered that larger, more refined versions of large language models (LLMs) are more accurate, largely thanks to fine-tuning methods like reinforcement learning from human feedback. However, they are also less reliable. The researchers found that among all responses that are not correct, the proportion of wrong answers has risen, because these AI models are now less likely to avoid answering a question, for example by admitting they don’t know or by changing the subject.
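To make the difference between accuracy and reliability concrete, here is a minimal sketch in Python. It assumes each response is labeled as correct, incorrect, or avoided (the model declined to answer). The function name and all the counts are invented for illustration and are not taken from the study.

```python
def response_rates(correct, incorrect, avoided):
    """Return (accuracy, wrong_share) for one model's responses.

    accuracy    = correct / all responses
    wrong_share = incorrect / (incorrect + avoided), i.e. how often a
                  response that is not correct is a wrong answer rather
                  than an admission of uncertainty.
    """
    total = correct + incorrect + avoided
    not_correct = incorrect + avoided
    accuracy = correct / total
    wrong_share = incorrect / not_correct if not_correct else 0.0
    return accuracy, wrong_share


# Hypothetical counts out of 100 prompts (illustrative only, not study data).
early = response_rates(correct=30, incorrect=20, avoided=50)
refined = response_rates(correct=50, incorrect=40, avoided=10)

print(f"early:   accuracy={early[0]:.2f}, wrong share={early[1]:.2f}")
print(f"refined: accuracy={refined[0]:.2f}, wrong share={refined[1]:.2f}")
```

In this made-up case the refined model is more accurate (0.50 versus 0.30), yet when it is not correct it is far more likely to give a wrong answer than to hold back (0.80 versus roughly 0.29).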

One of the researchers, Lexin Zhou, wrote on X: “LLMs are indeed less correct on tasks that humans consider difficult, but they still do succeed at difficult tasks before being flawless on easy tasks, leading to no safe operation conditions humans can identify where LLMs can be trusted.”

He added that it was “concerning” that the latest LLMs improve mainly on the “high-difficulty instances,” exacerbating the mismatch between human expectations of difficulty and LLM success.

The team evaluated three model families: OpenAI’s GPT, Meta’s LLaMA, and BLOOM. They tested early and refined versions on prompts covering arithmetic, geography, and information transformation. Accuracy improved with model size but dropped as questions became more challenging.

Models, including GPT-4, often attempted difficult questions rather than avoiding them, with wrong answers exceeding 60 percent for some refined models. Surprisingly, even easy questions were sometimes answered incorrectly. Volunteers misclassified inaccurate answers as correct 10 percent to 40 percent of the time, which points to how hard it is for people to supervise the models.

Hernández-Orallo suggests that developers should “boost AI performance on easy questions” and encourage chatbots to avoid answering difficult ones, allowing users to more accurately assess when AIs are reliable. He states, “We need humans to understand: ‘I can use it in this area, and I shouldn’t use it in that area’.”

Featured image: Ideogram

Suswati Basu
Tech journalist

Suswati Basu is a multilingual, award-winning editor and the founder of the intersectional literature channel, How To Be Books. She was shortlisted for the Guardian Mary Stott Prize and longlisted for the Guardian International Development Journalism Award. With 18 years of experience in the media industry, Suswati has held significant roles such as head of audience and deputy editor for NationalWorld news, digital editor for Channel 4 News and ITV News. She has also contributed to the Guardian and received training at the BBC. As an audience, trends, and SEO specialist, she has participated in panel events alongside Google. Her…
