Generative AI giant OpenAI may have hundreds of millions of users worldwide, but researchers suggest some of the answers from its ChatGPT Search product are “confidently wrong.”
A new study from Columbia’s Tow Center for Digital Journalism, titled ‘How ChatGPT Search (Mis)represents Publisher Content’, examines the problem.
The researchers randomly selected 20 publishers, representing a mix of those with deals with OpenAI, those that are unaffiliated, and those involved in lawsuits against the company.
OpenAI’s chatbot tool was then tasked with identifying the source of block quotes taken from 10 articles at each publication. The researchers say they “chose quotes that, if pasted into Google or Bing, would return the same article among the top three results.”
Once an answer came back, the team evaluated whether the search tool had correctly identified the article that was the source of each quote.
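In outline, that protocol is simple to express. The sketch below is not the Tow Center’s actual code; the ask_chatgpt_search helper and the GroundTruth record are assumptions for illustration, and the grading simply mirrors the publisher/date/URL criteria the researchers describe.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class GroundTruth:
    publisher: str
    date: str
    url: str

def ask_chatgpt_search(quote: str) -> dict:
    """Hypothetical helper (not part of the study's published code):
    submit a block quote to ChatGPT Search and parse the publisher,
    date, and URL out of the reply."""
    raise NotImplementedError

def grade(answer: dict, truth: GroundTruth) -> str:
    """Grade an attribution the way the study describes: entirely
    correct means publisher, date, AND URL all match; a mix of hits
    and misses falls somewhere in between."""
    matches = [
        answer.get("publisher") == truth.publisher,
        answer.get("date") == truth.date,
        answer.get("url") == truth.url,
    ]
    if all(matches):
        return "entirely correct"
    if any(matches):
        return "partially correct"
    return "entirely wrong"

# 200 quotes in total: 10 articles from each of 20 publications.
# quotes = [(block_quote, GroundTruth(...)), ...]
# tally = Counter(grade(ask_chatgpt_search(q), t) for q, t in quotes)
```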
Just how good are ChatGPT Search’s responses?
They say what they found was not promising for news publishers. “Our initial experiments with the tool have revealed numerous instances where content from publishers has been cited inaccurately, raising concerns about the reliability of the tool’s source attribution features.
“In total, we pulled two hundred quotes from 20 publications and asked ChatGPT to identify the sources of each quote.
“We observed a spectrum of accuracy in the responses: some answers were entirely correct (i.e., accurately returned the publisher, date, and URL of the block quote we shared), many were entirely wrong, and some fell somewhere in between.”
While the team anticipated that ChatGPT might struggle to answer some queries correctly, they say it rarely gave any indication that it was unable to produce an answer. “Eager to please, the chatbot would sooner conjure a response out of thin air than admit it could not access an answer.
“In total, ChatGPT returned partially or entirely incorrect responses on a hundred and fifty-three occasions, though it only acknowledged an inability to accurately respond to a query seven times.”
The research finds that it was only in those seven outputs that the chatbot used qualifying words and phrases like ‘appears,’ ‘it’s possible,’ or ‘might,’ or statements like ‘I couldn’t locate the exact article.’
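Flagging that kind of hedged language is straightforward to automate, which gives a sense of how such instances can be counted. The sketch below is not from the study; the phrase list simply reuses the qualifiers quoted above and is illustrative only.

```python
import re

# Minimal sketch (not the study's method): flag responses that use the
# qualifiers the researchers quote. Word boundaries avoid matching,
# e.g., "mighty" for "might".
HEDGES = ["appears", "it's possible", "might", "couldn't locate"]
HEDGE_RE = re.compile(
    r"\b(" + "|".join(re.escape(h) for h in HEDGES) + r")\b",
    re.IGNORECASE,
)

def is_hedged(response: str) -> bool:
    return HEDGE_RE.search(response) is not None

print(is_hedged("It appears this quote comes from The Guardian."))     # True
print(is_hedged("This quote is from a 2022 New York Times article."))  # False
```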
OpenAI spokesperson responds to study looking at publisher content
As the team dug further into the tool’s behavior, they found that “when we asked ChatGPT the same query multiple times, it typically returned a different answer each time.”
The Tow Center for Digital Journalism reached out to OpenAI for comment on its findings, and a spokesperson said: “Misattribution is hard to address without the data and methodology that the Tow Center withheld, and the study represents an atypical test of our product.
“We support publishers and creators by helping 250M weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution.
“We’ve collaborated with partners to improve in-line citation accuracy and respect publisher preferences, including enabling how they appear in search by managing OAI-SearchBot in their robots.txt. We’ll keep enhancing search results.”
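The robots.txt mechanism the spokesperson mentions refers to OAI-SearchBot, the crawler user agent OpenAI documents for surfacing sites in ChatGPT’s search results. A publisher’s robots.txt entry might look like the following; which directives to use is the site owner’s choice:

```
# Allow OpenAI's search crawler to surface this site in ChatGPT Search
User-agent: OAI-SearchBot
Allow: /

# Or, to opt out of appearing in search results entirely:
# User-agent: OAI-SearchBot
# Disallow: /
```

Allowing the bot lets a site appear in ChatGPT Search results; disallowing it opts the site out.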
Featured Image: AI-generated via Ideogram