Apple scientists: ‘No evidence of formal reasoning’ in LLMs

TLDR

  • Apple researchers published a study highlighting key limitations in large language models (LLMs).
  • The study shows LLMs struggle with mathematical reasoning, with accuracy dropping by up to 65% in some cases.
  • Apple's team argues LLMs rely on pattern matching, not true formal reasoning or understanding.

Apple researchers have published a study detailing key limitations in LLMs, or large language models, from major AI labs like OpenAI.

The study, written by scientists at the tech giant and published this month, introduces a new benchmark for evaluating LLMs’ mathematical reasoning skills. That benchmark has highlighted limitations in some of the world’s top LLMs, including OpenAI’s GPT-4o and o1 models.

Specifically, the paper found that changing the wording of questions or adding unrelated phrases could drastically change the results. In some cases, accuracy dropped by up to 65%. As the questions became more complex, the results varied more widely and accuracy fell.
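To make the idea concrete, here is a minimal Python sketch of this kind of test: the same word problem is regenerated with different names and numbers, with and without an unrelated phrase, and a solver is scored on both sets. Everything in it is an assumption made for illustration, not the researchers’ benchmark or code: the template, the names, the distractor sentence, and the toy `pattern_matching_solver`, which stands in for a model by simply adding up every number it sees.

```python
import random
import re

# Hypothetical template and distractor, invented for this sketch.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday."
    "{distractor} How many apples does {name} have?"
)
NAMES = ["Ava", "Liam", "Noor"]
DISTRACTOR = " 5 of the apples are slightly smaller than average."


def make_variant(rng, with_distractor=False):
    """Return (question, correct_answer) for one randomized variant."""
    name = rng.choice(NAMES)
    a, b = rng.randint(2, 50), rng.randint(2, 50)
    question = TEMPLATE.format(
        name=name, a=a, b=b,
        distractor=DISTRACTOR if with_distractor else "",
    )
    return question, a + b  # the irrelevant detail never changes the answer


def pattern_matching_solver(question):
    """Toy stand-in for a model: sums every number that appears in the text."""
    return sum(int(n) for n in re.findall(r"\d+", question))


def accuracy(solver, variants):
    """Fraction of variants the solver answers correctly."""
    return sum(solver(q) == answer for q, answer in variants) / len(variants)


rng = random.Random(0)
plain = [make_variant(rng) for _ in range(20)]
noisy = [make_variant(rng, with_distractor=True) for _ in range(20)]

print("plain accuracy:", accuracy(pattern_matching_solver, plain))
print("with unrelated phrase:", accuracy(pattern_matching_solver, noisy))
```

The toy solver gets every plain variant right and every variant with the unrelated phrase wrong, because it folds the irrelevant “5” into its sum. Real models fail in far less mechanical ways, but the comparison between the two accuracy figures is the point of the exercise.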

As such, the team at Apple, made up of Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar, has concluded that its research shows “no evidence of formal reasoning” in the models tested. The researchers argue that the behavior is more likely sophisticated pattern matching than genuine mathematical reasoning.

What are LLMs?

LLMs underpin many of today’s AI-powered tools. They are a type of AI that uses machine learning to understand and generate human language, making them useful for tasks such as text analysis and responding to text prompts. LLMs are typically trained on large amounts of data, such as books and articles, to learn how language works.

However, this research appears to support the theory that LLMs cannot truly reason yet. That leaves some doubt about whether they can be trusted with more complex tasks, at a time when many companies are using AI for increasingly important roles.

For example, OpenAI’s CEO Sam Altman outlined plans for AI to feed into virtually every aspect of life, from healthcare and education to home assistants and workplace aids. Concerns around whether superintelligent AI is really as close as Altman claims have already been raised by other tech leaders, including Meta’s AI Chief Yann LeCun, who labeled such hopes “complete B.S.”

Rachael Davies
Tech Journalist

Rachael Davies has spent six years reporting on tech and entertainment, writing for publications like the Evening Standard, Huffington Post, Dazed, and more. From niche topics like the latest gaming mods to consumer-facing guides on the latest tech, she puts her MA in Convergent Journalism to work, following avenues guided by a variety of interests. As well as writing, she also has experience in editing as the UK Editor of The Mary Sue, as well as speaking on the importance of SEO in journalism at the Student Press Association National Conference. You can find her full portfolio over on…
