Apple researchers report that the company’s new AI system, ReALM, outperforms OpenAI’s GPT-4 on reference-resolution benchmarks.
The paper, titled “ReALM: Reference Resolution as Language Modelling”, examines the problem of reference resolution. In language, reference is the process by which a word or phrase in a sentence or discourse points to another word or entity; the task of determining what it points to is known as reference resolution.
Apple says its latest AI model ReALM is even “better than OpenAI’s GPT4”.
It likely is as GPT4 has regressed because of “alignment”.
The ReALM war begins at WWDC 2024.
Paper: https://t.co/3emVSjgRvK
— Brian Roemmele (@BrianRoemmele) April 1, 2024
The researchers note that while large language models (LLMs) are extremely powerful across a variety of tasks, their potential for reference resolution, particularly for non-conversational entities, remains underexploited.
According to the study, the smallest version of ReALM was benchmarked against GPT-3.5 and GPT-4: it achieved performance comparable to GPT-4, while the larger ReALM models substantially outperformed GPT-4.
Ahead of WWDC 2024 and the anticipated June launch of iOS 18, expectations are high for the debut of an advanced Siri 2.0. Whether ReALM will be integrated into Siri by then remains uncertain.
Apple’s recent ventures into AI have not gone unnoticed, marked by the introduction of new models and tools aimed at enhancing AI efficiency on smaller devices, as well as strategic partnerships. These developments highlight the company’s strategy to place AI at the forefront of its business operations.
The unveiling of ReALM represents the Apple AI research team’s latest and most targeted effort to refine existing models, making them faster, smarter, and more efficient.
Key features of Apple’s ReALM AI
ReALM reportedly uses a new way of converting on-screen information into text, allowing it to bypass the need for large image-recognition models and enabling more efficient on-device processing.
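To illustrate the general idea, the sketch below shows how on-screen elements might be serialized into tagged text that a language model can reason over. It is a minimal, hypothetical example: the entity format, labels, and ordering scheme here are assumptions for illustration, not Apple’s actual pipeline.

```python
# Minimal sketch of serializing on-screen entities into text for an LLM prompt.
# Assumes the UI layer already exposes each entity's text and bounding box;
# the tagging scheme and field names are illustrative, not taken from the paper.

from dataclasses import dataclass

@dataclass
class ScreenEntity:
    text: str   # visible text of the UI element
    kind: str   # e.g. "phone_number", "business_name"
    x: float    # left edge of bounding box (normalized 0..1)
    y: float    # top edge of bounding box (normalized 0..1)

def screen_to_text(entities: list[ScreenEntity]) -> str:
    """Render screen entities as tagged text, roughly preserving layout order
    (top-to-bottom, then left-to-right) so the model can refer to them."""
    ordered = sorted(entities, key=lambda e: (round(e.y, 2), e.x))
    return "\n".join(f"[{i}|{e.kind}] {e.text}" for i, e in enumerate(ordered))

# Example: a business page with a name and a phone number on screen.
screen = [
    ScreenEntity("Joe's Pizza", "business_name", x=0.1, y=0.10),
    ScreenEntity("(555) 010-4782", "phone_number", x=0.1, y=0.25),
]
print(screen_to_text(screen))
# [0|business_name] Joe's Pizza
# [1|phone_number] (555) 010-4782
```

Because the screen ends up as plain text, the reference-resolution step can run through the same language-model machinery as ordinary conversation, with no separate vision model in the loop.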
ReALM also takes into account what is on the user’s screen as well as tasks running in the background.
As a result, a user should be able to scroll through a website and instruct Siri to call the business; Siri would then be able to ‘see’ the phone number on the page and place the call directly.
Hence ReALM could significantly improve the context-aware capabilities of voice assistants. With its ability to interpret on-screen information and use additional context, the update to Siri could help deliver a more fluid and hands-free user experience.
ReALM could also handle a wide variety of references, including those that are dependent on conversational context, on-screen content, and even background information. This is critical for developing more intuitive and responsive AI systems that can adapt to the complexities of human language and context.
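The following sketch shows how those three entity sources might be combined into a single reference-resolution query for a language model. The prompt structure, labels, and helper function are hypothetical; they illustrate the task setup rather than the paper’s exact format.

```python
# Hypothetical prompt assembly for reference resolution, combining the three
# entity categories the paper distinguishes: conversational, on-screen, background.

def build_prompt(conversational, onscreen, background, user_request):
    sections = [
        "Conversational entities:\n" + "\n".join(conversational),
        "On-screen entities:\n" + "\n".join(onscreen),
        "Background entities:\n" + "\n".join(background),
        f"User request: {user_request}",
        "Which entity does the request refer to? Answer with its label.",
    ]
    return "\n\n".join(sections)

prompt = build_prompt(
    conversational=["[c0] the pharmacy mentioned earlier in the conversation"],
    onscreen=["[s0|business_name] Joe's Pizza", "[s1|phone_number] (555) 010-4782"],
    background=["[b0|alarm] Alarm set for 7:00 AM"],
    user_request="Call that number",
)
print(prompt)
```

Framing the task this way turns reference resolution into ordinary next-token prediction: the model simply names the entity label that best matches the request.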
The paper reports large improvements over existing systems with similar functionalities, as its smallest model apparently achieved absolute gains of over 5% for on-screen references.
Featured image: Canva