Apple ReALM vs OpenAI GPT-4: Apple Claims It Can “Substantially Outperform”

Apple claims that its large language model (LLM), Reference Resolution As Language Modeling (ReALM), can ‘substantially outperform’ OpenAI’s GPT-4. The claim appears in a research paper on Apple’s AI plans for Siri, published last week.

Regarding ReALM's performance, the researchers compared it against OpenAI's LLMs GPT-3.5 and GPT-4, which power the free ChatGPT and the paid ChatGPT Plus, respectively. In the paper, the researchers said, "We demonstrate large improvements over an existing system with similar functionality across different types of references, with our smallest model obtaining absolute gains of over 5% for onscreen references. We also benchmark against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it.”

What is Reference Resolution?

The paper introduces ReALM (Reference Resolution As Language Modeling), a conversational AI system that aims to improve reference resolution: the linguistic task of working out what a particular expression is referring to.

Apple highlights how crucial it is for chatbots to resolve references correctly, especially when users say "that" or "it" to point at items on a screen. For instance, a user might ask the assistant to "call that number" while a phone number is visible on screen, and the system must work out which entity the phrase denotes. ChatGPT and other AI chatbots can struggle to resolve such references accurately.
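
To make the task concrete, here is a minimal, illustrative sketch of what reference resolution asks a system to do. The Entity structure, the example entities, and the toy resolve_reference function are our own illustration, not Apple's implementation.

```python
# Illustrative only: reference resolution as an input/output task.
# The data model and resolver are hypothetical, not Apple's API.

from dataclasses import dataclass

@dataclass
class Entity:
    entity_id: int
    entity_type: str   # e.g. "phone_number", "business", "song"
    text: str          # how the entity appears on screen

# Items currently visible on the user's screen.
onscreen_entities = [
    Entity(1, "business", "Walgreens Pharmacy"),
    Entity(2, "phone_number", "(555) 123-4567"),
]

# The utterance contains a reference ("that number") that must be
# resolved to exactly one of the candidate entities.
utterance = "Call that number"

def resolve_reference(utterance: str, candidates: list[Entity]) -> Entity:
    """Toy resolver: match the referenced type word to an entity type."""
    for entity in candidates:
        if entity.entity_type.replace("_", " ").split()[-1] in utterance.lower():
            return entity
    return candidates[0]  # fall back to the most salient entity

print(resolve_reference(utterance, onscreen_entities))
# Entity(entity_id=2, entity_type='phone_number', text='(555) 123-4567')
```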

In their paper, Apple's researchers outlined an approach that treats reference resolution as a language modelling problem, allowing ReALM to convert conversational, on-screen, and background entities into a text format that large language models (LLMs) can process. On-screen entities are items visible on the user's screen; conversational entities are pieces of information relevant to the ongoing conversation, such as previous interactions or context; and background entities are elements not directly displayed on the screen but still relevant, such as music that is playing or incoming notifications.
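
As a rough illustration of that conversion step, the sketch below flattens the three entity types into a single textual prompt that an LLM could then complete with the referenced entity ids. The prompt layout, function name, and example entities are assumptions on our part, not the exact format used in the paper.

```python
# Sketch of ReALM's core idea: recast reference resolution as language
# modelling by flattening all candidate entities into plain text that a
# fine-tuned LLM can score. The layout below is our own guess.

def build_prompt(utterance: str,
                 conversational: list[str],
                 onscreen: list[str],
                 background: list[str]) -> str:
    lines = ["Resolve the reference in the user request to entity ids.", ""]
    entity_id = 0
    for label, entities in [("conversational", conversational),
                            ("onscreen", onscreen),
                            ("background", background)]:
        for text in entities:
            lines.append(f"[{entity_id}] ({label}) {text}")
            entity_id += 1
    lines += ["", f"User request: {utterance}", "Referenced entity ids:"]
    return "\n".join(lines)

prompt = build_prompt(
    utterance="Play that one",
    conversational=["song: Blinding Lights, mentioned earlier"],
    onscreen=["button: Play", "song: As It Was"],
    background=["alarm: 7:00 AM, ringing"],
)
print(prompt)  # this text would be fed to the fine-tuned LLM
```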

The paper presents four ReALM model configurations: ReALM-80M, ReALM-250M, ReALM-1B, and ReALM-3B, where "M" and "B" denote millions and billions of parameters, respectively. By comparison, GPT-3.5 is reported to have 175 billion parameters, while GPT-4 is said to have about 1.5 trillion.

Comparison with GPT-4

According to the Apple researchers, GPT-3.5 accepts only text, so their input to it consisted of the prompt alone. GPT-4, by contrast, can also contextualise images; giving it a screenshot for the task of on-screen reference resolution significantly improved its performance.

The paper mentioned, “Note that our ChatGPT prompt and prompt+image formulation are, to the best of our knowledge, in and of themselves novel. While we believe it might be possible to further improve results, for example, by sampling semantically similar utterances up until we hit the prompt length, this more complex approach deserves further, dedicated exploration, and we leave this to future work.”
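
For readers curious what the two formulations look like in practice, here is a hedged sketch using the OpenAI Python SDK's chat completions API: a text-only prompt (the only option for GPT-3.5) and a prompt plus screenshot for on-screen reference resolution. The model name, file path, and prompt wording are placeholders; this is not the benchmarking code from the paper.

```python
# Sketch of the two GPT-4 formulations benchmarked in the paper:
# text-only prompt vs. prompt plus a screenshot. Model names and
# file paths are placeholders, not Apple's benchmarking setup.

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt_text = ("Which on-screen entity does 'call that number' refer to? "
               "Answer with the entity's text.")

# Formulation 1: prompt alone (the only option for GPT-3.5).
text_only = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": prompt_text}],
)

# Formulation 2: prompt + screenshot for on-screen reference resolution.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

with_image = client.chat.completions.create(
    model="gpt-4-turbo",  # any vision-capable GPT-4 variant
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt_text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

print(text_only.choices[0].message.content)
print(with_image.choices[0].message.content)
```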

That said, ReALM outperformed GPT-4 on a benchmark it was created expressly to excel at, so it would be inaccurate to conclude that it is the superior model overall.

Additionally, Apple has scheduled its annual WWDC conference for June 10, and Greg Joswiak, Apple's senior vice president of worldwide marketing, has hinted that AI may be the main topic of discussion at the event.

(Inputs from Agencies)
