A recent research paper found that an open-source AI system using retrieval augmentation can outperform proprietary chatbot models like OpenAI’s GPT-3.5.

The paper, published on Oct. 4 by Nvidia researchers, compares different techniques for handling long context in large language models (LLMs), the models behind today's conversational AI. One method is simply extending the context window, allowing the LLM to directly "read" more tokens of text as input and keep them in mind when producing its output. The other approach uses retrieval to provide the LLM with only the most relevant context from a large database.

Their best approach combines both techniques: a 70-billion-parameter open-source LLaMA model with a context window extended to 32,000 tokens, further augmented by retrieving relevant passages from a corpus. The retriever supplies context on demand, rather than forcing the LLM to hold everything in its window, which makes the system more efficient.
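Conceptually, retrieval augmentation works like the minimal sketch below: score every passage in a corpus against the question, then prepend only the top matches to the prompt. This is a toy bag-of-words retriever standing in for the dense neural retrievers the paper actually evaluates, and all names in it are illustrative:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": a word-count vector (real systems use dense neural embeddings).
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank corpus passages by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query, corpus, k=2):
    # Prepend only the most relevant passages instead of the whole corpus,
    # so the LLM's context window stays small.
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "LLaMA is an open-source family of large language models.",
    "Retrieval augmentation fetches relevant passages from a database.",
    "The capital of France is Paris.",
]
print(build_prompt("How does retrieval augmentation help LLaMA?", corpus))
```

The key design point is that the prompt's length depends only on `k`, not on the corpus size, which is why retrieval can substitute for a much larger context window.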

On a set of seven long-form question-answering and summarization benchmarks, this hybrid retrieval-augmented LLaMA achieved an average score of 43.6, surpassing GPT-3.5-turbo with its 16,000-token context window (42.8 average). It also matched OpenAI's much larger proprietary 175-billion-parameter Davinci model on a subset of four tasks.

The authors argue that retrieval provides significant benefits even when very large LLMs already have extended context windows. They found that a 4,000-token LLaMA with retrieval performed on par with non-retrieval LLaMAs given 16,000 tokens, while running much faster because it processes far less input.

The researchers believe that performance on par with closed commercial systems like ChatGPT can be achieved by combining existing open-source models such as LLaMA with retrieval techniques. The findings suggest that integrating retrieval with long context is a promising direction for building more capable open-source conversational AI.

The paper provides evidence that, with the right techniques, open-source AI can match or surpass proprietary chatbots. The results may influence how future AI systems combine long-context models with retrieved supporting information, and they point to retrieval as a key component alongside context-window extension.

Featured Image Credit: Markus Winkler; Pexels; Thank you!

Radek Zielinski

Radek Zielinski is an experienced technology and financial journalist with a passion for cybersecurity and futurology.