AI chatbots could converse all day without crashing, new research finds

Researchers at MIT have found a fix for the way AI chatbots’ conversations deteriorate over time, enabling the models to hold nonstop conversations without crashing or slowing down.

When users converse with chatbots like ChatGPT for long stretches, the large language models powering the technology can begin to break down, degrading the quality of the exchange. At times, they can even hallucinate facts.

The researchers have now identified the root cause and devised a way to keep conversations flowing without restarting the software.

Their approach modifies the key-value cache, essentially the conversation memory at the heart of many large language models. In some methods, when the cache exceeds its capacity, the earliest entries are evicted, which can cause the model to fail. By preserving those initial entries in memory instead, the researchers enabled the chatbot to keep conversing without significant issues.

Using a technique known as StreamingLLM, the researchers kept the model efficient even during conversations that stretched past four million words. Compared with another approach that prevents crashes by repeatedly recomputing portions of earlier conversation, StreamingLLM proved more than 22 times faster.

As a result, chatbots could sustain lengthy conversations without constant reboots, making AI assistants far more effective for tasks such as copywriting, editing, or code generation.

Why are AI chatbots crashing?

Large language models encode user queries as tokens, then use an attention mechanism to generate new text by scoring how strongly each token relates to every other token in an “attention map.”
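For readers who want the mechanics, here is a minimal sketch (in Python with NumPy, not the researchers’ code) of scaled dot-product attention, the computation that produces the attention map described above. The toy shapes and variable names are illustrative assumptions:

```python
import numpy as np

def attention(Q, K, V):
    """Q, K, V: (num_tokens, d) arrays of query, key, and value vectors."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # token-to-token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: the "attention map"
    return weights @ V                               # blend values by attention weight

# Toy example: 5 tokens, each an 8-dimensional vector
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
out = attention(Q, K, V)  # one updated 8-dimensional vector per token
```

Because every token is scored against every cached token, the attention map grows with the number of tokens kept around, which is why the size of the cache matters so much.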

This process, crucial for producing human-like text, relies on storing recent tokens in a “KV cache.” However, the cache’s capacity is limited, and as it fills, the attention map becomes massive, which can slow computation and degrade performance once the cache overflows, as seen when encoding long documents such as academic papers.

Researchers have tried to address these issues with a “sliding cache” strategy, which evicts the oldest tokens to make room for new ones, though text quality often drops sharply as soon as those early tokens are removed.
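As an illustration, here is a hedged sketch of such a sliding cache: a fixed-capacity buffer that silently drops the oldest key-value pair once it is full. The SlidingCache class is a hypothetical stand-in for this strategy, not an API from any real library:

```python
from collections import deque

class SlidingCache:
    """A fixed-size KV cache that evicts the oldest entry when full."""
    def __init__(self, capacity):
        self.kv = deque(maxlen=capacity)  # deque drops the oldest item automatically

    def append(self, key, value):
        self.kv.append((key, value))

cache = SlidingCache(capacity=4)
for t in range(6):
    cache.append(f"k{t}", f"v{t}")
print([k for k, _ in cache.kv])  # ['k2', 'k3', 'k4', 'k5'] -- k0 and k1 were evicted
```

Note that the very first tokens (k0 and k1 here) are the first to disappear, which is exactly the eviction behavior the MIT team found to be the root of the problem.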

The new approach detailed in the paper instead keeps the first token in the cache, which maintains model performance even after the cache limit is exceeded. The strategy seems counterintuitive, since the first word of a long text appears unrelated to the newest ones, but investigating why it works led the researchers to discoveries about how these models allocate attention and offered insights into improving large language model efficiency.

The lead author of the StreamingLLM paper, graduate student Guangxuan Xiao, said, “Now, with this method, we can persistently deploy these large language models. We could use these chatbots in some new applications by making a chatbot that we can always chat with and that can always respond to us based on our recent conversations.”

Co-authors include electrical engineering and computer science associate professor Song Han, who is also a member of the MIT-IBM Watson AI Lab and a distinguished scientist at NVIDIA; Meta AI research scientists Yuandong Tian and Mike Lewis; and Carnegie Mellon University assistant professor Beidi Chen.

The first token

The researchers refer to this first token as an “attention sink.”

Han added: “We need an attention sink, and the model decides to use the first token as the attention sink because it is globally visible — every other token can see it. We found that we must always keep the attention sink in the cache to maintain the model dynamics.”

During the development of StreamingLLM, researchers found that positioning four attention sink tokens at the start of the sliding cache achieves the best performance.
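Below is a minimal sketch of that idea, assuming the simple eviction rule the article describes: pin the first four tokens as attention sinks and slide the window over the rest. It illustrates the attention-sink principle only and is not MIT’s StreamingLLM implementation:

```python
class SinkCache:
    """A KV cache that keeps the first few 'attention sink' tokens permanently
    and applies sliding-window eviction only to the remaining entries."""
    def __init__(self, capacity, num_sinks=4):
        self.capacity, self.num_sinks = capacity, num_sinks
        self.kv = []  # list of (key, value) pairs, oldest first

    def append(self, key, value):
        self.kv.append((key, value))
        if len(self.kv) > self.capacity:
            # Evict the oldest *non-sink* entry, preserving the sink tokens
            del self.kv[self.num_sinks]

cache = SinkCache(capacity=6, num_sinks=4)
for t in range(10):
    cache.append(f"k{t}", f"v{t}")
print([k for k, _ in cache.kv])  # ['k0', 'k1', 'k2', 'k3', 'k8', 'k9']
```

The first four tokens survive no matter how long the stream runs, while everything else behaves like an ordinary sliding window, mirroring the cache layout the researchers found to perform best.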

Despite this success, the model cannot remember words that are no longer stored in the cache. The researchers plan to address this limitation by investigating ways to retrieve evicted tokens or to let the model memorize earlier parts of the conversation.

Featured image: Canva
