Elon Musk’s xAI previews Grok-1.5V, its first multimodal model


  • Elon Musk's xAI unveils Grok-1.5V, its first-generation multimodal model.
  • Grok-1.5V boasts enhanced reasoning abilities and a context length of 128,000 tokens.
  • It can understand documents, translate code, process real-world scenarios, and utilize long context understanding.

Elon Musk’s xAI has officially introduced its first-generation multimodal model that can understand documents, translate code, and process real-world situations.

The tool, named Grok-1.5V, is said to have ‘strong text capabilities’ and will soon be available to early testers and existing Grok users.

The update comes just a week after the open release of Grok-1, which concluded its pre-training phase in October 2023.

“Grok-1.5 comes with improved reasoning capabilities and a context length of 128,000 tokens,” the company said in a blog post on the xAI website.

This long-context understanding is a new feature that gives Grok a memory capacity of up to 16 times the previous context length. This means it will be able to use information from longer documents and handle more complex prompts.
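To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It takes the 128,000-token and 16-times numbers above at face value; the resulting previous limit is only implied by that arithmetic, and the `fits_in_context` helper and the 50,000-token document are hypothetical illustrations, not part of any xAI API.

```python
# Figures quoted in the article; everything derived from them is an estimate.
NEW_CONTEXT_TOKENS = 128_000  # context length quoted by xAI
SCALE_FACTOR = 16             # "up to 16 times the previous context length"

# Implied previous context length (an inference, not a number stated by xAI).
previous_context_tokens = NEW_CONTEXT_TOKENS // SCALE_FACTOR


def fits_in_context(token_count: int, context_tokens: int) -> bool:
    """Hypothetical helper: does a prompt of token_count tokens fit in the window?"""
    return token_count <= context_tokens


# A hypothetical 50,000-token document, for illustration only.
document_tokens = 50_000
print(previous_context_tokens)
print(fits_in_context(document_tokens, previous_context_tokens))
print(fits_in_context(document_tokens, NEW_CONTEXT_TOKENS))
```

On these assumptions, a document of that size would not have fit in the earlier window but fits comfortably in the new one.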

The model will still work in an instruction-following capacity but will now be able to understand documents, science diagrams, charts, screenshots, and photographs. It can also translate diagrams into Python code.

Grok-1.5V can understand the real world 

“In order to develop useful real-world AI assistants, it is crucial to advance a model’s understanding of the physical world. Towards this goal, we are introducing a new benchmark, RealWorldQA,” said the team behind Grok-1.5V.

The benchmark will be used to evaluate the real-world spatial understanding capabilities of multimodal models. The team has provided some examples, including asking Grok which way a car can turn and which object is the largest in a flat-lay photo.

The initial release of the benchmark includes more than 700 photos, each with a question and an easily verifiable answer.

Looking into the future, the team described the need to upgrade multimodal models: “Advancing both our multimodal understanding and generation capabilities are important steps in building beneficial AGI that can understand the universe.

“In the coming months, we anticipate to make significant improvements in both capabilities, across various modalities such as images, audio, and video.”

Featured Image: Via Ideogram

Sophie Atkinson
Tech Journalist