Multimodal AI becomes accessible: new model runs on your laptop

A new open-source artificial intelligence model named Obsidian, announced in an Oct. 30 Reddit post, represents a breakthrough in multimodal AI accessibility. Obsidian is the first 3B-parameter multimodal model, which makes it compact enough to run efficiently on a regular laptop.

Multimodal AI refers to AI systems that can process and connect data from different modalities, such as text, images, audio, and video. In this case, the model accepts text and pictures as input, much like OpenAI’s GPT-4V. While multimodal AI models like DALL-E 3 and GPT-4 have shown impressive capabilities, their enormous size makes them resource-intensive to run, requiring expensive high-end hardware. Their weights are also a closely guarded secret, so you could not run them yourself even if you had the necessary specialized hardware.

Obsidian packs multimodal intelligence into a standard laptop’s memory

Obsidian changes this by packing multimodal intelligence into a model small enough to fit into a standard laptop’s memory and run at practical speeds. At 3 billion parameters, Obsidian builds upon the Capybara-3B model, which achieves state-of-the-art performance compared to similarly sized models. The developer also announced on Reddit that a multimodal model based on the highly praised open-source Mistral 7B model will soon follow.
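To see why 3 billion parameters can fit in a laptop's memory, a back-of-the-envelope estimate helps. The sketch below is illustrative only: it counts the memory needed to hold the weights at a few common numeric precisions and ignores activations, caches, and runtime overhead. The function name and the list of precisions are assumptions for illustration, not details from the Obsidian announcement.

```python
# Rough, weights-only memory estimate for a model of a given size.
# Assumption: each parameter is stored at a fixed width; real runtimes
# add overhead for activations, caches, and the vision encoder.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Return the approximate memory needed to hold the weights, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gib = weight_memory_gib(3e9, precision)
        print(f"3B parameters at {precision}: about {gib:.1f} GiB")
```

Under these assumptions, a 3B-parameter model needs roughly 5.6 GiB at 16-bit precision and about 1.4 GiB at 4-bit precision, which is within reach of an ordinary laptop.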

Obsidian’s compact size comes from techniques adapted from the LLaMA model architecture. According to the Reddit post announcing Obsidian, it was pre-trained on a diverse, synthesized multimodal dataset that includes text paired with corresponding images. This training methodology allowed it to develop strong language and vision capabilities despite its small parameter count.

The result is an AI assistant with conversational skills and visual understanding that can fit in your backpack. Obsidian breaks down barriers to accessing AI, opening up new possibilities for on-device intelligence.

While still an early version, Obsidian’s efficient form factor sets an exciting precedent. It demonstrates that multimodal AI does not have to be locked up in giant data centers but can be made compact enough to be distributed widely.

Featured Image Credit: From Image Creation at Aimesoft; Thank you!

Radek Zielinski
Tech Journalist

Radek Zielinski is an experienced technology and financial journalist with a passion for cybersecurity and futurology.
