DeepMind has unveiled Genie 2, a sophisticated AI system that it says can convert a single image into an immersive 3D environment. These interactive spaces let users explore dynamic, “endless” worlds for up to one minute.
Jack Parker-Holder, a research scientist at DeepMind, introduced the groundbreaking foundation world model on Wednesday (Dec. 4). In a post on X, Parker-Holder wrote: “We believe Genie 2 could unlock the next wave of capabilities for embodied agents.”
Introducing 🧞Genie 2 🧞 – our most capable large-scale foundation world model, which can generate a diverse array of consistent worlds, playable for up to a minute. We believe Genie 2 could unlock the next wave of capabilities for embodied agents 🧠. pic.twitter.com/AfL3EbOMeB
— Jack Parker-Holder (@jparkerholder) December 4, 2024
What can Google DeepMind Genie 2 do?
According to the company’s blog post, the system can create fully playable games from a single text prompt (e.g., “A humanoid robot in Ancient Egypt”), which users can explore through standard inputs such as a keyboard and mouse, whether controlled by a human or an AI agent.
This is similar to the models currently being developed by Fei-Fei Li’s company, World Labs, and the Israeli startup Decart. Genie 2 builds on the foundation of DeepMind’s Genie, which debuted earlier this year.
Users can perform actions such as jumping and swimming using a mouse or keyboard. Trained on video data, the model can accurately simulate object interactions, animations, physics, and the behavior of non-player characters (NPCs), and it also handles complex lighting, reflections, and smoke effects.
The company also tested Genie 2 alongside its SIMA AI agent, which responds to natural language commands within digital environments. In one test, SIMA successfully travelled through a room generated by Genie 2, executing instructions like “Open the blue door.”
Posting on Bluesky, DeepMind researcher Tim Rocktäschel said: “When we started Genie 1 over two years ago, we always imagined a foundation world model will one day be able to generate an endless curriculum for training embodied AGI. Today, we made a big step towards that future.”
Could Google face any issues?
As ReadWrite has previously reported, Google has been accused of allowing OpenAI to harvest text from YouTube for its AI models, and it is unclear whether something similar has happened here with video game footage.
At the time, Google told us: “Both our robots.txt files and Terms of Service prohibit unauthorized scraping or downloading of YouTube content, and we have a long history of employing technical and legal measures to prevent it. We take action when we have a clear legal or technical basis to do so.”
ReadWrite has reached out to Google for comment.
Featured image: Google DeepMind