I started my Project Genie session by asking for something childish and specific: a claymation castle in the clouds made of marshmallows, with a chocolate river and candy trees. A few seconds later I was trundling through a pastel, puffy turreted fantasy that looked like a stop‑motion set someone had built out of confectionery. It was silly, delightful, and the kind of output only a system trained to imagine entire environments could produce.
Project Genie is Google DeepMind’s public-facing experiment that turns text prompts (or photos) into explorable, interactive worlds. It stitches together three pieces of Google’s stack — Genie 3 (the world model), Nano Banana Pro (image sketching), and Gemini — and is available today to U.S. Google AI Ultra subscribers. DeepMind pitches this as both entertainment and research: a way to get more people trying world models while collecting feedback on real-world behavior and limitations.
How it works
You begin with a “world sketch”: describe an environment and a character, or upload a photo. Nano Banana Pro renders a preview image you can tweak, then Genie 3 generates the live environment you can walk or fly through in first- or third-person. Worlds are produced auto‑regressively — the model renders frame-by-frame as you move, recalling previously generated details so the scene stays mostly consistent for up to about a minute.
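To picture what that auto-regressive loop means in practice, here is a minimal conceptual sketch in Python. It is not Genie's actual code or API; the function names, the rolling 64-frame memory window, and the action labels are all my own assumptions, made purely to illustrate generating each frame from the prompt, the player's latest input, and a short history of already-rendered frames.

```python
from collections import deque

MEMORY_FRAMES = 64  # assumption: a short rolling window of past frames the model can "recall"

def world_model_step(prompt, memory, action):
    """Stand-in for the world model: predicts the next frame from the original prompt,
    the recent frames it already generated, and the player's latest input."""
    return {"prompt": prompt, "history_len": len(memory), "action": action}

def explore(prompt, actions):
    memory = deque(maxlen=MEMORY_FRAMES)  # older frames fall out, so long-range consistency fades
    frames = []
    for action in actions:                # one iteration per rendered frame (roughly 20-24 per second)
        frame = world_model_step(prompt, memory, action)
        memory.append(frame)              # the new frame becomes context for the next one
        frames.append(frame)
    return frames

frames = explore("claymation marshmallow castle", ["forward", "forward", "turn_left"])
print(len(frames), frames[-1]["history_len"])
```

The key detail is the bounded memory: once older frames fall out of the window there is nothing left to "recall", which is one intuition for why consistency only holds for about a minute.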
Google caps sessions at 60 seconds for now. That’s not an arbitrary UX choice: Genie 3’s architecture is compute‑heavy and the company dedicates hardware to each session. DeepMind says the limit helps gauge broader usage while balancing costs.
Technically, Genie 3 claims real‑time interactivity at roughly 20–24 fps and 720p output. It also supports “promptable world events” — commands that alter the environment mid‑exploration — though some advanced capabilities from the research preview aren’t in this prototype.
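"Promptable world events" can be thought of as extra text conditioning that kicks in partway through a session. The sketch below is purely illustrative; the event wording, the frame indices, and the idea of merging events into the prompt string are assumptions for the sake of the example, not how Google exposes the feature.

```python
# Purely illustrative: a mid-session event is modeled as extra conditioning that
# applies from the frame it is issued onward. Event text and frame numbers are invented.
events = {45: "it starts snowing", 90: "a drawbridge lowers"}  # frame index -> event prompt

def conditioning_for_frame(base_prompt, frame_idx, events):
    """Combine the original world prompt with any events issued up to this frame."""
    active = [text for t, text in sorted(events.items()) if t <= frame_idx]
    return base_prompt if not active else base_prompt + "; " + "; ".join(active)

for idx in (0, 60, 120):
    print(idx, "->", conditioning_for_frame("marshmallow castle in the clouds", idx, events))
```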
Where it shines — and where it trips
Artistic, stylized prompts are Genie’s sweet spot. Claymation, watercolor, toy‑scale dioramas and whimsical fantasy environments tend to look terrific; that marshmallow castle was proof. In these modes, objects feel tactile, characters bump into furniture, and the world has a charming internal logic.
But when you ask for photorealism or cinematic precision, the model often falls back toward a game-like aesthetic. Attempts to reconstruct a real office from a photo resulted in familiar objects laid out oddly and rendered with a sterile digital look rather than a lived-in, photoreal scene. Controls can be frustrating: navigation using WASD, arrow keys and spacebar sometimes feels laggy or imprecise, and characters occasionally slide through walls or fail to interact consistently with the environment.
Safety and IP guardrails are also strict. DeepMind has already blocked generation of nudity and content explicitly tied to certain copyrighted properties — a reaction in part to previous disputes over character generation. That means you can’t reliably ask Genie for Disney-esque mermaids or recognizable franchise heroes, and the system will refuse or sanitize such requests.
Not a game engine (yet)
Project Genie produces explorable, physically simulated spaces, but it’s not a traditional game engine. There aren’t built‑in mechanics, persistent multiplayer, or long play sessions; the prototype is about creative exploration and research. DeepMind envisions later uses beyond entertainment: researchers might use consistent, controllable simulations to train embodied agents or prototype scenarios for robotics and autonomous systems. That ambition ties into broader work on agentic AI and search capabilities; for context on Gemini’s growing role across Google products, see the recent integration of Gemini into Deep Research tools. And the agentic features (like booking or tasking) that Google is testing elsewhere hint at how interactive AI might eventually behave inside Genie’s worlds.
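To make that embodied-agent idea concrete, here is a hedged sketch of how a generated world could be wrapped in a familiar reset/step simulator interface. The `GeneratedWorld` class, its toy reward, and the placeholder random policy are hypothetical stand-ins for illustration, not anything DeepMind has published.

```python
import random

class GeneratedWorld:
    """Hypothetical stand-in for a world model used as a training simulator."""
    def __init__(self, prompt):
        self.prompt, self.t = prompt, 0

    def reset(self):
        self.t = 0
        return {"frame": 0}                                  # initial observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == "forward" else 0.0          # toy reward signal
        done = self.t >= 50                                   # short episodes, like Genie's short horizons
        return {"frame": self.t}, reward, done

env = GeneratedWorld("warehouse with cluttered shelves")
obs = env.reset()
total, done = 0.0, False
while not done:
    action = random.choice(["forward", "turn_left", "turn_right"])  # placeholder policy
    obs, reward, done = env.step(action)
    total += reward
print("episode return:", total)
```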
The landscape and competition
Genie 3 isn’t the only world‑model effort. Startups and labs such as World Labs (maker of Marble) and Runway, along with newer entrants focused on world models, are pushing into similar territory. What sets DeepMind’s approach apart is its emphasis on interactivity and temporal consistency: the promise that a generated environment can remember and react as you move through it.
That promise still has kinks to iron out. Memory tends to hold reliably only over short horizons (seconds to a minute), text rendering is hit-or-miss, and simulating rich interactions between multiple independent agents remains an open research challenge.
Why this matters
If world models reach the robustness DeepMind describes, they could reshape how we prototype games, teach history through immersive simulations, or train robots in realistic virtual conditions. Even now, the novelty of letting non‑experts sketch a scene and immediately walk through it is revealing creative use cases DeepMind hadn’t anticipated: everything from filmmakers visualizing shots to hobbyists generating playful, Nintendo‑adjacent levels (another reviewer confessed to building clumsy knockoffs of 3D platformers).
For would‑be players who like to stream handmade levels or recreate childlike fantasies, Project Genie is an early, flashy glimpse. It’s not a finished product, but it’s an honest research prototype: impressive in places, inconsistent in others, and clearly designed to learn from public use.
I logged out of that marshmallow castle with a smile — and a reminder that we’re a long way from truly seamless, endlessly persistent synthetic worlds. For now, Genie’s best trick is how fast it turns imagination into something you can step into, even if the stepping sometimes bumps you through a wall.