The Waymo World Model: A New Frontier For Autonomous Driving Simulation

The Waymo Driver has traveled nearly 200 million fully autonomous miles, becoming a vital part of the urban fabric in major U.S. cities and improving road safety. What riders and local communities don’t see is our Driver navigating billions of miles in virtual worlds, mastering complex scenarios long before it encounters them on public roads. Today, we are excited to introduce the Waymo World Model, a frontier generative model that sets a new bar for large-scale, hyper-realistic autonomous driving simulation. 

Simulation of the Waymo Driver evading a vehicle traveling in the wrong direction. The simulation initially follows a real event, then seamlessly transitions to camera and lidar imagery generated in real time by an efficient variant of the Waymo World Model.

Simulation is a critical component of Waymo’s AI ecosystem and one of the three key pillars of our approach to demonstrably safe AI. The Waymo World Model, detailed below, is the component responsible for generating hyper-realistic simulated environments.

The Waymo World Model is built upon Genie 3—Google DeepMind's most advanced general-purpose world model that generates photorealistic and interactive 3D environments—and is adapted for the rigors of the driving domain. By leveraging Genie’s immense world knowledge, it can simulate exceedingly rare events—from a tornado to a casual encounter with an elephant—that are almost impossible to capture at scale in reality. The model’s architecture offers high controllability, allowing our engineers to modify simulations with simple language prompts, driving inputs, and scene layouts. Notably, the Waymo World Model generates high-fidelity, multi-sensor outputs that include both camera and lidar data.

This combination of broad world knowledge, fine-grained controllability, and multi-modal realism enhances Waymo’s ability to safely scale our service across more places and new driving environments. In the following sections we showcase the Waymo World Model in action, featuring simulations of the Waymo Driver navigating diverse rare edge-case scenarios.

🌎 Emergent Multimodal World Knowledge

Most simulation models in the autonomous driving industry are trained from scratch on only the on-road data their fleets collect, so those systems learn from limited experience. Genie 3’s strong world knowledge, gained from pre-training on an extremely large and diverse set of videos, allows us to explore situations that were never directly observed by our fleet.

Through our specialized post-training, we are transferring that vast world knowledge from 2D video into 3D lidar outputs unique to Waymo’s hardware suite. While cameras excel at depicting visual details, lidar sensors provide valuable complementary signals like precise depth. The Waymo World Model can generate virtually any scene—from regular, day-to-day driving to rare, long-tail scenarios—across multiple sensor modalities.

🌪️ Extreme weather conditions and natural disasters
💥 Rare and safety-critical events
🐘 Long-tail (pun intended!) objects and more
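To make the multi-sensor output concrete, here is a minimal sketch of how a single generated timestep might be represented, pairing a camera image with a lidar point cloud. The class name, fields, and shapes are illustrative assumptions, not our actual data format.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SimulatedFrame:
    """One generated simulation timestep across sensor modalities."""
    camera_rgb: np.ndarray       # (H, W, 3) uint8 image, as a camera would see it
    lidar_points: np.ndarray     # (N, 3) float32 xyz returns in the vehicle frame
    lidar_intensity: np.ndarray  # (N,) per-point return intensity
    timestamp_s: float           # simulation time of this frame


def make_dummy_frame(t: float, n_points: int = 2048) -> SimulatedFrame:
    """Builds a placeholder frame with random contents, for illustration only."""
    rng = np.random.default_rng(seed=int(t * 10))
    return SimulatedFrame(
        camera_rgb=rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8),
        lidar_points=rng.normal(scale=20.0, size=(n_points, 3)).astype(np.float32),
        lidar_intensity=rng.random(n_points, dtype=np.float32),
        timestamp_s=t,
    )


if __name__ == "__main__":
    frame = make_dummy_frame(0.1)
    print(frame.camera_rgb.shape, frame.lidar_points.shape)
```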

In the interactive viewers below, you can explore the realistic 4D point clouds generated by the Waymo World Model.


Interactive 3D visualization of an encounter with an elephant.


Interactive 3D visualization of a drive through a city street.

🕹️ Strong Simulation Controllability

The Waymo World Model offers strong simulation controllability through three main mechanisms: driving action control, scene layout control, and language control.

Driving action control gives us a responsive simulator that adheres to specific driving inputs. This enables us to simulate “what if” counterfactual events, such as whether the Waymo Driver could have safely driven more assertively instead of yielding in a particular situation.

Counterfactual driving. We demonstrate simulations both along the original route from a past recorded drive and along a completely new route. While purely reconstructive simulation methods (e.g., 3D Gaussian Splatting, or 3DGS) suffer from visual breakdowns due to missing observations when the simulated route diverges too far from the original drive, the fully learned Waymo World Model maintains strong realism and consistency thanks to its generative capabilities.
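As a rough sketch of this replay-then-diverge pattern, the snippet below replays a logged prefix and then rolls the model forward under a new action sequence. The step function and frame dictionaries are hypothetical stand-ins, not the model’s actual interface.

```python
from typing import Callable, List, Sequence

# `step` stands in for one world-model generation step: given everything
# observed or generated so far plus a driving action, produce the next frame.
StepFn = Callable[[List[dict], dict], dict]


def rollout_counterfactual(
    step: StepFn,
    logged_frames: Sequence[dict],  # frames from the real recorded drive
    divergence_index: int,          # timestep where the new route begins
    new_actions: Sequence[dict],    # e.g. {"steer": 0.1, "accel": 0.5}
) -> List[dict]:
    """Replays the log up to the divergence point, then rolls the model
    forward under a different action sequence."""
    history = list(logged_frames[:divergence_index])
    for action in new_actions:
        # Each new frame is conditioned on all prior frames, keeping the
        # scene consistent even along a route that was never driven.
        history.append(step(history, action))
    return history


if __name__ == "__main__":
    dummy_step = lambda hist, act: {"t": len(hist), "action": act}
    frames = rollout_counterfactual(
        dummy_step,
        logged_frames=[{"t": i} for i in range(5)],
        divergence_index=3,
        new_actions=[{"steer": 0.1, "accel": 0.0}] * 4,
    )
    print(len(frames))  # 3 replayed frames + 4 generated frames = 7
```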

Scene layout control allows for customization of road layouts, traffic signal states, and the behavior of other road users. This way, we can create custom scenarios by selectively placing other road users or applying custom mutations to road layouts.

Scene layout conditioning.
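The sketch below illustrates what such a layout specification could look like, including a mutation that injects a jaywalking pedestrian. Every field name here is an assumption made for illustration rather than our actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Agent:
    kind: str                        # "vehicle", "pedestrian", "cyclist", ...
    position_m: Tuple[float, float]  # (x, y) in a local map frame
    heading_rad: float
    speed_mps: float


@dataclass
class SceneLayout:
    agents: List[Agent] = field(default_factory=list)
    signal_states: Dict[str, str] = field(default_factory=dict)  # id -> state

    def add_jaywalker(self, x: float, y: float) -> None:
        """Example mutation: inject a pedestrian crossing mid-block."""
        self.agents.append(
            Agent(kind="pedestrian", position_m=(x, y),
                  heading_rad=1.57, speed_mps=1.4)
        )


# Start from a simple layout, then mutate it into a custom scenario.
layout = SceneLayout(signal_states={"main_and_1st": "green"})
layout.add_jaywalker(x=12.0, y=-3.5)
print(len(layout.agents))  # 1
```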

Language control is our most flexible tool, allowing us to adjust the time of day or weather conditions, or even generate an entirely synthetic scene (such as the long-tail scenarios shown previously).

World Mutation - Time of Day

World Mutation - Weather
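One reason language control scales well is that prompts compose. The sketch below, using a hypothetical request format and scenario id, shows how a single recorded drive could fan out into a grid of time-of-day and weather variants.

```python
from itertools import product


def build_mutation_request(base_scenario_id: str, prompt: str) -> dict:
    """Packages a free-form language prompt with the scenario it modifies.
    The request shape and ids are hypothetical, for illustration only."""
    return {"scenario": base_scenario_id, "prompt": prompt}


times = ["dawn", "noon", "dusk", "night"]
weathers = ["clear skies", "dense fog", "heavy rain", "falling snow"]

# Sweep a grid of conditions over one recorded drive to stress-test the
# Driver under appearance combinations the fleet may never have logged.
sweep = [
    build_mutation_request("drive_0042", f"same scene at {t} with {w}")
    for t, w in product(times, weathers)
]
print(len(sweep))  # 16 variants of a single drive
```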

🎞️ Converting Dashcam Videos

During a scenic drive, it is common to record videos of the journey on mobile devices or dashcams, perhaps capturing piled-up snow banks or a highway at sunset. The Waymo World Model can convert those kinds of videos, or any taken with a regular camera, into a multimodal simulation, showing how the Waymo Driver would see that exact scene. This process enables the highest degree of realism and factuality, since simulations are derived from actual footage.
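A minimal sketch of that conversion flow appears below. The video_to_simulation callable is a hypothetical stand-in for the model’s conversion step, since we are describing a capability rather than an API.

```python
from typing import Callable, List

import numpy as np

# `video_to_simulation` is a hypothetical stand-in for the world model's
# conversion step; the real interface is not public.
ConvertFn = Callable[[List[np.ndarray]], List[dict]]


def convert_dashcam(video_frames: List[np.ndarray],
                    video_to_simulation: ConvertFn) -> List[dict]:
    """Conditions generation on real footage, returning simulated frames that
    pair the original camera view with generated lidar for the same scene."""
    # Grounding in actual footage is what preserves factuality: scene content
    # comes from the recording, while the model fills in the modalities the
    # camera never captured.
    return video_to_simulation(list(video_frames))


if __name__ == "__main__":
    # Placeholder converter: one multimodal frame per input video frame.
    dummy_convert = lambda frames: [
        {"camera_rgb": f, "lidar_points": np.zeros((0, 3), dtype=np.float32)}
        for f in frames
    ]
    clip = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(3)]
    print(len(convert_dashcam(clip, dummy_convert)))  # 3
```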

⚙️ Scalable Inference

Some scenes we want to simulate take longer to play out, for example, negotiating passage in a narrow lane. Longer simulations are harder: compute costs grow with length, and it becomes tougher to maintain stable quality. Through a more efficient variant of the Waymo World Model, we can simulate longer scenes with a dramatic reduction in compute while maintaining high realism and fidelity, enabling large-scale simulation.

🚀 Long rollout (4x speed playback) on an efficient variant of the Waymo World Model
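We have not detailed here how the efficient variant works, but one common way to keep long rollouts cheap is to condition each step on only a sliding window of recent frames, so per-step cost stays constant with rollout length. The sketch below illustrates that assumed strategy with placeholder frames.

```python
from collections import deque
from typing import Callable, List


def long_rollout(
    step: Callable[[List[dict]], dict],  # (recent frames) -> next frame
    init_frames: List[dict],
    num_steps: int,
    context_len: int = 16,
) -> List[dict]:
    """Generates autoregressively while conditioning on a fixed-size window
    of recent frames, so per-step cost does not grow with rollout length."""
    window = deque(init_frames, maxlen=context_len)
    out = list(init_frames)
    for _ in range(num_steps):
        frame = step(list(window))
        window.append(frame)  # the oldest frame falls out automatically
        out.append(frame)
    return out


if __name__ == "__main__":
    dummy_step = lambda ctx: {"t": ctx[-1]["t"] + 1}
    frames = long_rollout(dummy_step, [{"t": 0}], num_steps=100)
    print(frames[-1])  # {'t': 100}
```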

By simulating the “impossible,” we proactively prepare the Waymo Driver for some of the rarest and most complex scenarios. This creates a more rigorous safety benchmark, ensuring the Waymo Driver can navigate long-tail challenges long before it encounters them in the real world.

Acknowledgements


The Waymo World Model is enabled by the key research, engineering and evaluation contributions from James Gunn, Kanaad Parvate, Lu Liu, Lucas Deecke, Luca Bergamini, Zehao Zhu, Raajay Viswanathan, Jiahao Wang, Sakshum Kulshrestha, Titas Anciukevičius, Luna Yue Huang, Yury Bychenkov, Yijing Bai, Yichen Shen, Stefanos Nikolaidis, Tiancheng Ge, Shih-Yang Su and Vincent Casser.

We thank Chulong Chen, Mingxing Tan, Tom Walters, Harish Chandran, David Wong, Jieying Chen, Smitha Shyam, Vincent Vanhoucke and Drago Anguelov for their support in defining the vision for this project, and for their strong leadership and guidance throughout.

We would like to additionally thank Jon Pedersen, Michael Dreibelbis, Larry Lansing, Sasho Gabrovski, Alan Kimball, Dave Richardson, Evan Birenbaum, Harrison McKenzie Chapter, Pratyush Chakraborty, Khoa Vo, Todd Hester, Yuliang Zou, Artur Filipowicz, Sophie Wang and Linn Bieske for their invaluable partnership in facilitating and enabling this project.

We thank our partners from Google DeepMind: Jack Parker-Holder, Shlomi Fruchter, Philip Ball, Ruiqi Gao, Songyou Peng, Ben Poole, Fei Xia, Allan Zhou, Sean Kirmani, Christos Kaplanis, Matt McGill, Tim Salimans, Ruben Villegas, Xinchen Yan, Emma Wang, Woohyun Han, Shan Han, Rundi Wu, Shuang Li, Philipp Henzler, Yulia Rubanova, and Thomas Kipf for helpful discussions and for sharing invaluable insights for this project.