June 13, 2025
New Insights for Scaling Laws in Autonomous Driving

Many recent AI breakthroughs have followed a common pattern: bigger models, trained on more data, with more compute, often deliver extraordinary gains. Waymo’s latest study explores whether this trend extends to autonomous driving and establishes new scaling laws in motion planning and forecasting — core autonomous vehicle (AV) capabilities.
Our research confirms that, similar to language modeling, increased data and compute resources can enhance the performance of autonomous vehicles. These insights not only benefit the Waymo Driver but also have broader applications in embodied AI research, including robotics.
Scaling Laws of Motion Forecasting and Planning
The last few years of AI performance have been powered by scale. It has been repeatedly shown that the performance of deep learning models scales predictably as we increase model size, dataset size, and training compute. These scaling laws drive continuous advancements in large language models (LLMs) in particular, as evidenced by the increasingly capable AI systems we see emerging regularly.
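To make the idea concrete, a scaling law of this kind is typically a power law of the form L(C) = a · C^(−b), where L is a loss or quality metric and C is training compute. The sketch below generates synthetic losses from a hypothetical law (the coefficients are illustrative, not measurements from any study) and shows how the exponent can be recovered with a straight-line fit in log-log space:

```python
import numpy as np

# Hypothetical power law: loss L(C) = a * C**(-b), with C the training compute.
# These coefficients are made up for illustration only.
a, b = 10.0, 0.1

# Synthetic "measurements" at compute budgets spanning several orders of magnitude.
compute = np.logspace(18, 24, 7)        # FLOPs
loss = a * compute ** (-b)

# A power law is linear in log-log space: log L = log a - b * log C,
# so a least-squares line fit recovers the exponent and coefficient.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
fitted_b = -slope
fitted_a = np.exp(intercept)

print(f"recovered exponent b ≈ {fitted_b:.3f}, coefficient a ≈ {fitted_a:.3f}")
```

Because the synthetic data here is noiseless, the fit recovers the original coefficients exactly; with real training runs, the same log-log regression is applied to noisy measurements across compute budgets.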
But what about autonomous vehicles?
As the Waymo Driver generalizes across different types of environments and scenarios—whether it’s navigating through dense urban traffic, merging onto high-speed freeways, yielding to emergency vehicles, or reacting to a red-light runner—motion forecasting requires building robust models that account for the myriad edge cases that can arise on public roads. This is a highly complex task given the inherent uncertainty in predicting the behavior of other road users, the intricate interactions between them, and the need to reason about the long-term consequences of actions in real time.
For all of these reasons, it has traditionally been difficult to tell whether scaling laws—and the efficiency and predictability they brought to building the first wave of LLMs—apply to motion forecasting and planning.
Waymo’s research findings
To examine the relationship between scale and motion forecasting performance, we conducted a comprehensive study using Waymo’s internal dataset. Spanning 500,000 hours of driving, it is significantly larger than any dataset used in previous scaling studies in the AV domain.
Our study uncovered the following:
Similar to LLMs, motion forecasting quality follows a power law as a function of training compute.
Data scaling is critical for improving model performance.
Scaling inference compute also improves the model's ability to handle more challenging driving scenarios.
Closed-loop performance follows a similar scaling trend. This suggests, for the first time, that real-world AV performance can be improved by increasing training data and compute.
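One practical payoff of a fitted power law is extrapolation: predicting performance at compute budgets you have not yet trained at, or inverting the law to estimate how much compute a target quality level would require. A minimal sketch, again using hypothetical coefficients rather than the study’s actual fit:

```python
# Assume a fitted power law loss(C) = a * C**(-b).
# The coefficients below are illustrative, not values from the Waymo study.
a, b = 10.0, 0.1

def predicted_loss(compute_flops: float) -> float:
    """Loss predicted by the power law at a given training-compute budget."""
    return a * compute_flops ** (-b)

def compute_for_loss(target_loss: float) -> float:
    """Invert the power law: compute budget needed to reach a target loss."""
    return (a / target_loss) ** (1.0 / b)

# Example: how much extra compute does a 10% loss reduction require?
c0 = 1e21
l0 = predicted_loss(c0)
c1 = compute_for_loss(0.9 * l0)
print(f"10% lower loss needs {c1 / c0:.1f}x the compute")
```

With an exponent of b = 0.1, a 10% loss reduction costs roughly 0.9^(−10) ≈ 2.9× the compute, which illustrates why the exponent of the fitted law matters so much for planning training budgets.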

Model performance predictably improves as a function of the training compute budget. This predictable improvement not only applies to the objective the model is trained with (Left), but also to popular motion forecasting open-loop metrics (Middle), and most importantly, to planning performance in closed-loop simulation (Right).
Implications for autonomous driving and beyond
These findings have exciting implications for the development of autonomous vehicles as they generalize to an increasingly wide range of scenarios and environments.
Through these insights, researchers and developers of AV models can predict with greater confidence that scaling the size and quality of their data and models will deliver better performance. Being able to predictably scale these models places us on a path to continually improve our understanding of the diverse and complex behaviors that AVs encounter daily.
These improvements range from more accurate trajectory predictions on fixed datasets and stronger performance in real-world driving scenarios to more sophisticated behavior recognition capabilities. These advancements hold the potential to further enhance the safety of AVs.

A more realistic coverage of possible futures emerges when scaling up the model. A very small model (~1M parameters) is shown on the left, and a larger model (~30M parameters) on the right. The yellow box indicates the vehicle whose path is being predicted; blue/green lines represent predicted trajectories, and yellow dots the ground-truth trajectory.
Our findings translate to similar robotic planning tasks, giving researchers a clearer sense of the data they need to collect and the sizes of models they should train. Our research also opens up the possibility of devising more adaptive training strategies for planning tasks in robotics, such as adapting the compute spent to the complexity of the scenario.
At Waymo, we're constantly pushing the boundaries of multimodal foundation models that could influence the trajectory of broader AI research. If you're passionate about contributing to this exciting field, we invite you to explore career opportunities with us and help shape the future of autonomous driving and AI.