Occupancy and Flow Prediction

Overview
Figure 1. Left: Sample occupancy predictions. Red pixels show occupancy predictions for vehicles. The black box shows ground-truth future occupancy of the ego vehicle. Right: Sample flow predictions. Gray boxes show agent occupancies over the input interval (history + current). Colored pixels show predicted flow direction (hue) and magnitude (saturation) at each location in the grid.
Note: The tutorial notebook demonstrates all the items discussed on this web page, including construction of ground-truth, training a baseline model with sample losses, computing the metrics, and generating submission files over the test dataset.
Note: The output representation for the occupancy flow tasks is very similar to other well-studied tasks like semantic segmentation and video prediction. So, it is possible to adopt models designed for these tasks to solve the occupancy flow challenge.
This challenge consists of three connected tasks, defined for the Waymo Open Motion Dataset. All tasks use the same inputs: a one-second history of a number of agents in a scene. The objective is to predict the future occupancy and flow (motion) of vehicles only, over 8 seconds into the future. All predictions are dense grids in bird's-eye view (BEV). Note that for this challenge the predictions should include all vehicles, ignoring the value of objects_of_interest and tracks_to_predict in the dataset protos. Our sample code demonstrates how to construct the ground truth for all tasks.
The WOD motion dataset contains agent positions (and other attributes) at 10 Hz. Each example in the training and validation sets contains 1 second of history data, 1 timestep for the current time, and 8 seconds of future data. This corresponds to 10 history timesteps, 1 current timestep, and 80 future timesteps, for a total of 91 timesteps per scene. The 91 timesteps in each example can be organized as:
{t - 10, t - 9, …, t - 1, t, t + 1, …, t + 79, t + 80}
where t indicates the current timestep. Note that agents are not necessarily present in all timesteps. For example, an agent may get occluded in the future, or it may be currently occluded and only appear in future data. Presence and absence of agents at particular timesteps is indicated by the valid data attributes.
The test set hides the ground truth future data. Each example in the test set contains a total of 11 timesteps (10 history and 1 current timesteps).
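As a quick sanity check on the timestep layout described above, the offsets relative to the current timestep t can be enumerated in a few lines. This is only an illustrative sketch; the constant and function names are hypothetical and do not come from the released code.

```python
# Timestep layout at 10 Hz, expressed as offsets relative to the current step t.
# All names below are illustrative, not part of the released codebase.
NUM_HISTORY_STEPS = 10  # t - 10 ... t - 1
NUM_FUTURE_STEPS = 80   # t + 1 ... t + 80


def timestep_offsets(include_future=True):
    """Return the list of offsets relative to the current timestep t."""
    last = NUM_FUTURE_STEPS if include_future else 0
    return list(range(-NUM_HISTORY_STEPS, last + 1))


train_offsets = timestep_offsets(include_future=True)   # training / validation
test_offsets = timestep_offsets(include_future=False)   # test (future hidden)
current_index = train_offsets.index(0)                  # position of t in the sequence
```

With this layout the current timestep sits at index 10 of each example, and the test examples simply stop there.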
Waypoints
When constructing the ground-truth for the occupancy flow challenge, we divide the 80 future timesteps into 8 one-second intervals, each containing 10 timesteps. For example, the first interval consists of the first 10 future timesteps {t + 1, t + 2, …, t + 10} and the last interval consists of the last 10 future timesteps {t + 71, t + 72, …, t + 80} in every scenario.
For all tasks in this challenge the objective is to predict 8 disjoint BEV grids corresponding to the 8 intervals. The prediction targets are taken from the last timestep in each interval, that is, timesteps {t + 10, t + 20, …, t + 80}. We refer to each of these 8 timesteps as a waypoint. In this document and in the codebase, we identify these waypoints using the index variable \(k \in \{0, \dots, 7\}\). Ground-truth flow fields are constructed from the displacements between the waypoints, that is {t ⇒ t + 10, t + 10 ⇒ t + 20, …, t + 70 ⇒ t + 80}.
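The mapping from the waypoint index k to timesteps can be written out explicitly. The sketch below assumes the 91-step layout described earlier, with the current timestep at index 10; the helper names are hypothetical.

```python
# Sub-sampled waypoints: the last timestep of each one-second interval.
# Names are illustrative; they are not taken from the released code.
STEPS_PER_INTERVAL = 10
NUM_WAYPOINTS = 8
CURRENT_INDEX = 10  # index of timestep t in a 91-step example (assumed layout)


def waypoint_offset(k):
    """Offset (relative to t) of the timestep used as waypoint k."""
    return STEPS_PER_INTERVAL * (k + 1)


def flow_pair(k):
    """(origin, destination) offsets whose displacement defines flow field k."""
    return (STEPS_PER_INTERVAL * k, STEPS_PER_INTERVAL * (k + 1))


waypoint_offsets = [waypoint_offset(k) for k in range(NUM_WAYPOINTS)]
waypoint_indices = [CURRENT_INDEX + o for o in waypoint_offsets]
flow_pairs = [flow_pair(k) for k in range(NUM_WAYPOINTS)]
```

Note that the flow field for k = 0 uses the current timestep t as its origin, which is why the current occupancy is still needed when building flow targets.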
The released code also supports constructing cumulative waypoints, which aggregate occupancy and flow using all the 10 timesteps in each one-second interval. This setting might be preferred in some planning applications, but for this challenge the submissions should predict the individual sub-sampled waypoints (cumulative_waypoints = false).
The released code contains the complete implementation for constructing the ground-truth for all 3 tasks using TensorFlow operations. If you prefer to use other frameworks, we suggest running the code to precompute and store ground-truth data as numpy arrays.
The following sections describe how agents are filtered based on current visibility and agent type (vehicle/pedestrian/cyclist) to get included and rendered in each ground-truth waypoint grid.
Task 1
Future Occupancy of Currently-Observed Vehicles
Given histories of all agents over input timesteps {t - 10, t - 9, ..., t}, predict future occupancy of all vehicles that are present at the current timestep t, for 8 seconds into the future. More specifically, the predictions are 8 occupancy grids \(O^b_k, k \in \{0, \dots, 7\}\), capturing future occupancy of all currently-visible vehicles at 8 different waypoints. Each occupancy grid \(O^b_k\) is a 256 x 256 x 1 array containing values in range [0, 1] indicating the probability that some part of some currently-observed vehicle will occupy that grid cell.
Our sample code demonstrates how the ground truth occupancy grids are constructed for each waypoint. In the ground-truth occupancy grids, each grid cell is set to either 0 or 1.
Task 2
Figure 2. Future occupancy grids for currently-observed agents (left) and currently-occluded agents (right). Red and green pixels show ground-truth future occupancies for vehicles and pedestrians. The black box shows ground-truth future occupancy of the ego vehicle.
Future Occupancy of Currently-Occluded Vehicles
Given histories of all agents over input timesteps {t - 10, t - 9, ..., t}, predict future occupancy of all vehicles that are not present at the current timestep t, for 8 seconds into the future. More specifically, the predictions are 8 occupancy grids \(O^c_k, k \in \{0, \dots, 7\}\), capturing future occupancy of all currently-occluded vehicles at 8 different waypoints. Each occupancy grid \(O^c_k\) is a 256 x 256 x 1 array containing values in range [0, 1] indicating the probability that some part of some currently-occluded vehicle will occupy that grid cell.
Note: an occluded vehicle appears in the ground-truth occupancy grids only at timesteps when it has become disoccluded and observed by the ego vehicle (and not earlier).
Task 3
Future Flow of All Vehicles
Given histories of all agents over input timesteps {t - 10, t - 9, ..., t}, predict future flow of all vehicles (currently observed or occluded), for 8 seconds into the future. More specifically, the predictions are 8 flow fields \(F_k, k \in \{0, \dots, 7\}\), capturing future flow of all vehicles at 8 different waypoints. Each flow field \(F_k\) is a 256 x 256 x 2 array containing (dx, dy) values indicating the displacement over 1 second of the vehicle part that occupies that grid cell.
In ground truth, flow is constructed between timesteps which are 1 second (10 steps) apart. For example, flow for the last timestep t + 80 is constructed by comparing timestep t + 80 with timestep t + 70. Every occupied grid cell at timestep t + 80 stores a (dx, dy) vector pointing to its earlier location at timestep t + 70. Note that the flow vectors point back in time. The ground-truth flow vectors are set to (0, 0) for any grid cells that are not occupied at timestep t + 80. Note that in effect \(F_k\) captures displacements between two successive occupancy waypoints \(O_{k-1}\) and \(O_k\).
Flow metrics are measured only over the occupied grid cells. In other words, grid cells whose ground-truth flow is (0, 0) are excluded from the metrics. Therefore, you are free to predict any value for the unoccupied cells. We recommend excluding the unoccupied cells from your flow loss function as well. These settings decouple the flow prediction and occupancy prediction tasks, and encourage the model to capture the general motion patterns of the scene in the predicted flow fields. Requiring the model to predict (0, 0) flow vectors for the unoccupied cells may lead to conservative predictions where the model predicts smaller flow vectors as an average between a true motion vector and a (0, 0) vector in areas which are not certain to be occupied.
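One way to exclude unoccupied cells from a flow loss is to mask the per-cell error with the ground-truth occupancy. The NumPy sketch below is a minimal illustration of that idea; the function name is hypothetical, and the choice of an L1 penalty is an assumption rather than something prescribed by the challenge.

```python
import numpy as np


def masked_flow_loss(gt_flow, pred_flow, gt_occupancy):
    """Mean absolute flow error over occupied cells only (illustrative).

    gt_flow, pred_flow: [H, W, 2] arrays holding (dx, dy) per cell.
    gt_occupancy: [H, W, 1] array with values 0 or 1.
    """
    mask = gt_occupancy.astype(np.float64)
    # The mask broadcasts over the two flow channels.
    abs_err = np.abs(gt_flow - pred_flow) * mask
    num_occupied = mask.sum()
    if num_occupied == 0:
        # Assumption: contribute no loss when nothing is occupied.
        return 0.0
    return float(abs_err.sum() / (2.0 * num_occupied))
```

Because the error is zeroed outside the mask, arbitrary predictions in unoccupied cells do not affect the loss, which matches how the flow metric treats them.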
Metrics
Occupancy Metrics
The occupancy metrics are used for tasks 1 and 2. They compare ground-truth and predicted occupancy grids for each of the 8 waypoints.
Let \(O^b_k\), \(\hat{O}^b_k\) denote the ground-truth and predicted future occupancy at waypoint \(k\) of currently-observed vehicles. Let \(O^c_k\), \(\hat{O}^c_k\) denote the ground-truth and predicted future occupancy at waypoint \(k\) of currently-occluded vehicles.
Treating the occupancy of each grid cell as a separate binary prediction, the \(\text{AUC}\) metric uses a linearly-spaced set of thresholds in [0, 1] to compute pairs of precision and recall values and estimate the area under the PR-curve. More specifically, we compute \(\text{AUC}(O^b_k, \hat{O}^b_k)\) for currently-observed vehicles and \(\text{AUC}(O^c_k, \hat{O}^c_k)\) for currently-occluded vehicles.
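To make the thresholding procedure concrete, here is a simplified NumPy sketch of a PR-curve AUC for a single waypoint. It is not the official implementation: the number of thresholds, the strict comparison against each threshold, the convention for precision when nothing is predicted positive, and the trapezoidal integration are all assumptions made for illustration, so the values may differ slightly from the official metric.

```python
import numpy as np


def pr_auc(gt_occupancy, pred_occupancy, num_thresholds=100):
    """Approximate area under the precision-recall curve (illustrative)."""
    gt = np.asarray(gt_occupancy).reshape(-1) > 0.5
    pred = np.asarray(pred_occupancy).reshape(-1)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    precisions, recalls = [], []
    for th in thresholds:
        predicted_pos = pred > th
        tp = np.sum(predicted_pos & gt)
        fp = np.sum(predicted_pos & ~gt)
        fn = np.sum(~predicted_pos & gt)
        # Assumed convention: precision is 1 when nothing is predicted positive.
        precisions.append(tp / (tp + fp) if tp + fp > 0 else 1.0)
        recalls.append(tp / (tp + fn) if tp + fn > 0 else 0.0)
    # Recall is non-increasing in the threshold, so reversing gives
    # non-decreasing recall for the trapezoidal rule.
    r = np.array(recalls)[::-1]
    p = np.array(precisions)[::-1]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))
```

A prediction identical to the ground truth scores 1 under this sketch, and a prediction that marks only the empty cells as occupied scores 0.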
The \(\text{Soft-IoU}\) metric measures the soft intersection-over-union between ground-truth and predicted occupancy grids as:
\begin{equation} \text{Soft-IoU}(O^b_k, \hat{O}^b_k) = \frac{\sum_{x,y} O^b_k \hat{O}^b_k}{\sum_{x,y} \left( O^b_k + \hat{O}^b_k - O^b_k \hat{O}^b_k \right)} \end{equation}
If \(O^b_k\) is empty, the \(\text{Soft-IoU}\) metric is set to zero. Similarly, we compute \(\text{Soft-IoU}(O^c_k, \hat{O}^c_k)\) for currently-occluded vehicles.
All metrics are averaged over the 8 predicted waypoints.
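The Soft-IoU definition translates almost directly into NumPy. The following is a minimal sketch for a single waypoint, with a hypothetical function name, and including the rule that an empty ground-truth grid yields zero.

```python
import numpy as np


def soft_iou(gt_occupancy, pred_occupancy):
    """Soft intersection-over-union between two occupancy grids."""
    gt = np.asarray(gt_occupancy, dtype=np.float64)
    pred = np.asarray(pred_occupancy, dtype=np.float64)
    if gt.sum() == 0:
        # Empty ground truth: the metric is defined as zero.
        return 0.0
    intersection = np.sum(gt * pred)
    union = np.sum(gt + pred - gt * pred)
    return float(intersection / union)
```

For example, a ground truth with one occupied cell and a prediction that spreads probability 0.5 over that cell and one neighbor gives 0.5 / 1.5, or one third.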
Flow Metric
The End-Point Error (\(\text{EPE}\)) metric measures the mean L2 distance between the ground-truth flow field \(F_k\) and predicted flow field \(\hat{F}_k\) as:
\begin{equation} \Big\lVert F_k(x, y) - \hat{F}_k(x, y)\Big\rVert_2 \text{ where } F_k(x, y) \neq (0, 0). \end{equation}
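In code, the EPE for one waypoint amounts to masking out cells whose ground-truth flow is (0, 0) and averaging the L2 distance over the rest. The sketch below uses a hypothetical function name; returning zero when no cell has non-zero ground-truth flow is an assumption made here for convenience.

```python
import numpy as np


def end_point_error(gt_flow, pred_flow):
    """Mean L2 distance over cells with non-zero ground-truth flow.

    gt_flow, pred_flow: [H, W, 2] arrays holding (dx, dy) per cell.
    """
    # A cell counts only if its ground-truth flow differs from (0, 0).
    mask = np.any(gt_flow != 0, axis=-1)
    if not mask.any():
        # Assumption: report zero error when there is nothing to evaluate.
        return 0.0
    distances = np.linalg.norm(gt_flow - pred_flow, axis=-1)
    return float(distances[mask].mean())
```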
Joint Occupancy and Flow Metrics
The joint metrics measure the joint accuracy of occupancy and flow predictions at each waypoint \(k\). Given three predictions \(\hat{O}^b_k\), \(\hat{O}^c_k\), \(\hat{F}_k\) for waypoint \(k\), we compute the Flow-Grounded Occupancy metrics as follows:
First, we compute the ground-truth occupancy of all vehicles (currently observed or occluded) at waypoint \(k\) as:
\begin{equation} O_k = O^b_k + O^c_k \end{equation}
and at waypoint \(k-1\) as
\begin{equation} O_{k-1} = O^b_{k-1} + O^c_{k-1}. \end{equation}
We also compute the predicted occupancy of all vehicles as
\begin{equation} \hat{O}_k = \hat{O}^b_k + \hat{O}^c_k. \end{equation}
If the predicted occupancies are accurate, we should have \(\hat{O}_k = O_k\). The occupancy metrics defined above already evaluate this expectation. To ensure correctness of the predicted flow field, \(\hat{F}_k\), we use it to warp the ground-truth origin occupancy of that flow field (\(O_{k-1}\)) as
\begin{equation} \hat{W}_k = \hat{F}_k \circ O_{k-1} \end{equation}
where \(\circ\) indicates function application -- applying the flow field as a function to transform the occupancy. If the predicted flow is accurate, it should be able to reach and cover the future occupancy \(O_k\). Note that since we predict backward flow fields, \(\hat{W}_k\) may predict expansion of occupancy in different directions and reach a larger area beyond \(O_k\). Therefore, we multiply \(\hat{W}_k\) element-wise with \(\hat{O}_k\), to get
\begin{equation} \hat{W}_k \hat{O}_k. \end{equation}
If the predicted occupancy and flow at waypoint \(k\) are accurate, this term should be equal to the ground-truth \(O_k\). In other words, for a grid cell to be marked as occupied in \(\hat{W}_k \hat{O}_k\), it should be supported by both occupancy and flow predictions. Therefore, the flow-grounded occupancy metrics compute \(\text{AUC}\) and \(\text{Soft-IoU}\) between \(\hat{W}_k \hat{O}_k\) and ground-truth \(O_k\) as \(\text{AUC}(O_k, \hat{W}_k \hat{O}_k)\) and \(\text{Soft-IoU}(O_k, \hat{W}_k \hat{O}_k)\).
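The warping step can be illustrated with a small NumPy sketch. It makes several assumptions that are not stated on this page: flow is expressed in grid cells, grids are indexed as [row (y), column (x)], the source location of a cell is (x + dx, y + dy), sampling uses nearest-neighbor rounding, and the summed predicted occupancy is clipped to [0, 1]. The released code may sample and combine the grids differently, and the function names here are hypothetical.

```python
import numpy as np


def warp_backward(origin_occupancy, flow):
    """Sample origin_occupancy at the locations the backward flow points to.

    origin_occupancy: [H, W] grid at waypoint k - 1.
    flow: [H, W, 2] float array of (dx, dy) in grid cells (assumed units).
    """
    h, w = origin_occupancy.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Nearest-neighbor lookup of the source cell for every target cell.
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(origin_occupancy, dtype=np.float64)
    warped[inside] = origin_occupancy[src_y[inside], src_x[inside]]
    return warped


def flow_grounded_occupancy(pred_observed, pred_occluded, pred_flow, gt_origin):
    """Element-wise product of the warped origin and predicted occupancy."""
    # Assumption: clip the sum so it stays a valid probability.
    pred_all = np.clip(pred_observed + pred_occluded, 0.0, 1.0)
    return warp_backward(gt_origin, pred_flow) * pred_all
```

The result can then be compared against the ground-truth \(O_k\) with the same AUC and Soft-IoU computations used for the occupancy metrics. In this sketch a cell survives only if the warped origin occupancy and the predicted occupancy both support it.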
All metrics are averaged over the 8 predicted waypoints.
Leaderboard
This leaderboard only displays submissions made on or after March 18, 2024, when the 2024 Waymo Open Dataset Challenges started.
Past leaderboards and challenges are available here.
Note: the rankings displayed on this leaderboard may not accurately reflect the final rankings for this Challenge.
Submit
Since occupancy and flow predictions are relatively large data structures, we have selected about 10% of the validation and test scenarios in the motion dataset as the effective validation and test sets for the occupancy and flow challenge. You can find the list of validation scenario ids and test scenario ids on the dataset. Our tutorial notebook contains code for creating submission protos for the effective test set. The sample code creates one binary ChallengeSubmission proto for each shard. Once you have the 150 files, compress them as a single .tar.gz archive and upload the single file using the form below.
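Packaging the per-shard files into a single archive can be done with Python's standard tarfile module. The sketch below is illustrative only: the directory layout and the shard file names are placeholders, and the demo writes dummy files rather than real ChallengeSubmission protos.

```python
import os
import tarfile
import tempfile


def pack_submission(shard_dir, archive_path):
    """Bundle every file in shard_dir into one .tar.gz archive."""
    names = sorted(os.listdir(shard_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in names:
            # arcname keeps the archive flat (no leading directories).
            tar.add(os.path.join(shard_dir, name), arcname=name)
    return names


# Demo with placeholder files; real shards come from the tutorial notebook.
with tempfile.TemporaryDirectory() as tmp:
    shard_dir = os.path.join(tmp, "shards")
    os.makedirs(shard_dir)
    for i in range(3):
        # Hypothetical file names, chosen only for this example.
        with open(os.path.join(shard_dir, f"shard-{i:05d}.binproto"), "wb") as f:
            f.write(b"placeholder")
    archive_path = os.path.join(tmp, "submission.tar.gz")
    pack_submission(shard_dir, archive_path)
    with tarfile.open(archive_path, "r:gz") as tar:
        archived_names = sorted(tar.getnames())
```

Listing the archive contents before uploading is a cheap way to confirm that every shard file made it into the submission.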
To be eligible to participate in the challenge, each individual/all team members must read and agree to be bound by the Official Challenge Rules.
You can only submit against the Test Set 3 times every 30 days. (Submissions that error out do not count against this total.)
Note: Please make sure you have cumulative_waypoints = false in the task config when training your models. This flag was set to true in the initial release of the code.
Rules Regarding Awards
See the Official Challenge Rules here.




