Waymo Open Dataset Challenge

Occupancy and Flow Prediction

Overview

Figure 1. Left: Sample occupancy predictions. Red pixels show occupancy predictions for vehicles. The black box shows ground-truth future occupancy of the ego vehicle. Right: Sample flow predictions. Gray boxes show agent occupancies over the input interval (history + current). Colored pixels show predicted flow direction (hue) and magnitude (saturation) at each location in the grid.

Note: The tutorial notebook demonstrates all the items discussed on this web page, including construction of the ground truth, training a baseline model with sample losses, computing the metrics, and generating submission files over the test dataset.

Note: The output representation for the occupancy flow tasks is very similar to that of other well-studied tasks such as semantic segmentation and video prediction. Models designed for those tasks can therefore be adapted to the occupancy flow challenge.

This challenge consists of three connected tasks, defined for the Waymo Open Motion Dataset. All tasks use the same input: a one-second history of the agents in a scene. The objective for all tasks is to predict the future occupancy and flow (motion) of vehicles only, up to 8 seconds into the future. All predictions are dense grids in bird's-eye view (BEV). Note that for this challenge the predictions should include all vehicles, ignoring the values of objects_of_interest and tracks_to_predict in the dataset protos. Our sample code demonstrates how to construct the ground truth for all tasks.

The WOD motion dataset contains agent positions (and other attributes) at 10 Hz. Each example in the training and validation sets contains 1 second of history data, 1 timestep for the current time, and 8 seconds of future data. This corresponds to 10 history timesteps, 1 current timestep, and 80 future timesteps, for a total of 91 timesteps per scene. The 91 timesteps in each example can be organized as:

{t - 10, t - 9, …, t - 1, t, t + 1, …, t + 79, t + 80}

where t indicates the current timestep. Note that agents are not necessarily present in all timesteps. For example, an agent may get occluded in the future, or it may be currently occluded and only appear in future data. Presence and absence of agents at particular timesteps is indicated by the valid data attributes.
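As an illustration of this layout, the sketch below maps offsets relative to t onto indices of a single 91-element, time-ordered list, and splits a per-agent validity list into history, current, and future parts. It assumes the states of one agent have already been gathered into such a list; `offset_to_index` and `split_timesteps` are hypothetical helpers, not part of the dataset tooling.

```python
NUM_HISTORY = 10  # timesteps t - 10 ... t - 1
NUM_FUTURE = 80   # timesteps t + 1 ... t + 80
NUM_STEPS = NUM_HISTORY + 1 + NUM_FUTURE  # 91 timesteps per scene


def offset_to_index(offset: int) -> int:
    """Return the list index of timestep t + offset (hypothetical layout)."""
    if not -NUM_HISTORY <= offset <= NUM_FUTURE:
        raise ValueError(f"offset {offset} is outside [-10, 80]")
    return offset + NUM_HISTORY


def split_timesteps(valid):
    """Split a 91-long validity list into (history, current, future)."""
    if len(valid) != NUM_STEPS:
        raise ValueError("expected one flag per timestep")
    current = offset_to_index(0)
    return valid[:current], valid[current], valid[current + 1:]


# An agent that is occluded at t and only appears from t + 5 onward.
valid = [False] * (NUM_HISTORY + 1) + [False] * 4 + [True] * (NUM_FUTURE - 4)
history, current, future = split_timesteps(valid)
```

Here `current` is `False` and the first valid future entry is at list position 4, i.e. timestep t + 5.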

The test set hides the ground-truth future data. Each example in the test set contains a total of 11 timesteps (10 history timesteps and 1 current timestep).

Waypoints

When constructing the ground truth for the occupancy flow challenge, we divide the 80 future timesteps into 8 one-second intervals, each containing 10 timesteps. For example, the first interval consists of the first 10 future timesteps {t + 1, t + 2, …, t + 10} and the last interval consists of the last 10 future timesteps {t + 71, t + 72, …, t + 80} in every scenario.

For all tasks in this challenge the objective is to predict 8 disjoint BEV grids corresponding to the 8 intervals. The prediction targets are taken from the last timestep in each interval, that is, timesteps {t + 10, t + 20, …, t + 80}. We refer to each of these 8 timesteps as a waypoint. In this document and in the codebase, we identify these waypoints using the index variable \(k \in \{0, \dots, 7\}\). Ground-truth flow fields are constructed from the displacements between the waypoints, that is {t ⇒ t + 10, t + 10 ⇒ t + 20, …, t + 70 ⇒ t + 80}.
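A minimal sketch of this waypoint bookkeeping, expressed as offsets from the current timestep t. The function names are hypothetical and the released code may organize this differently.

```python
NUM_WAYPOINTS = 8
STEPS_PER_INTERVAL = 10  # 10 Hz data -> one-second intervals


def waypoint_offsets():
    """Offsets of the 8 waypoints k = 0..7: t + 10, t + 20, ..., t + 80."""
    return [STEPS_PER_INTERVAL * (k + 1) for k in range(NUM_WAYPOINTS)]


def interval_offsets(k):
    """All 10 future offsets in interval k (k = 0 -> t + 1 ... t + 10)."""
    start = STEPS_PER_INTERVAL * k + 1
    return list(range(start, start + STEPS_PER_INTERVAL))


def flow_pairs():
    """(origin, target) offsets for each flow field: t => t + 10, ..., t + 70 => t + 80."""
    targets = waypoint_offsets()
    origins = [0] + targets[:-1]  # the first flow field starts at the current timestep
    return list(zip(origins, targets))
```

Each waypoint is the last offset of its interval, so `waypoint_offsets()[k] == interval_offsets(k)[-1]`.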

The released code also supports constructing cumulative waypoints, which aggregate occupancy and flow using all 10 timesteps in each one-second interval. This setting might be preferred in some planning applications, but for this challenge the submissions should predict the individual sub-sampled waypoints (cumulative_waypoints = false).
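To make the distinction concrete, here is a toy sketch on occupancy only. The sub-sampled mode keeps the last frame of an interval, as required for this challenge. The cumulative variant shown here, an element-wise maximum over the frames, is an assumption for illustration and may not match how the released code aggregates occupancy or flow.

```python
def subsampled_waypoint(frames):
    """Challenge setting: keep only the last frame of the interval."""
    return frames[-1]


def cumulative_waypoint(frames):
    """Assumed cumulative variant: a cell counts as occupied if it is occupied
    at any timestep of the interval (element-wise max over frames)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[max(frame[r][c] for frame in frames) for c in range(cols)]
            for r in range(rows)]


# A 1 x 3 grid with a vehicle moving one cell per timestep.
frames = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
```

On this input, the sub-sampled waypoint is `[[0, 0, 1]]`, while the cumulative one smears the vehicle over `[[1, 1, 1]]`.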

The released code contains the complete implementation for constructing the ground-truth for all 3 tasks using TensorFlow operations. If you prefer to use other frameworks, we suggest running the code to precompute and store ground-truth data as numpy arrays.

The following sections describe how agents are filtered, based on their visibility at the current timestep and their type (vehicle/pedestrian/cyclist), to determine which agents are included and rendered in each ground-truth waypoint grid.

Task 1

Future Occupancy of Currently-Observed Vehicles

Given histories of all agents over input timesteps {t - 10, t - 9, ..., t}, predict the future occupancy of all vehicles that are present at the current timestep t, for 8 seconds into the future. More specifically, the predictions are 8 occupancy grids \(\hat{O}^b_k, k \in \{0, \dots, 7\}\), capturing the future occupancy of all currently-visible vehicles at 8 different waypoints. Each occupancy grid \(\hat{O}^b_k\) is a 256 x 256 x 1 array containing values in the range [0, 1], indicating the probability that some part of some currently-observed vehicle will occupy that grid cell.

Our sample code demonstrates how the ground truth occupancy grids are constructed for each waypoint. In the ground-truth occupancy grids, each grid cell is set to either 0 or 1.
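One way to picture ground-truth construction is to rasterize each vehicle's oriented bounding box into a binary grid, marking a cell as 1 when its centre falls inside any box. The sketch below does this with plain loops. The coordinate convention (grid centred on the origin, columns along x, rows along y), the 0.5 m default cell size, and the `rasterize_boxes` name are all hypothetical; the released TensorFlow code may render boxes differently.

```python
import math


def rasterize_boxes(boxes, grid_size=256, cell_size=0.5):
    """Render oriented boxes into a binary grid_size x grid_size occupancy grid.

    boxes: iterable of (cx, cy, length, width, heading); heading is the angle of
    the box's length axis from the +x axis, in radians (hypothetical convention).
    cell_size: metres per grid cell (hypothetical value).
    """
    grid = [[0] * grid_size for _ in range(grid_size)]
    half = grid_size / 2
    for cx, cy, length, width, heading in boxes:
        cos_h, sin_h = math.cos(heading), math.sin(heading)
        for row in range(grid_size):
            for col in range(grid_size):
                # Centre of this cell, in the same metric frame as the boxes.
                x = (col + 0.5 - half) * cell_size
                y = (row + 0.5 - half) * cell_size
                # Express the cell centre in the box's own frame.
                dx, dy = x - cx, y - cy
                along = dx * cos_h + dy * sin_h
                across = -dx * sin_h + dy * cos_h
                if abs(along) <= length / 2 and abs(across) <= width / 2:
                    grid[row][col] = 1  # ground-truth cells are either 0 or 1
    return grid


# A 2 m x 1 m box at the centre of a small 8 x 8 grid covers 4 x 2 cells.
small = rasterize_boxes([(0.0, 0.0, 2.0, 1.0, 0.0)], grid_size=8)
```

Looping over every cell for every box is quadratic in the grid size; it is written this way only for readability.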

Task 2

Future Occupancy of Currently-Occluded Vehicles

Figure 2. Future occupancy grids for currently-observed agents (left) and currently-occluded agents (right). Red and green pixels show ground-truth future occupancies for vehicles and pedestrians. The black box shows ground-truth future occupancy of the ego vehicle.

Given histories of all agents over input timesteps {t - 10, t - 9, ..., t}, predict the future occupancy of all vehicles that are not present at the current timestep t, for 8 seconds into the future. More specifically, the predictions are 8 occupancy grids \(\hat{O}^c_k, k \in \{0, \dots, 7\}\), capturing the future occupancy of all currently-occluded vehicles at 8 different waypoints. Each occupancy grid \(\hat{O}^c_k\) is a 256 x 256 x 1 array containing values in the range [0, 1], indicating the probability that some part of some currently-occluded vehicle will occupy that grid cell.

Note: an occluded vehicle appears in the ground-truth occupancy grids only at timesteps when it has become disoccluded and observed by the ego vehicle (and not earlier).
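The agent filtering behind Tasks 1 and 2 can be sketched as follows. The `Agent` record and both helpers are hypothetical. The sketch keeps only vehicles (regardless of objects_of_interest or tracks_to_predict), splits them by validity at the current timestep, and draws an agent into a grid only at timesteps where it is valid, as the note above states for occluded vehicles; applying that same rule to currently-observed vehicles is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List

CURRENT_INDEX = 10  # index of timestep t in a 91-step validity list


@dataclass
class Agent:
    agent_type: str    # "vehicle", "pedestrian" or "cyclist"
    valid: List[bool]  # one flag per timestep, t - 10 ... t + 80


def split_vehicles(agents):
    """Return (currently-observed, currently-occluded) vehicles."""
    vehicles = [a for a in agents if a.agent_type == "vehicle"]
    observed = [a for a in vehicles if a.valid[CURRENT_INDEX]]      # Task 1
    occluded = [a for a in vehicles if not a.valid[CURRENT_INDEX]]  # Task 2
    return observed, occluded


def rendered_at(agent, index):
    """Assumed rule: draw an agent only at timesteps where it is valid."""
    return agent.valid[index]
```

A vehicle that first appears at t + 10 lands in the occluded set and is only drawn from list index 20 onward.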

Task 3

Future Flow of All Vehicles

Given histories of all agents over input timesteps {t - 10, t - 9, ..., t}, predict the future flow of all vehicles (currently observed or occluded), for 8 seconds into the future. More specifically, the predictions are 8 flow fields \(\hat{F}_k, k \in \{0, \dots, 7\}\), capturing the future flow of all vehicles at 8 different waypoints. Each flow field \(\hat{F}_k\) is a 256 x 256 x 2 array containing (dx, dy) values indicating the displacement over 1 second of the vehicle part that occupies that grid cell.

In the ground truth, flow is constructed between timesteps that are 1 second (10 steps) apart. For example, flow for the last timestep t + 80 is constructed by comparing timestep t + 80 with timestep t + 70. Every occupied grid cell at timestep t + 80 stores a (dx, dy) vector pointing to its earlier location at timestep t + 70. Note that the flow vectors point back in time. The ground-truth flow vectors are set to (0, 0) for any grid cells that are not occupied at timestep t + 80. In effect, \(F_k\) captures displacements between two successive occupancy waypoints \(O_{k-1}\) and \(O_k\); for \(k = 0\), the origin is the occupancy at the current timestep t.
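For a rigid vehicle, the backward flow of one point can be sketched by placing the same body-frame point at the vehicle's pose at both waypoints and subtracting. The function below is hypothetical, works in whatever units the poses use, and leaves out the conversion to grid cells and the rasterization step.

```python
import math


def backward_flow(point_in_box, pose_prev, pose_curr):
    """Backward flow (dx, dy) of one point on a rigid vehicle.

    point_in_box: (u, v) position of the point in the vehicle's own frame.
    pose_prev, pose_curr: (x, y, heading) at the earlier waypoint (e.g. t + 70)
    and the later waypoint (e.g. t + 80).
    The result points from the point's later location back to its earlier one.
    """
    u, v = point_in_box

    def to_world(pose):
        x, y, heading = pose
        return (x + u * math.cos(heading) - v * math.sin(heading),
                y + u * math.sin(heading) + v * math.cos(heading))

    x_prev, y_prev = to_world(pose_prev)
    x_curr, y_curr = to_world(pose_curr)
    return (x_prev - x_curr, y_prev - y_curr)


# A vehicle that drove 10 units along +x: every point's flow is (-10, 0).
straight = backward_flow((1.0, 0.5), (0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
```

Because the vector is computed per point, a turning vehicle gets different flow vectors at its front and rear, which a single box-centre displacement would miss.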

Flow metrics are measured only over the occupied grid cells. In other words, grid cells whose ground-truth flow is (0, 0) are excluded from the metrics. Therefore, you are free to predict any value for the unoccupied cells. We recommend excluding the unoccupied cells from your flow loss function as well. These settings decouple the flow prediction and occupancy prediction tasks, and encourage the model to capture the general motion patterns of the scene in the predicted flow fields. Requiring the model to predict (0, 0) flow vectors for the unoccupied cells may lead to conservative predictions where the model predicts smaller flow vectors as an average between a true motion vector and a (0, 0) vector in areas which are not certain to be occupied.
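Following that recommendation, a flow loss can simply skip cells whose ground-truth flow is (0, 0). The sketch below uses an L1 penalty on nested lists; the choice of L1 and the name `masked_flow_l1_loss` are assumptions for illustration, not necessarily the loss used in the tutorial's baseline.

```python
def masked_flow_l1_loss(true_flow, pred_flow):
    """Mean L1 flow error over cells whose ground-truth flow is non-zero.

    true_flow, pred_flow: H x W nested lists of (dx, dy) tuples.
    Cells with ground-truth flow (0, 0) are unoccupied and are ignored, so the
    model is free to predict anything there.
    """
    total, count = 0.0, 0
    for true_row, pred_row in zip(true_flow, pred_flow):
        for (tdx, tdy), (pdx, pdy) in zip(true_row, pred_row):
            if (tdx, tdy) == (0.0, 0.0):
                continue  # unoccupied cell: excluded from the loss
            total += abs(tdx - pdx) + abs(tdy - pdy)
            count += 1
    return total / count if count else 0.0
```

In the first test below, the wildly wrong prediction in the unoccupied cell contributes nothing to the loss.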

Metrics

Occupancy metrics

The occupancy metrics are used for tasks 1 and 2. They compare ground-truth and predicted occupancy grids for each of the 8 waypoints.

Let \(O^b_k\), \(\hat{O}^b_k\) denote the ground-truth and predicted future occupancy at waypoint \(k\) of currently-observed vehicles. Let \(O^c_k\), \(\hat{O}^c_k\) denote the ground-truth and predicted future occupancy at waypoint \(k\) of currently-occluded vehicles.

Treating the occupancy of each grid cell as a separate binary prediction, the \(\text{AUC}\) metric uses a linearly-spaced set of thresholds in [0, 1] to compute pairs of precision and recall values and estimate the area under the PR-curve. More specifically, we compute \(\text{AUC}(O^b_k, \hat{O}^b_k)\) for currently-observed vehicles and \(\text{AUC}(O^c_k, \hat{O}^c_k)\) for currently-occluded vehicles.
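A minimal sketch of a thresholded PR-AUC on flattened grids. The number of thresholds, the precision convention when no cell is predicted positive, the zero score for an empty ground truth, and trapezoidal integration are all assumptions; the official metric may interpolate the curve differently.

```python
def pr_auc(truth, pred, num_thresholds=100):
    """Approximate area under the precision-recall curve.

    truth: flat list of 0/1 labels; pred: flat list of probabilities in [0, 1].
    """
    positives = sum(truth)
    if positives == 0:
        return 0.0  # assumption: no occupied cells -> score 0
    # Linearly spaced thresholds in [0, 1], visited from high to low so that
    # recall never decreases along the curve.
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    curve = []
    for thr in reversed(thresholds):
        tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p >= thr)
        fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p >= thr)
        recall = tp / positives
        precision = tp / (tp + fp) if tp + fp else 1.0  # assumption
        curve.append((recall, precision))
    # Trapezoidal rule over consecutive (recall, precision) points.
    area = 0.0
    for (r0, p0), (r1, p1) in zip(curve, curve[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area
```

A prediction that ranks every occupied cell above every free cell scores 1.0, while an inverted ranking scores much lower.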

The \(\text{Soft-IoU}\) metric measures the soft intersection-over-union between ground-truth and predicted occupancy grids as:

\begin{equation} \text{Soft-IoU}(O^b_k, \hat{O}^b_k) = \frac{\sum_{x,y} O^b_k(x, y)\, \hat{O}^b_k(x, y)}{\sum_{x,y} \big(O^b_k(x, y) + \hat{O}^b_k(x, y) - O^b_k(x, y)\, \hat{O}^b_k(x, y)\big)} \end{equation}

If \(O^b_k\) is empty, the \(\text{Soft-IoU}\) metric is set to zero. Similarly, we compute \(\text{Soft-IoU}(O^c_k, \hat{O}^c_k)\) for currently-occluded vehicles.

All metrics are averaged over the 8 predicted waypoints.
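The Soft-IoU formula translates directly into code. This sketch works on flattened grids, returns zero for an empty ground truth as stated above, and averages per-waypoint scores with a plain mean; `soft_iou` and `mean_over_waypoints` are hypothetical names.

```python
def soft_iou(truth, pred):
    """Soft-IoU between a 0/1 ground-truth grid and a predicted probability grid.

    Both arguments are flat lists of equal length (flatten H x W grids first).
    """
    if not any(truth):
        return 0.0  # empty ground truth: metric is set to zero
    intersection = sum(t * p for t, p in zip(truth, pred))
    union = sum(t + p - t * p for t, p in zip(truth, pred))
    return intersection / union


def mean_over_waypoints(scores):
    """Average a metric over the 8 predicted waypoints."""
    if len(scores) != 8:
        raise ValueError("expected one score per waypoint")
    return sum(scores) / len(scores)
```

The union term is never smaller than the number of occupied ground-truth cells, so the division is safe whenever the ground truth is non-empty.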

Flow Metric

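The End-Point Error (\(\text{EPE}\)) metric measures the mean L2 distance between the ground-truth flow field \(F_k\) and the predicted flow field \(\hat{F}_k\), taken over the set \(M_k\) of grid cells with non-zero ground-truth flow:

\begin{equation} \text{EPE}(F_k, \hat{F}_k) = \frac{1}{|M_k|} \sum_{(x, y) \in M_k} \Big\lVert F_k(x, y) - \hat{F}_k(x, y)\Big\rVert_2, \quad M_k = \{(x, y) : F_k(x, y) \neq (0, 0)\}. \end{equation}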
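The EPE formula as a short sketch over flattened grids of (dx, dy) tuples. The name `end_point_error` is hypothetical, and returning 0.0 when no cell has non-zero ground-truth flow is an assumption of this sketch.

```python
import math


def end_point_error(true_flow, pred_flow):
    """Mean L2 distance between flow vectors over cells with non-zero true flow."""
    errors = [
        math.hypot(tdx - pdx, tdy - pdy)
        for (tdx, tdy), (pdx, pdy) in zip(true_flow, pred_flow)
        if (tdx, tdy) != (0.0, 0.0)  # keep only cells in M_k
    ]
    return sum(errors) / len(errors) if errors else 0.0
```

For example, errors of 5 and 0 on the two occupied cells give an EPE of 2.5, whatever is predicted in the unoccupied cell.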

Joint Occupancy and Flow Metrics

The joint metrics measure the joint accuracy of occupancy and flow predictions at each waypoint \(k\). Given three predictions \(\hat{O}^b_k\), \(\hat{O}^c_k\), \(\hat{F}_k\) for waypoint \(k\), we compute the Flow-Grounded Occupancy metrics as follows:

First, we compute the ground-truth occupancy of all vehicles (currently observed or occluded) at waypoint \(k\) as:

\begin{equation} O_k = O^b_k + O^c_k \end{equation}

and at waypoint \(k-1\) as

\begin{equation} O_{k-1} = O^b_{k-1} + O^c_{k-1}. \end{equation}

We also compute the predicted occupancy of all vehicles as

\begin{equation} \hat{O}_k = \hat{O}^b_k + \hat{O}^c_k. \end{equation}

If the predicted occupancies are accurate, we should have \(\hat{O}_k = O_k\). The occupancy metrics defined above already evaluate this expectation. To evaluate the correctness of the predicted flow field \(\hat{F}_k\), we use it to warp the ground-truth origin occupancy of that flow field, \(O_{k-1}\), as

\begin{equation} \hat{W}_k = \hat{F}_k \circ O_{k-1} \end{equation}

where \(\circ\) indicates function application, that is, applying the flow field as a function to transform the occupancy. If the predicted flow is accurate, it should be able to reach and cover the future occupancy \(O_k\). Note that since we predict backward flow fields, \(\hat{W}_k\) may predict expansion of occupancy in different directions and reach a larger area beyond \(O_k\). Therefore, we multiply \(\hat{W}_k\) element-wise with \(\hat{O}_k\), to get

\begin{equation} \hat{W}_k \hat{O}_k. \end{equation}

If the predicted occupancy and flow at waypoint \(k\) are accurate, this term should be equal to the ground-truth \(O_k\). In other words, for a grid cell to be marked as occupied in \(\hat{W}_k \hat{O}_k\), it should be supported by both occupancy and flow predictions. Therefore, the flow-grounded occupancy metrics compute \(\text{AUC}\) and \(\text{Soft-IoU}\) between \(\hat{W}_k \hat{O}_k\) and ground-truth \(O_k\) as \(\text{AUC}(O_k, \hat{W}_k \hat{O}_k)\) and \(\text{Soft-IoU}(O_k, \hat{W}_k \hat{O}_k)\).

All metrics are averaged over the 8 predicted waypoints.
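The flow-grounded computation can be sketched end to end on small grids. Flow vectors are taken to be in grid-cell units with dx along columns and dy along rows, and the warp uses bilinear sampling that treats locations outside the grid as unoccupied. These conventions, the clipping of summed occupancies to [0, 1], and all function names are assumptions, not the released implementation.

```python
import math


def combine(observed, occluded):
    """O_hat_k = O_hat^b_k + O_hat^c_k, clipped to [0, 1] (clipping is assumed)."""
    return [[min(1.0, b + c) for b, c in zip(row_b, row_c)]
            for row_b, row_c in zip(observed, occluded)]


def sample_bilinear(grid, x, y):
    """Bilinearly sample grid at continuous (x = column, y = row); zero outside."""
    h, w = len(grid), len(grid[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * grid[yi][xi]
    return value


def warp(origin_occupancy, flow):
    """W_hat_k = F_hat_k o O_{k-1}: each cell looks back along its flow vector."""
    return [[sample_bilinear(origin_occupancy, col + dx, row + dy)
             for col, (dx, dy) in enumerate(flow_row)]
            for row, flow_row in enumerate(flow)]


def flow_grounded_occupancy(pred_observed, pred_occluded, pred_flow, origin_occupancy):
    """Element-wise product W_hat_k * O_hat_k used by the flow-grounded metrics."""
    pred_all = combine(pred_observed, pred_occluded)
    warped = warp(origin_occupancy, pred_flow)
    return [[w * o for w, o in zip(row_w, row_o)]
            for row_w, row_o in zip(warped, pred_all)]


# A vehicle at column 0 at waypoint k - 1 moves two cells right by waypoint k.
origin = [[1.0, 0.0, 0.0, 0.0]]
flow = [[(-2.0, 0.0)] * 4]  # backward flow: look two cells to the left
grounded = flow_grounded_occupancy([[0.0, 0.0, 1.0, 0.0]], [[0.0] * 4], flow, origin)
```

In this toy case, only the cell supported by both the predicted occupancy and the warped origin survives, giving `[[0.0, 0.0, 1.0, 0.0]]`; the result would then be scored against \(O_k\) with the AUC and Soft-IoU metrics described above.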

Leaderboard

Disqualified from the 2022 Waymo Open Dataset Challenge.

Submit

Submissions for this version of the challenge are closed. You can submit to the 2024 version of the Occupancy and Flow Prediction challenge.

Rules Regarding Awards

See the Official Challenge Rules here.