We plan to grow the size of this dataset soon.

We will also be publishing performance of some baseline models on this dataset. After that, we plan to host challenges for those working with this dataset. We are not accepting submissions right now, but stay tuned for more details.

Be sure to sign up if you would like to be notified about these updates. If you have any questions, contact us at open-dataset@waymo.com.

Given a single lidar spin and the associated camera images, produce a set of 3D upright boxes or 2D top down boxes for the objects in the scene. This is a simplified version of the 3D tracking challenge (mentioned below), which ignores the temporal component. Advances here usually translate to tracking, but are simpler to try.

Given a temporal sequence (up to 10 frames) of lidar-camera data, produce a set of 3D upright boxes or 2D top down boxes for the objects in each frame, their velocities (optional), and the correspondence between boxes across frames.

This section gives an overview of metrics for the detection and tracking problems. Additional details for evaluation metrics will be shared soon.

Preliminary metrics for detection are mentioned below. These will get refined further and discussed in more detail in an upcoming paper.

Mean average precision (mAP) | ∫p(r)dr, where p(r) is the PR curve. |

Mean average precision weighted by heading (mAPH) | ∫h(r)dr, where h(r) is the PR curve weighted by heading accuracy. Predicted headings lie in [-π, π]. The heading error is the angle between the predicted and ground-truth headings, corrected to lie between 0 and π, and is divided by π so that it falls in [0, 1]. TP = 1 - abs(predicted heading - groundtruth heading) / π per successful match. FP = 1 per false positive (an object is predicted where none is present). FN = 1 per false negative. |
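As a rough illustration of the heading weighting described above, the sketch below computes the per-match weight and integrates a PR curve with a simple rectangle rule. The function names, the `(score, tp_weight)` input layout, and the choice to use the weighted true-positive sum in both precision and recall are assumptions made for this example; they are not taken from the official evaluation code, whose details may differ.

```python
import math


def heading_weight(pred_heading, gt_heading):
    """Heading-accuracy weight in [0, 1] for one matched box pair.

    Both headings are assumed to be in radians within [-pi, pi].
    """
    diff = abs(pred_heading - gt_heading)      # in [0, 2*pi]
    diff = min(diff, 2.0 * math.pi - diff)     # angle correction into [0, pi]
    return 1.0 - diff / math.pi


def area_under_pr(matches, num_gt):
    """Approximate the area under a PR curve with a rectangle rule.

    matches: list of (score, tp_weight) pairs, one per predicted box.
             tp_weight is 0.0 for a false positive, 1.0 for a plain true
             positive (AP), or heading_weight(...) for the heading-weighted
             variant (hypothetical layout, chosen for this sketch).
    num_gt:  number of ground-truth boxes.
    """
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    area = 0.0
    tp_sum = 0.0
    prev_recall = 0.0
    for rank, (_, weight) in enumerate(ordered, start=1):
        tp_sum += weight
        recall = tp_sum / num_gt
        precision = tp_sum / rank
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area
```

With this weighting, a prediction that points in exactly the opposite direction contributes 0 and a perfectly aligned one contributes 1, so the heading-weighted area can never exceed the unweighted one.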

IoU thresholds: Vehicle 0.7, Pedestrian 0.5, Cyclist 0.5, Sign 0.5

Labeled boxes without any lidar points are not considered during evaluation.

Each ground truth label is categorized into different difficulty levels (two levels for now):

- LEVEL_1, if number of points >= 5
- LEVEL_2, if number of points >= 1 and < 5, or marked as LEVEL_2 in the released data
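The difficulty rules above can be sketched as a small function. This is only an illustration: the function name, the `marked_level_2` flag, and the assumption that a LEVEL_2 marking in the released data overrides the point count are ours, not part of the released tooling.

```python
def difficulty_level(num_points, marked_level_2=False, level_1_min=5):
    """Return "LEVEL_1", "LEVEL_2", or None for a ground-truth label.

    num_points:     number of lidar points inside the labeled box.
    marked_level_2: True if the released data marks the label as LEVEL_2
                    (assumed here to take precedence over the point count).
    level_1_min:    minimum point count for LEVEL_1.
    """
    if num_points < 1:
        return None  # boxes without lidar points are not evaluated
    if marked_level_2 or num_points < level_1_min:
        return "LEVEL_2"
    return "LEVEL_1"
```

The `level_1_min` parameter is there so the same sketch can be reused with a different point-count threshold.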

The following metric breakdowns are supported:

- OBJECT_TYPE: Breakdown by object type
- RANGE: Breakdown by the distance between object center and vehicle frame origin. [0, 35m), [35m, 50m), [50m, +inf)
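For illustration, the RANGE breakdown could be assigned as follows. The sketch assumes the object center is given as (x, y, z) in meters in the vehicle frame and that the distance is the 3D Euclidean norm; whether the official metric uses a 3D or a top-down distance is not stated here, so treat that choice as an assumption.

```python
import math

# Half-open range buckets, in meters, as listed above.
RANGE_BUCKETS = [
    (0.0, 35.0, "[0, 35m)"),
    (35.0, 50.0, "[35m, 50m)"),
    (50.0, math.inf, "[50m, +inf)"),
]


def range_bucket(center_xyz):
    """Return the range bucket label for an object center in the vehicle frame."""
    distance = math.sqrt(sum(c * c for c in center_xyz))
    for low, high, label in RANGE_BUCKETS:
        if low <= distance < high:
            return label
    return None  # not reachable for finite, non-negative distances
```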

Preliminary metrics for tracking are mentioned below. These will get refined further and discussed in more detail in an upcoming paper.

MOTA: multiple object tracking accuracy | Accounts for all object configuration errors made by the tracker (false positives, misses, and mismatches) over all frames. See this paper for details. |

MOTP: multiple object tracking precision | The total error in estimated position for matched object-hypothesis pairs over all frames, averaged by the total number of matches made. It shows the ability of the tracker to estimate precise object positions, independent of its skill at recognizing object configurations or keeping consistent trajectories. See this paper for details. |
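The two definitions above correspond to the commonly used CLEAR MOT formulation, which we assume is what is meant here: MOTA is one minus the ratio of all errors to the number of ground-truth objects, and MOTP is the summed position error divided by the number of matches. The sketch below uses a hypothetical `FrameStats` container for per-frame counts; producing those counts (the matching step itself) is out of scope.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    """Per-frame tracking counts (hypothetical container for this sketch)."""
    num_gt: int            # ground-truth objects in the frame
    misses: int            # ground-truth objects with no matched hypothesis
    false_positives: int   # hypotheses with no matched ground-truth object
    mismatches: int        # identity switches relative to earlier frames
    match_errors: list     # position error for each matched pair


def mota(frames):
    """1 - (misses + false positives + mismatches) / ground-truth objects."""
    total_gt = sum(f.num_gt for f in frames)
    errors = sum(f.misses + f.false_positives + f.mismatches for f in frames)
    return 1.0 - errors / total_gt


def motp(frames):
    """Total position error over all matches divided by the number of matches."""
    total_error = sum(sum(f.match_errors) for f in frames)
    num_matches = sum(len(f.match_errors) for f in frames)
    return total_error / num_matches
```

Note that MOTA can be negative when the tracker makes more errors than there are ground-truth objects, and that with a distance-based error a lower MOTP is better.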

Labeled boxes that contain no lidar points across all 10 frames are not considered during evaluation.

Each ground truth label is categorized into different difficulty levels (two levels for now).

- LEVEL_1, if number of points in 10 frames >= 20
- LEVEL_2, if number of points in 10 frames >= 1 and < 20, or marked as LEVEL_2 in the released data

The following metric breakdowns are supported:

- OBJECT_TYPE: Breakdown by object type
- RANGE: Breakdown by the distance between object center and vehicle frame origin. [0, 35m), [35m, 50m), [50m, +inf)