Waymo Open Dataset

3D Tracking


Given a temporal sequence (prior frames) of lidar and camera data, produce a set of 3D upright boxes and the correspondences between boxes across frames. You may use the baseline model or object detections linked below, if you wish.




To submit your entry to the leaderboard, upload your file in the format specified in the Submission protos. This challenge does not have any awards. You can only submit against the Test Set 3 times every 30 days. (Submissions that error out do not count against this total.)


Leaderboard ranking for this challenge is by MOTA/L2 for Object Type "ALL_NS" (all objects except signs: Vehicle, Cyclist, and Pedestrian) with any sensors and any number of prior frames (no future frames).

Primary metric

Multiple Object Tracking Accuracy (MOTA): Accounts for all object configuration errors made by the tracker (false positives, misses, and mismatches) over all frames. See this paper for details.

Secondary metrics

Multiple Object Tracking Precision (MOTP): The total error in estimated position for matched object-hypothesis pairs over all frames, averaged by the total number of matches made. It shows the ability of the tracker to estimate precise object positions, independent of its skill at recognizing object configurations and keeping consistent trajectories. See this paper for details.

False Positive ratio: The number of false positives (all tracker hypotheses for which no real object exists) divided by the total number of objects. See this paper for details.

Miss ratio: The number of misses (all objects for which no hypothesis was output) divided by the total number of objects. See this paper for details.

Mismatch ratio: The number of mismatches (all occurrences where the tracking hypothesis for an object changed compared to previous frames) divided by the total number of objects. See this paper for details.
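As a rough illustration of how these metrics relate, the sketch below derives the three ratios, MOTA, and MOTP from error counts accumulated over all frames. It assumes the standard CLEAR MOT definition, in which MOTA is one minus the sum of the false positive, miss, and mismatch ratios. The `TrackingCounts` container and the `summarize` function are hypothetical names used only for this example; they are not part of the official evaluation code, which may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class TrackingCounts:
    """Hypothetical container for counts accumulated over all frames."""
    num_objects: int           # total ground truth objects over all frames
    num_false_positives: int   # hypotheses with no real object
    num_misses: int            # objects with no hypothesis
    num_mismatches: int        # identity switches between frames
    total_match_error: float   # summed position error over matched pairs
    num_matches: int           # number of matched object-hypothesis pairs


def summarize(counts: TrackingCounts) -> dict:
    """Compute the ratios, MOTA, and MOTP from accumulated counts."""
    if counts.num_objects <= 0:
        raise ValueError("num_objects must be positive")

    fp_ratio = counts.num_false_positives / counts.num_objects
    miss_ratio = counts.num_misses / counts.num_objects
    mismatch_ratio = counts.num_mismatches / counts.num_objects

    # Assumed CLEAR MOT definition: MOTA = 1 - (all error ratios combined).
    mota = 1.0 - (fp_ratio + miss_ratio + mismatch_ratio)

    # MOTP: total position error averaged over the number of matches.
    if counts.num_matches > 0:
        motp = counts.total_match_error / counts.num_matches
    else:
        motp = float("nan")

    return {
        "mota": mota,
        "motp": motp,
        "fp_ratio": fp_ratio,
        "miss_ratio": miss_ratio,
        "mismatch_ratio": mismatch_ratio,
    }
```

Note that MOTA can become negative when the tracker makes more errors than there are ground truth objects, so it should not be read as a percentage bounded below by zero.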

Sensor Names

L: All lidars
LT: Top lidar
C: All cameras
CLT: Camera and top lidar
CL: Camera and all lidars
I: Invalid


Labeled boxes without any lidar points in the frames are not considered during evaluation.

Label Difficulty Breakdown

Each ground truth label is categorized into different difficulty levels (two levels for now):

  • LEVEL_1, if the number of lidar points is > 5 and the label is not marked as LEVEL_2 in the released data.

  • LEVEL_2, if the number of lidar points is >= 1 and <= 5, or the label is marked as LEVEL_2 in the released data. When evaluating, LEVEL_2 metrics are computed by considering both LEVEL_1 and LEVEL_2 ground truth.
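The difficulty rules above can be sketched as a small function. This is an illustration only: the function names, the `marked_level_2` flag, and the string return values are assumptions made for the example, not the dataset's actual field names or API. Labels with no lidar points return `None`, reflecting that such boxes are not considered during evaluation.

```python
from typing import Optional


def difficulty_level(num_lidar_points: int,
                     marked_level_2: bool = False) -> Optional[str]:
    """Return the difficulty level of a ground truth label.

    marked_level_2 is a hypothetical flag standing in for a label that is
    marked as LEVEL_2 in the released data.
    """
    if num_lidar_points < 1:
        # Boxes without any lidar points are not evaluated.
        return None
    if marked_level_2 or num_lidar_points <= 5:
        return "LEVEL_2"
    return "LEVEL_1"


def counts_toward(metric_level: str, label_level: Optional[str]) -> bool:
    """Whether a label contributes to metrics reported at metric_level.

    LEVEL_2 metrics consider both LEVEL_1 and LEVEL_2 ground truth;
    LEVEL_1 metrics consider only LEVEL_1 ground truth.
    """
    if label_level is None:
        return False
    if metric_level == "LEVEL_2":
        return label_level in ("LEVEL_1", "LEVEL_2")
    return label_level == "LEVEL_1"
```

For instance, a label with 3 lidar points is LEVEL_2 and therefore contributes only to LEVEL_2 metrics, while a label with 40 points contributes to both levels unless it is explicitly marked as LEVEL_2.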

Metric Breakdown

The following metric breakdowns are supported:

  • OBJECT_TYPE: Breakdown by object type ("ALL_NS" refers to all objects except signs: Vehicle, Cyclist, and Pedestrian)

  • RANGE: Breakdown by the distance between object center and vehicle frame origin. [0, 35m), [35m, 50m), [50m, +inf)
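A minimal sketch of the RANGE breakdown follows. It assumes the box center is given in the vehicle frame, so the distance to the frame origin is simply the Euclidean norm of the center coordinates. Whether the official evaluation includes the z component is not stated here, so the `use_z` parameter is a hypothetical switch, and `range_bucket` is an illustrative name rather than an official function.

```python
import math


def range_bucket(center_x: float, center_y: float, center_z: float = 0.0,
                 use_z: bool = True) -> str:
    """Map an object center (vehicle frame, meters) to a RANGE bucket.

    use_z is an assumption: set it to False to measure distance in the
    ground plane only.
    """
    z = center_z if use_z else 0.0
    distance = math.sqrt(center_x ** 2 + center_y ** 2 + z ** 2)

    # Intervals are half-open: the lower bound is included, the upper is not.
    if distance < 35.0:
        return "[0, 35m)"
    if distance < 50.0:
        return "[35m, 50m)"
    return "[50m, +inf)"
```

Because the intervals are half-open, an object at exactly 35 m falls into the middle bucket and one at exactly 50 m falls into the farthest bucket.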

Baseline Model and Object Detections

If you want to focus only on the tracking part of this challenge, we are providing baseline object detections that you are free to use. These were generated from PointPillars + PPBA and the format is the same as the Submission format. You can find the baseline lingvo model on GitHub.