Challenge

Real-time 3D Detection

Overview

Given three lidar range images and the associated camera images, produce a set of 3D upright boxes for the objects in the scene.

Note: all entries appear on the leaderboard, but your model must run faster than 70 ms/frame on an NVIDIA Tesla V100 GPU to be considered for this challenge’s awards.

Although it does not carry an award, the title of ‘Most Efficient Model’ will separately be given to the model with the lowest latency and APH/L2 > 70, along with an opportunity to present at the CVPR 2021 Workshop on Autonomous Driving.

Leaderboard

To see results of the 3D Detection Challenge without Latency, select 'Show all' in the 'Latency (s)' column of the leaderboard.

Submit

To submit your entry to the leaderboard, upload your file in the serialized protobuf format specified here. If you are participating in the challenge for awards, or submitting for latency evaluation, you must also submit a Docker image to a bucket on Google Cloud Storage or to Google Container Registry/Artifact Registry, and indicate its location using the docker_image_source field of the submission proto. The details for submitting a Docker image can be found here.

If there are many predictions, you can shard them across multiple files where each file contains a subset of the predictions. Then tar and gzip them into a .tar.gz file before uploading.
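The sharding step can be sketched with Python's standard tarfile module. This is a minimal illustration, not the official tooling: it assumes each shard has already been serialized to bytes in the required protobuf format, and the function name pack_shards and the shard file names are hypothetical.

```python
import io
import tarfile


def pack_shards(shard_blobs, archive_path):
    """Write each serialized shard as one file inside a gzip-compressed tar.

    shard_blobs: list of bytes, one already-serialized submission shard each
    (assumed to be produced elsewhere). File names are hypothetical.
    Returns the list of member names written to the archive.
    """
    names = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for idx, blob in enumerate(shard_blobs):
            info = tarfile.TarInfo(name=f"shard-{idx:05d}.bin")
            info.size = len(blob)  # tar needs the size before the data
            tar.addfile(info, io.BytesIO(blob))
            names.append(info.name)
    return names
```

For example, pack_shards([blob_a, blob_b], "submission.tar.gz") would produce a single .tar.gz file containing two shard files.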

To be eligible to participate in the challenge, each individual/all team members must read and agree to be bound by the Official Challenge Rules.

You can only submit against the Test Set 3 times every 30 days. (Submissions that error out do not count against this total.)

Metrics

Leaderboard ranking for this challenge is by Mean Average Precision with Heading (APH) / L2 among "ALL_NS" (all Object Types except signs), that is, the mean over the APHs of Vehicles, Cyclists, and Pedestrians. Any sensor from the Waymo Open Dataset can be used. Only submissions that run faster than 70 ms/frame on an NVIDIA Tesla V100 GPU will be eligible to win the challenge. We also enforce a causal system, i.e., for a frame at time step t, only sensor data up to time t can be used for its prediction.
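The ranking rule can be illustrated with a short sketch. The Entry class, its field names, and the object-type keys are hypothetical and are not part of the submission format. Only the 70 ms/frame limit and the mean over the three object types come from the text above.

```python
from dataclasses import dataclass
from statistics import mean

LATENCY_LIMIT_MS = 70.0  # eligibility limit stated in the text
ALL_NS_TYPES = ("VEHICLE", "PEDESTRIAN", "CYCLIST")  # all types except signs


@dataclass
class Entry:
    """Hypothetical leaderboard entry used only for this illustration."""
    name: str
    aph_l2: dict       # per-object-type APH/L2 values
    latency_ms: float  # measured latency in ms/frame


def all_ns_aph(entry):
    """Mean of the per-type APH/L2 values over Vehicle, Pedestrian, Cyclist."""
    return mean(entry.aph_l2[t] for t in ALL_NS_TYPES)


def award_ranking(entries):
    """Keep entries faster than the limit, then sort by ALL_NS APH/L2."""
    eligible = [e for e in entries if e.latency_ms < LATENCY_LIMIT_MS]
    return sorted(eligible, key=all_ns_aph, reverse=True)
```

An entry with a latency of 90 ms/frame would still appear on the leaderboard, but would be left out of award_ranking.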

Primary metric

Average Precision Weighted by Heading (APH): ∫h(r)dr, where h(r) is the PR curve weighted by heading accuracy. Predicted headings lie in [−π, π]. The predicted heading error (between 0 and π) is divided by π to give a weight in [0, 1].
TP = min(|θ̃ − θ|, 2π − |θ̃ − θ|)/π, where θ̃ and θ are the predicted heading and the ground truth heading in radians within [−π, π]
FP = 1 per false positive (a predicted object that is not present)
FN = 1 per false negative
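The heading term in the TP formula can be checked with a small sketch. The first function follows the formula exactly as written above. The second function is an assumption: it takes the complement of the error so that a perfectly aligned heading gets weight 1, which is one natural reading of "weighted by heading accuracy" but is not stated in the text. Both function names are hypothetical.

```python
import math


def normalized_heading_error(pred_heading, gt_heading):
    """min(|d|, 2*pi - |d|) / pi for headings in radians within [-pi, pi].

    Taking the minimum handles wrap-around, so headings of pi and -pi
    count as the same direction. The result lies in [0, 1].
    """
    diff = abs(pred_heading - gt_heading)
    return min(diff, 2.0 * math.pi - diff) / math.pi


def heading_weight_assumed(pred_heading, gt_heading):
    """ASSUMPTION (not from the text): weight = 1 - normalized error."""
    return 1.0 - normalized_heading_error(pred_heading, gt_heading)
```

With this sketch, a prediction that is off by a quarter turn (π/2) has a normalized error of 0.5, and a prediction pointing the opposite way has a normalized error of 1.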

Secondary metric

Average Precision (AP): ∫p(r)dr, where p(r) is the PR curve.
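As a rough illustration of the integral, the area under a sampled PR curve can be approximated with the trapezoidal rule. This is a sketch under that assumption. The official evaluation may sample or interpolate the curve differently, and the function name is hypothetical.

```python
def average_precision(recalls, precisions):
    """Trapezoidal approximation of the area under p(r).

    recalls, precisions: equal-length sequences of sampled PR points.
    Points are sorted by recall before integrating.
    """
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0  # area of one trapezoid
    return area
```

The same routine applies to APH if the precision samples are replaced by the heading-weighted values h(r).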

Latency

The latency of your submission is measured on an NVIDIA Tesla V100 GPU in milliseconds per frame (ms/frame).

IoU Overlap Threshold

Vehicle 0.7, Pedestrian 0.5, Cyclist 0.5
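A minimal sketch of how these thresholds might be applied when matching a prediction to a ground truth box. The dictionary keys and the function name are hypothetical, and treating an IoU exactly equal to the threshold as a match is an assumption. Computing the IoU of two 3D boxes is outside the scope of this sketch.

```python
# Thresholds from the text; the key names are hypothetical.
IOU_THRESHOLDS = {"VEHICLE": 0.7, "PEDESTRIAN": 0.5, "CYCLIST": 0.5}


def meets_iou_threshold(object_type, iou):
    """Return True if the overlap is large enough to count as a match.

    ASSUMPTION: an IoU equal to the threshold counts as a match.
    """
    return iou >= IOU_THRESHOLDS[object_type]
```

For instance, an IoU of 0.65 would be enough for a Pedestrian but not for a Vehicle.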

Sensor Names

L: All lidars
LT: Top lidar
C: All cameras
CLT: Camera and top lidar
CL: Camera and all lidars
I: Invalid

Preprocessing

Labeled boxes without any lidar points are not considered during evaluation.

Label Difficulty Breakdown

Each ground truth label is assigned one of the following difficulty levels (two levels for now):

LEVEL_1, if number of points > 5 and not marked as LEVEL_2 in the released data.
LEVEL_2, if number of points >= 1 and <= 5, or marked as LEVEL_2 in the released data. When evaluating, LEVEL_2 metrics are computed by considering both LEVEL_1 and LEVEL_2 ground truth.
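The two rules above, together with the preprocessing rule that drops boxes without lidar points, can be written as a short sketch. The function and parameter names are hypothetical, and marked_level_2 stands for the flag carried in the released data.

```python
from typing import Optional


def difficulty_level(num_points: int, marked_level_2: bool) -> Optional[str]:
    """Return "LEVEL_1", "LEVEL_2", or None if the label is not evaluated."""
    if num_points < 1:
        return None  # boxes without any lidar points are not considered
    if marked_level_2 or num_points <= 5:
        return "LEVEL_2"
    return "LEVEL_1"  # more than 5 points and not marked as LEVEL_2


def used_for_metrics(label_level: str, metric_level: str) -> bool:
    """LEVEL_2 metrics use both levels; LEVEL_1 metrics use LEVEL_1 only."""
    if metric_level == "LEVEL_2":
        return label_level in ("LEVEL_1", "LEVEL_2")
    return label_level == "LEVEL_1"
```

A box with 6 points that is marked LEVEL_2 in the released data is therefore LEVEL_2, and it contributes only to the LEVEL_2 metrics.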

Metric Breakdown

The following metric breakdowns are supported:

OBJECT_TYPE: Breakdown by object type ("ALL_NS" refers to all objects except signs: Vehicle, Cyclist, and Pedestrian)
RANGE: Breakdown by the distance between object center and vehicle frame origin. [0, 35m), [35m, 50m), [50m, +inf)
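The RANGE breakdown amounts to bucketing each object by its distance from the vehicle frame origin. The sketch below assumes a 3D Euclidean distance over the box center coordinates. Whether the official evaluation uses a 2D or 3D distance is not stated here, and the function name is hypothetical.

```python
import math


def range_bucket(center_x, center_y, center_z=0.0):
    """Map an object center (meters, vehicle frame) to a RANGE bucket.

    ASSUMPTION: distance is the 3D Euclidean norm of the center.
    """
    distance = math.sqrt(center_x**2 + center_y**2 + center_z**2)
    if distance < 35.0:
        return "[0, 35m)"
    if distance < 50.0:
        return "[35m, 50m)"
    return "[50m, +inf)"
```

The intervals are half-open, so an object at exactly 35 m falls into the [35m, 50m) bucket.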

Rules Regarding Awards

See the Official Challenge Rules here.