Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints
Authors
Abstract
Accurate understanding and prediction of human behavior are critical prerequisites for autonomous vehicles, especially in highly dynamic and interactive scenarios such as intersections in dense urban areas. In this work, we aim to identify crossing pedestrians and predict their future trajectories. To achieve these goals, we need not only context information about the road geometry and other traffic participants but also fine-grained information about human pose, motion, and activity, which can be inferred from human keypoints. In this paper, we propose a novel multi-task learning framework for pedestrian crossing action recognition and trajectory prediction, which uses 3D human keypoints extracted from raw sensor data to capture rich information on human pose and activity. Moreover, we propose to apply two auxiliary tasks and contrastive learning to provide auxiliary supervision that improves the learned keypoint representation, which further enhances performance on the main tasks. We validate our approach on a large-scale in-house dataset as well as a public benchmark dataset, and show that it achieves state-of-the-art performance on a wide range of evaluation metrics.
The effectiveness of each model component is validated in a detailed ablation study.