HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving

Authors

Andrei Zanfir
Mihai Zanfir
Alexander Gorban
Jingwei Ji
Yin Zhou
Dragomir Anguelov
Cristian Sminchisescu

Abstract

Autonomous driving is an exciting new industry that poses important research
questions. Within the perception module, 3D human pose estimation is an
emerging technology that can enable an autonomous vehicle to perceive and
understand the subtle and complex behaviors of pedestrians. Hardware systems
and sensors have improved dramatically over the decades, with cars potentially
carrying complex LiDAR and vision systems, and the body of dedicated datasets
for these newly available signals keeps growing. Yet little work has been done
to harness these signals for the core problem of 3D human pose estimation. Our
method, which we call HUM3DIL (HUMan 3D from Images and LiDAR), makes efficient
use of these complementary signals in a semi-supervised fashion and outperforms
existing methods by a large margin. It is a fast and compact model suited to
onboard deployment. Specifically, we embed LiDAR points into pixel-aligned
multi-modal features, which we pass through a sequence of Transformer
refinement stages. Quantitative experiments on the Waymo Open Dataset support
these claims: we achieve state-of-the-art results on the task of 3D pose
estimation.
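
To make the idea of pixel-aligned multi-modal features concrete, the following is a minimal sketch, not the HUM3DIL implementation. It assumes LiDAR points are already expressed in the camera frame with positive depth, a pinhole camera with intrinsics K, and an image feature map produced by some unspecified backbone. The function name pixel_aligned_features and the nearest-pixel sampling are illustrative assumptions; the abstract does not specify how features are sampled or fused.

```python
import numpy as np


def pixel_aligned_features(points_xyz, feature_map, K):
    """Attach an image feature vector to each LiDAR point (illustrative sketch).

    points_xyz:  (N, 3) points in the camera frame, with z > 0 (assumption).
    feature_map: (H, W, C) image features from any backbone (assumption).
    K:           (3, 3) pinhole camera intrinsics (assumption).
    Returns an (N, 3 + C) array: xyz concatenated with the sampled features.
    """
    H, W, _ = feature_map.shape

    # Pinhole projection: homogeneous pixel coordinates, then divide by depth.
    uvw = points_xyz @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]

    # Nearest-pixel lookup, clamped to the image bounds.
    cols = np.clip(np.rint(uv[:, 0]).astype(int), 0, W - 1)
    rows = np.clip(np.rint(uv[:, 1]).astype(int), 0, H - 1)

    # Gather one C-dimensional feature per point.
    sampled = feature_map[rows, cols]

    # Each point now carries geometry (xyz) and appearance (image features).
    return np.concatenate([points_xyz, sampled], axis=1)
```

In a pipeline of the kind the abstract describes, per-point vectors like these would serve as input tokens to the Transformer refinement stages. Bilinear sampling would be a natural alternative to the nearest-pixel lookup used here.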