Prof. Jenq-Neng Hwang (黄正能): A Radar Object Detection Network (RODNet) Cross-Supervised by Camera-Radar Fused Object 3D Localization

December 14, 9:30, online

Posted by: Wei Yu (韦钰)    Date posted: 2020-12-08

Talk title: A Radar Object Detection Network (RODNet) Cross-Supervised by Camera-Radar Fused Object 3D Localization

Speaker: Prof. Jenq-Neng Hwang (黄正能)

Time: December 14, 9:30

Format: Online (Tencent Meeting ID: 543 689 048)


About the Speaker

Many autonomous and assisted driving strategies are enabled by accurate and reliable perception of the environment around a vehicle. Among commonly used sensors, radar is usually considered a robust and cost-effective solution, even in severe driving conditions such as weak or strong lighting and bad weather. Rather than fusing potentially unreliable information from all available sensors, perception from radar data alone is a valuable alternative worth exploring. However, unlike the rich RGB images captured by a camera, radar signals make it noticeably difficult to extract semantic information. In this paper, we propose RODNet, a cross-supervised deep radar object detection network that detects objects from radar frequency (RF) images in real time without relying on other sensors. First, the raw data captured by millimeter-wave radars are transformed into RF images in range-azimuth coordinates. Second, RODNet takes a sequence of RF images as input and predicts the likelihood of objects in the radar field of view (FoV). Two customized modules are also added to handle multi-chirp information and radar object ego-motion. Instead of human-labeled ground truth, the network is cross-supervised during training by a novel 3D localization of detected objects using a camera-radar fusion (CRF) strategy. Finally, we propose a method to evaluate the object detection performance of RODNet. Because no public dataset exists for this task, we create a new dataset, Camera-Radar of the University of Washington (CRUW), which contains synchronized RGB and RF image sequences in various driving scenarios.
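The abstract does not spell out how raw millimeter-wave samples become RF images in range-azimuth coordinates. A common approach for FMCW radar, assumed here rather than taken from the talk, is a range FFT over the fast-time samples of each chirp followed by an angle FFT across the receive antennas. The sketch below uses a naive DFT in pure Python for readability; the function names, the `adc[antenna][sample]` layout, and the `n_angle_bins` parameter are hypothetical.

```python
import cmath
import math


def dft(x, n_out=None):
    """Naive DFT of a complex sequence, zero-padded (or truncated) to n_out points."""
    n_out = n_out or len(x)
    padded = list(x) + [0j] * (n_out - len(x))
    return [
        sum(padded[n] * cmath.exp(-2j * math.pi * k * n / n_out) for n in range(n_out))
        for k in range(n_out)
    ]


def range_azimuth_map(adc, n_angle_bins=16):
    """Hypothetical helper: turn one chirp of ADC samples into an RF image.

    adc[a][s] is the complex beat-signal sample s (fast time) on receive antenna a.
    Returns rf[r][t]: magnitude at range bin r and azimuth bin t.
    """
    # 1) Range FFT: a target's distance appears as a beat frequency in fast time.
    range_profiles = [dft(samples) for samples in adc]  # [antenna][range_bin]
    n_range = len(range_profiles[0])
    half = n_angle_bins // 2
    rf = []
    for r in range(n_range):
        # 2) Angle FFT: the phase progression across antennas encodes azimuth.
        across_antennas = [profile[r] for profile in range_profiles]
        spectrum = dft(across_antennas, n_angle_bins)
        # Rotate so the zero-angle bin sits in the middle (like an FFT shift).
        spectrum = spectrum[half:] + spectrum[:half]
        rf.append([abs(v) for v in spectrum])
    return rf
```

For example, a single synthetic target at range bin 5 whose phase advances by 3/16 of a cycle per antenna produces one bright cell in the output. A real pipeline would use an FFT library, windowing, and one such map per chirp; handling several chirps per frame is what the abstract's multi-chirp module addresses.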



Talk Summary:

Multiple object tracking (MOT) and video object segmentation (VOS) are crucial tasks in the computer vision community. Combining the two effectively, i.e., multiple object tracking and segmentation (MOTS), can yield further improvement. However, most tracking-by-detection MOT methods, given detected bounding boxes, cannot handle static, slow-moving, and fast-moving camera scenarios simultaneously because of ego-motion and frequent occlusion. In this work, we propose a novel tracking framework, called “instance-aware MOT” (IAMOT), which tracks multiple objects from either static or moving cameras by jointly considering instance-level features and object motion. Evaluated on the MOTS20 and KITTI-MOTS datasets, the proposed method won first place in Track 3 of the BMTT Challenge at the IEEE CVPR 2020 workshop. When LiDAR information is available, we further propose a multi-stage MOTS framework called “Lidar and monocular Image Fusion based multi-object Tracking and Segmentation (LIFTS)”. This framework was also evaluated in BMTT Challenge 2020 Track 2 (KITTI-MOTS dataset) and ranked 2nd in the competition.
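The summary does not detail how IAMOT combines instance-level features with object motion. As a generic illustration only, not the IAMOT algorithm, the sketch below scores each track-detection pair by a weighted sum of appearance cosine similarity and the IoU between a constant-velocity box prediction and the detection, then matches pairs greedily. The `Track`/`Detection` classes, `alpha`, and `min_score` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    box: tuple       # (x1, y1, x2, y2) in pixels
    velocity: tuple  # (dx, dy) per frame, assumed constant-velocity motion
    feature: list    # instance-level appearance embedding


@dataclass
class Detection:
    box: tuple
    feature: list


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two embeddings (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def predict(track):
    """Shift the last box by the track's per-frame velocity."""
    dx, dy = track.velocity
    x1, y1, x2, y2 = track.box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def associate(tracks, detections, alpha=0.5, min_score=0.3):
    """Greedy matching on alpha * appearance + (1 - alpha) * motion IoU.

    Returns a list of (track_id, detection_index) pairs.
    """
    candidates = []
    for t in tracks:
        pred = predict(t)
        for j, d in enumerate(detections):
            score = alpha * cosine(t.feature, d.feature) + (1 - alpha) * iou(pred, d.box)
            if score >= min_score:
                candidates.append((score, t.track_id, j))
    candidates.sort(reverse=True)  # best-scoring pairs are claimed first
    used_tracks, used_dets, matches = set(), set(), []
    for _, tid, j in candidates:
        if tid not in used_tracks and j not in used_dets:
            used_tracks.add(tid)
            used_dets.add(j)
            matches.append((tid, j))
    return matches
```

When the camera or object moves quickly, a box can shift far enough between frames that its raw IoU with the new detection is zero; the motion prediction and the appearance term together keep such an association from being lost, which is the kind of case the summary says box-only trackers struggle with.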