Monocular depth predictors are typically trained on large-scale datasets that are naturally biased with respect to the distribution of camera poses. As a result, trained predictors fail to make reliable depth predictions for test examples captured under uncommon camera poses. To address this issue, we propose two novel techniques that exploit the camera pose during training and prediction. First, we introduce a simple perspective-aware data augmentation that synthesizes new training examples with more diverse views by perturbing existing ones in a geometrically consistent manner. Second, we propose a conditional model that exploits the per-image camera pose as prior knowledge by encoding it as part of the input. We show that jointly applying the two methods improves depth prediction on images captured under uncommon and even never-before-seen camera poses, and that our methods improve performance across a range of predictor architectures. Lastly, we show that explicitly encoding the camera pose distribution improves the generalization of a synthetically trained depth predictor when evaluated on real images.
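To make the augmentation concrete, below is a minimal sketch (our own simplification, not the paper's exact implementation) that perturbs the camera pitch and warps the RGB image and depth map consistently. Because the perturbation is a pure rotation, the image warp is the homography H = K R K⁻¹ and no disocclusions are introduced; the function name `augment_pitch` and the intrinsics `K` are illustrative assumptions.

```python
import numpy as np
import cv2

def augment_pitch(rgb, depth, K, delta_deg):
    """Perturb the camera pitch by delta_deg and warp image + depth consistently.

    Assumes `depth` stores the z-coordinate of each pixel's 3D point and the
    new view is a purely rotated camera (no translation), so the image warp
    is the homography H = K R K^-1 and the warp is exact up to resampling.
    """
    h, w = depth.shape
    t = np.deg2rad(delta_deg)
    # Rotation about the camera's x-axis (pitch).
    R = np.array([[1, 0, 0],
                  [0, np.cos(t), -np.sin(t)],
                  [0, np.sin(t),  np.cos(t)]])
    H = K @ R @ np.linalg.inv(K)

    # Per-pixel viewing rays r = K^-1 [u, v, 1]^T (z-component is 1).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    rays = np.stack([(u - K[0, 2]) / K[0, 0],
                     (v - K[1, 2]) / K[1, 1],
                     np.ones_like(u, dtype=np.float64)], axis=-1)
    # Depth of each source pixel in the rotated camera: z' = d * (R r)_z.
    z_new = depth * (rays @ R.T)[..., 2]

    rgb_aug = cv2.warpPerspective(rgb, H, (w, h), flags=cv2.INTER_LINEAR)
    # Nearest-neighbor sampling avoids blending depths across boundaries;
    # pixels with no source (filled with 0) should be masked as invalid.
    depth_aug = cv2.warpPerspective(z_new.astype(np.float32), H, (w, h),
                                    flags=cv2.INTER_NEAREST)
    return rgb_aug, depth_aug
```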
We propose to exploit the camera pose as a prior on which the depth predictor is conditioned during training. The camera pose can come from sensors or be estimated by a camera pose predictor. We encode it as a 2D map that is concatenated with the image as input to a pose-conditional depth predictor. As seen in the prediction error map, leveraging the camera pose prior (CPP) yields much better depth estimates than a vanilla baseline model that takes only the RGB image as input.
We propose a factorized approach that disentangles viewpoint statistics from depth predictions. Specifically, we encode camera pose priors as scene-independent spatial maps that are then concatenated with RGB images as input to pose-conditional depth predictors. For more details, please refer to our paper linked above.
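As a concrete illustration, the sketch below shows one plausible instantiation of such a scene-independent map, assuming the pose is given as camera height and pitch: it renders the per-pixel depth of the ground plane under that pose and concatenates it with the RGB image as a fourth input channel. The exact maps used in the paper may differ; `ground_plane_depth`, `PoseConditionalInput`, and all parameter names are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

def ground_plane_depth(K, h, w, cam_height, pitch_deg, max_depth=80.0):
    """Scene-independent pose encoding: per-pixel depth of the ground plane
    as seen from a camera `cam_height` meters above it at the given pitch.

    Assumes image y points down and pitch rotates the camera about its
    x-axis; rays that never hit the ground are assigned `max_depth`.
    """
    t = np.deg2rad(pitch_deg)
    R = np.array([[1, 0, 0],
                  [0, np.cos(t), -np.sin(t)],
                  [0, np.sin(t),  np.cos(t)]])  # camera-to-world rotation
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    rays = np.stack([(u - K[0, 2]) / K[0, 0],
                     (v - K[1, 2]) / K[1, 1],
                     np.ones_like(u, dtype=np.float64)], axis=-1)
    dy = (rays @ R.T)[..., 1]          # world y-component of each ray
    depth = np.full((h, w), max_depth)
    hit = dy > 1e-6                    # rays pointing toward the ground
    # Ray scale s solves s * (R r)_y = cam_height; depth along z is s.
    depth[hit] = np.clip(cam_height / dy[hit], 0.0, max_depth)
    return depth.astype(np.float32)

class PoseConditionalInput(nn.Module):
    """Concatenate the pose map with RGB before any depth backbone whose
    first conv layer has been widened to accept 4 input channels."""
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone

    def forward(self, rgb, pose_map):
        # rgb: (B, 3, H, W); pose_map: (B, H, W), ideally normalized
        # (e.g., divided by max_depth) to match the RGB value range.
        x = torch.cat([rgb, pose_map.unsqueeze(1)], dim=1)  # (B, 4, H, W)
        return self.backbone(x)
```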
If you find this work useful in your research, please consider citing:
@inproceedings{zhao2021camera,
  title     = {Camera Pose Matters: Improving Depth Prediction by Mitigating Pose Distribution Bias},
  author    = {Zhao, Yunhan and Kong, Shu and Fowlkes, Charless},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2021}
}
This research was supported by NSF grants IIS-1813785, IIS-1618806, a research gift from Qualcomm, and a hardware donation from NVIDIA.