Visual Odometry Based on Convolutional Neural Networks for Large-Scale Scenes

EasyChair Preprint 413
10 pages • Date: August 9, 2018

Abstract

The main task of visual odometry (VO) is to measure camera motion and image depth, which is the basis of 3D reconstruction and the front end of simultaneous localization and mapping (SLAM). However, most existing methods have low accuracy or require advanced sensors. In order to predict camera pose and image depth simultaneously, with high accuracy, from image sequences captured by a regular camera, we train a novel framework named PD-Net, based on a convolutional neural network (CNN). It contains two main modules: a pose estimator that estimates the 6-DoF camera pose, and a depth estimator that computes the depth of the current view. The key to our proposed framework is that PD-Net comprises shared convolutional layers and then divides into two branches that estimate camera motion and image depth, respectively. Experiments on the KITTI and TUM datasets show that our method produces meaningful depth estimates and successfully estimates frame-to-frame camera rotations and translations in large-scale scenes, even texture-less ones. It outperforms previous methods in terms of accuracy and robustness.

Keyphrases: 3D reconstruction, CNN, depth prediction, SLAM, visual odometry, pose estimation
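The shared-encoder, two-branch idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' PD-Net implementation; all layer counts, kernel sizes, and weight initializations below are placeholder assumptions, chosen only to show how one feature extractor can feed both a 6-DoF pose head and a per-pixel depth head.

```python
import numpy as np

def conv2d(x, k):
    """Valid single-channel 2-D convolution via sliding windows."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
frame_pair = rng.standard_normal((32, 32))  # stand-in for stacked input frames

# Shared encoder: two conv + ReLU layers (random placeholder weights)
shared = np.maximum(conv2d(frame_pair, rng.standard_normal((3, 3))), 0)
shared = np.maximum(conv2d(shared, rng.standard_normal((3, 3))), 0)
feat = shared.ravel()

# Pose branch: fully connected layer producing a 6-DoF vector
# (3 rotation + 3 translation parameters)
W_pose = rng.standard_normal((6, feat.size)) * 0.01
pose = W_pose @ feat

# Depth branch: a depth value per pixel at the feature-map resolution
W_depth = rng.standard_normal((shared.size, feat.size)) * 0.01
depth = (W_depth @ feat).reshape(shared.shape)

print(pose.shape, depth.shape)
```

Running this prints a `(6,)` pose vector and a depth map matching the feature-map resolution, mirroring the paper's description of shared convolutional layers splitting into motion and depth branches.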