Learning based depth map estimation : considering noise and scene categories (MS)

Kumari, Seema

URI: http://hdl.handle.net/123456789/433

Date: 2019-12-06

Abstract:

3D scene analysis can play a crucial role in many 3D vision applications, where depth information is pivotal. However, accurate dense depth sensing with active depth sensors (e.g., laser depth scanners) is costly. One alternative is to employ low-cost depth sensors, which yield noisy depth information. Another common and still prevalent alternative is depth estimation from intensity images using stereo. In this case, however, correspondences among the multiple viewpoints are often established inaccurately because of issues such as illumination changes and occlusion. Thus, in recent years, learning-based depth estimation from single intensity images has been explored. However, intensity images can be noisy due to sensor characteristics. Along these lines, we propose an approach to estimate depth from a single intensity image using a learning-based strategy. We have developed a novel convolutional neural network (CNN) encoder-decoder architecture, which learns the depth information from example pairs of color images and their corresponding depth maps. The proposed model integrates residual connections within the pooling (down-sampling) and up-sampling layers, together with an hourglass module that operates on the encoded features, thus processing them at various scales. Furthermore, the model is optimized using a perceptual loss as well as the mean squared error loss. The perceptual loss considers high-level features, thus operating at a different level of abstraction, which is complementary to the mean squared error loss that considers pixel-to-pixel error. Because the training and testing data can be noisy, the estimated depth may not be accurate. Although our depth estimation framework can handle low-level noise in the intensity test image, a higher level of noise can degrade the estimated depth map.
For this scenario, we propose a denoising algorithm for both intensity images and depth maps that can address higher levels of noise. It has been shown that non-local similar patches play an important role in denoising. However, noise can create ambiguity in finding similar patches and thereby degrade the results, and most non-local similarity-based approaches do not consider the issue of noisy patch grouping. Hence, we propose to denoise an image by mitigating the issue of grouping non-local similar patches in the presence of noise, performing the grouping in the transform domain and using sparsity and edge-preserving constraints. The effectiveness of the transform-domain grouping of patches is utilized for learning dictionaries and is further extended to obtain an initial approximation of the sparse coefficient vector for the clean image patches. The results are further improved by employing edge-preserving constraints and processing at coarser scales. Our technique preserves surface discontinuities and prominent details in depth and intensity images while suppressing noise, and we demonstrate clear benefits of denoising. Another aspect considered in this work is whether a priori knowledge of the scene type can benefit depth estimation. We demonstrate an improvement in the estimated depth map obtained by classifying different indoor scenes and building separate depth estimation models for the scene types. Such an approach may be useful in applications involving a small and fixed number of scenes. To build the classifier, we have used a smaller version of the Residual Convolutional Neural Network (ResNet-18) that discriminates between different indoor scenes (e.g., bookstore, dining, bathroom, classroom, and kitchen) even in the presence of noise in the testing images. Here, our denoising method can help in accurate estimation of the depth map.
Such an approach can not only serve as an initial step of depth estimation but can also be useful in scene classification/retrieval applications.
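
As an illustration of the two training losses mentioned in the abstract, the following is a minimal sketch of an objective that combines a pixel-wise mean squared error with a perceptual (feature-space) error. The abstract does not state how the two terms are combined, so the weighted sum, the `perceptual_weight` value, and the names `combined_loss` and `toy_features` are assumptions made for illustration only. In particular, `toy_features` is a hypothetical stand-in for the high-level features of a pretrained network and is not the feature extractor used in the thesis.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equally sized, flattened maps."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def toy_features(depth: Vector) -> List[float]:
    """Hypothetical feature extractor: differences between neighbours.

    Stands in for the high-level features of a pretrained network;
    here it is only a crude edge response on a 1D signal.
    """
    return [b - a for a, b in zip(depth, depth[1:])]


def combined_loss(
    pred: Vector,
    target: Vector,
    features: Callable[[Vector], Vector] = toy_features,
    perceptual_weight: float = 0.5,  # assumed value, not from the thesis
) -> float:
    """Pixel-wise MSE plus a weighted MSE computed in feature space."""
    pixel_term = mse(pred, target)
    perceptual_term = mse(features(pred), features(target))
    return pixel_term + perceptual_weight * perceptual_term


if __name__ == "__main__":
    predicted = [1.0, 2.0, 4.0]     # flattened predicted depth values
    ground_truth = [1.0, 2.0, 3.0]  # flattened reference depth values
    print(combined_loss(predicted, ground_truth))
```

In this sketch the pixel term penalizes per-value depth errors, while the feature term penalizes differences in structure (here, local gradients), which is one way to read the abstract's statement that the two losses are complementary.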
