Learning based depth map estimation: considering noise and scene categories (MS)

dc.contributor.advisor Dr. Arnav Bhavsar
dc.contributor.author Kumari, Seema
dc.date.accessioned 2020-07-09T05:50:25Z
dc.date.available 2020-07-09T05:50:25Z
dc.date.issued 2019-12-06
dc.identifier.uri http://hdl.handle.net/123456789/286
dc.description A dissertation submitted for the award of the degree of Master of Science under the guidance of Dr. Arnav Bhavsar (Faculty, SCEE) en_US
dc.description.abstract 3D scene analysis can play a crucial role in many 3D vision applications, where depth information is pivotal. However, accurate dense depth sensing through active depth sensors (e.g., a laser depth scanner) is costly. One alternative is to employ low-cost depth sensors, which yield noisy depth information. Another alternative, still prevalent, is depth estimation from intensity images using stereo. In this case, however, correspondences among the multiple viewpoints are often inaccurate because of issues such as illumination changes and occlusion. Thus, in recent years, learning-based depth estimation from single intensity images has been explored. However, intensity images can themselves be noisy due to sensor characteristics. Along these lines, we propose an approach to estimate depth from a single intensity image using a learning-based strategy. We have developed a novel convolutional neural network (CNN) encoder-decoder architecture, which learns depth information from example pairs of color images and their corresponding depth maps. The proposed model integrates residual connections within the pooling (down-sampling) and up-sampling layers with an hourglass module that operates on the encoded features, thus processing them at various scales. Furthermore, the model is optimized under a perceptual loss as well as a mean squared error loss. The perceptual loss considers high-level features, thus operating at a different level of abstraction, and is complementary to the mean squared error loss, which considers pixel-to-pixel error. Because the training and testing datasets can be noisy, the estimated depth may not be accurate. Although our depth estimation framework can handle low-level noise in the intensity test image, a higher level of noise can degrade the estimated depth map.
For this scenario, we propose a denoising algorithm for both intensity images and depth maps that can address higher levels of noise. It has been shown that non-local similar patches play an important role in denoising. Nevertheless, noise may create ambiguity in finding similar patches and hence degrade the results, yet most non-local similarity-based approaches do not consider the issue of noisy patch grouping. Hence, we propose to denoise an image by grouping non-local similar patches in the transform domain, which mitigates the effect of noise on grouping, using sparsity and edge-preserving constraints. The transform-domain grouping of patches is used for learning dictionaries and is further extended to obtain an initial approximation of the sparse coefficient vector for the clean image patches. The results are further improved by employing edge-preserving constraints and processing at coarser scales. Our technique preserves surface discontinuities and prominent details in depth and intensity images while suppressing noise, and we demonstrate clear benefits of denoising. Another aspect considered in this work is whether a priori knowledge of the scene type can benefit depth estimation. We demonstrate an improvement in the estimated depth map by classifying different indoor scenes and building separate depth estimation models for each scene type. Such an approach may be useful in applications involving a small and fixed number of scenes. To build the classifier, we have used a smaller version of the Residual Convolutional Neural Network (ResNet-18), which discriminates between different indoor scenes (e.g., bookstore, dining room, bathroom, classroom, and kitchen) even in the presence of noise in the test images. Here, our denoising method can help in accurate estimation of the depth map.
Such an approach can serve not only as an initial step in depth estimation but can also be useful in scene classification and retrieval applications.
dc.language.iso en_US en_US
dc.publisher IITMandi en_US
dc.subject CNN en_US
dc.subject Edge Preservation en_US
dc.title Learning based depth map estimation: considering noise and scene categories (MS) en_US
dc.type Thesis en_US
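
The abstract describes optimizing the depth model under both a mean squared error loss (pixel-to-pixel) and a perceptual loss (computed on higher-level features). The following is a minimal sketch of how two such terms might be combined, not the thesis implementation. The names mse, avg_pool_2x2 and combined_loss, and the weight parameter, are assumptions made for illustration. In particular, 2x2 average pooling is only a stand-in for the pretrained network features that a perceptual loss would normally use.

```python
# Illustrative sketch only. The "feature extractor" here is a hypothetical
# stand-in (2x2 average pooling), not a pretrained network.

def mse(a, b):
    """Mean squared error between two equally sized 2D grids (lists of rows)."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def avg_pool_2x2(grid):
    """Stand-in coarse features: non-overlapping 2x2 average pooling."""
    h, w = len(grid), len(grid[0])
    return [
        [(grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]) / 4.0
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]


def combined_loss(pred, target, weight=0.5):
    """Pixel-wise MSE plus a weighted feature-space MSE.

    `weight` is an assumed hyperparameter balancing the two terms.
    """
    pixel_term = mse(pred, target)
    feature_term = mse(avg_pool_2x2(pred), avg_pool_2x2(target))
    return pixel_term + weight * feature_term
```

The two terms respond to different kinds of error: a checkerboard-like error pattern with zero local mean is penalized by the pixel term but vanishes after pooling, whereas a constant offset is penalized by both terms.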
