Recovering missing information when projecting 3D points or unprojecting image pixels

Date

2024

Authors

Chen, Wayne

Abstract

Computer vision aims to bridge the divide between 2D and 3D spaces. With significant advancements in computational resources and deep learning techniques, neural networks have become the cornerstone for solving computer vision tasks. Because neural network training is data-driven, both input and ground-truth data play pivotal roles in the training process. However, 2D data is usually dense but involves a projection operation that discards 3D information, while 3D data is often sparse due to sensor limitations. Addressing this challenge, our research focuses on recovering the missing information when projecting 3D points or unprojecting image pixels, exploring this problem across three tasks: novel view synthesis, uncertainty-aware monocular depth estimation, and latent space analysis for the DeepSDF model.

Novel view synthesis from sparse coloured point clouds aims to generate dense RGB images from a sparse XYZRGB input.

Uncertainty-aware Monocular Depth Estimation (MDE) targets dense depth estimation given a dense RGB input and sparse depth ground truth. We propose a novel network with an encoder-decoder structure and a novel loss function that enables joint training of depth and uncertainty estimation. This model competes closely with state-of-the-art solutions on depth estimation evaluation metrics and outperforms them on uncertainty estimation.

The latent space analysis for the DeepSDF model explores the connections among latent representations of different 3D models. Our experiments reveal that these latent codes are not independent: latent codes generated by linear interpolation between a pair of latent codes represent the transformation from one model to the other.

Our findings confirm the existence and impact of sparsity within input data. Our proposed methods demonstrate not only how to overcome these challenges but also how to evaluate their impact on the accuracy of the generated results.
This work contributes significantly to enhancing the accuracy and reliability of models tackling data sparsity in the field of computer vision.
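The projection/unprojection asymmetry the abstract describes can be illustrated with a standard pinhole camera model. The sketch below uses hypothetical intrinsics (the matrix K and the test point are illustrative, not taken from the thesis): projecting a 3D point to a pixel discards its depth, so unprojecting that pixel requires supplying a depth value, and every choice of depth lands on a different point along the same viewing ray.

```python
import numpy as np

# Hypothetical pinhole intrinsics: focal lengths fx, fy and principal point cx, cy.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(point_3d):
    """Project a 3D camera-space point to pixel coordinates; depth is discarded."""
    uvw = K @ point_3d
    return uvw[:2] / uvw[2]  # perspective divide loses the z component

def unproject(pixel, depth):
    """Lift a pixel back to 3D; the depth projection threw away must be supplied."""
    uv1 = np.array([pixel[0], pixel[1], 1.0])
    return depth * (np.linalg.inv(K) @ uv1)

p = np.array([0.2, -0.1, 2.0])   # illustrative 3D point in camera coordinates
uv = project(p)

# With the true depth, unprojection recovers the original point exactly;
# any other depth gives a different 3D point on the same ray through uv.
recovered = unproject(uv, p[2])
```

This is the information gap the three tasks address: the depth (or, for point clouds, the dense colour field) must be recovered by a learned model rather than read off the 2D data.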

Type

Thesis (MPhil)
