Qiu, Jiayan
Description
Convolutional neural network (CNN), one of the most commonly used
deep learning methods, has been applied to various computer
vision and pattern recognition tasks, and has achieved
state-of-the-art performance. Most recent research work on CNN
focuses on the innovations of the structure. This thesis explores
both the innovation of structure and final label encoding of CNN.
To evaluate the performance of our proposed network structure and
label encoding method,...[Show more] two computer vision tasks are conducted,
namely age estimation from facial image and depth estimation from
a single image.
For age estimation from facial image, we propose a novel
hierarchical aggregation based deep network to learn aging
features from facial images and apply our encoding method to
transfer the discrete aging labels into a possibility label,
which enables the CNN to conduct a classification task rather
than regression task. In contrast to traditional aging features,
where identical filter is applied to the en-
tire facial image, our deep aging feature can capture both local
and global cues in aging. Under our formulation, convolutional
neural network (CNN) is employed to extract region specific
features at lower layers. Then, low layer features are
hierarchically aggregated by using fully connected way to
consecutive higher layers. The resultant aging feature is of
dimensionality 110, which achieves both good discriminative
ability and efficiency. Experimental results of age prediction on
the MORPH-II and the FG-NET databases show that the proposed deep
aging feature outperforms state-of-the-art aging features by a
margin.
Depth estimation from a single image is an essential component
toward understanding the 3D geometry of a scene. Compared with
depth estimation from stereo images, depth map estimation from a
single image is an extremely challenging task. This thesis
addresses this task by regression with deep features, combined
with surface normal constrained depth refinement. The proposed
framework consists of two steps. First, we implement a
convolutional neural network (CNN) to learn the mapping from
multi-scale image patches to depth on the super-pixel level. In
this step, we apply the proposed label encoding method to
transfer the continuous depth labels to be possibility vectors,
which reformulates the regression task to a classification task.
Second, we refine predicted depth at the super-pixel level to the
pixel level by exploiting surface normal constraints on depth
map. Experimental results of depth estimation on the NYU2 dataset
show that the proposed method achieves a promising performance
and has a better performance compared
with methods without the proposed label encoding.
The above tasks show the proposed label encoding method has
promising performance, which is another direction of CNN
structure optimization.
Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.