Convolutional Neural Network based Age Estimation from Facial Image and Depth Prediction from Single Image

Qiu, Jiayan

Date

2016

Authors

Qiu, Jiayan

Abstract

The convolutional neural network (CNN), one of the most commonly used deep learning methods, has been applied to a wide range of computer vision and pattern recognition tasks and has achieved state-of-the-art performance. Most recent research on CNNs focuses on innovations in network structure. This thesis explores both the network structure and the encoding of the final labels. To evaluate the proposed network structure and label encoding method, two computer vision tasks are studied: age estimation from a facial image and depth estimation from a single image. For age estimation, we propose a novel hierarchical-aggregation-based deep network that learns aging features from facial images, and we apply our encoding method to convert the discrete age labels into possibility labels, which lets the CNN treat age estimation as a classification task rather than a regression task. In contrast to traditional aging features, where an identical filter is applied to the entire facial image, our deep aging feature captures both local and global aging cues. Under our formulation, a CNN extracts region-specific features at the lower layers. These low-layer features are then hierarchically aggregated, through fully connected layers, into consecutive higher layers. The resulting aging feature has dimensionality 110, which provides both good discriminative ability and efficiency. Experimental results of age prediction on the MORPH-II and FG-NET databases show that the proposed deep aging feature outperforms state-of-the-art aging features. Depth estimation from a single image is an essential component of understanding the 3D geometry of a scene. Compared with depth estimation from stereo images, depth map estimation from a single image is an extremely challenging task. This thesis addresses this task by regression with deep features, combined with surface-normal-constrained depth refinement.
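The age-label encoding mentioned above, which turns a discrete age into a vector so that the network can be trained as a classifier, can be illustrated with a minimal sketch. The abstract does not specify the exact form of the encoding, so the Gaussian shape, the `sigma` value, the 0 to 100 age range, and the function names `encode_age` and `decode_age` are all assumptions made for illustration, not the thesis's method.

```python
import math


def encode_age(age, num_classes=101, sigma=2.0):
    """Turn an integer age into a normalized soft label over ages 0..num_classes-1.

    Assumption: a Gaussian centered on the true age; the thesis may use
    a different shape or width.
    """
    weights = [
        math.exp(-((k - age) ** 2) / (2.0 * sigma ** 2))
        for k in range(num_classes)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def decode_age(label):
    """Recover a scalar age from a label vector as its expected value."""
    return sum(k * p for k, p in enumerate(label))


label = encode_age(30)
peak = max(range(len(label)), key=lambda k: label[k])
estimate = decode_age(label)
```

With a vector target of this kind, a classification loss such as cross-entropy can replace a regression loss on the scalar age, and neighbouring ages still receive non-zero weight.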
The proposed framework consists of two steps. First, we implement a CNN to learn the mapping from multi-scale image patches to depth at the super-pixel level. In this step, we apply the proposed label encoding method to convert the continuous depth labels into possibility vectors, which reformulates the regression task as a classification task. Second, we refine the predicted depth from the super-pixel level to the pixel level by exploiting surface normal constraints on the depth map. Experimental results on the NYU2 dataset show that the proposed method achieves promising performance and outperforms methods that do not use the proposed label encoding. Together, the two tasks show that the proposed label encoding method performs well and that label encoding offers another direction for optimizing CNNs, alongside structural innovation.
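To make the second step more concrete, the sketch below shows one geometric relation that a surface normal constraint can build on: if a super-pixel is treated as a planar patch with a known normal and one known depth, the depth of every other pixel in it follows from the plane equation. This is not taken from the thesis; the pinhole camera model, the intrinsics `FX`, `FY`, `CX`, `CY`, and the helper names `backproject` and `plane_depth` are hypothetical choices for illustration.

```python
def backproject(u, v, z, fx, fy, cx, cy):
    """Map pixel (u, v) at depth z to a 3D point under a pinhole camera model."""
    return ((u - cx) / fx * z, (v - cy) / fy * z, z)


def plane_depth(u, v, normal, anchor, fx, fy, cx, cy):
    """Depth at pixel (u, v) on the plane through `anchor` with the given normal.

    The 3D point is z * ray, so normal . (z * ray - anchor) = 0 gives
    z = (normal . anchor) / (normal . ray).
    """
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    num = sum(n * p for n, p in zip(normal, anchor))
    den = sum(n * r for n, r in zip(normal, ray))
    if abs(den) < 1e-9:
        raise ValueError("viewing ray is parallel to the plane")
    return num / den


# Hypothetical intrinsics and a super-pixel whose centroid has predicted depth 2.0.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0
anchor = backproject(320.0, 240.0, 2.0, FX, FY, CX, CY)

# A fronto-parallel patch keeps the same depth at every pixel.
flat = plane_depth(400.0, 300.0, (0.0, 0.0, 1.0), anchor, FX, FY, CX, CY)

# A tilted patch yields a depth that varies across the super-pixel.
tilted = plane_depth(570.0, 240.0, (1.0, 0.0, 1.0), anchor, FX, FY, CX, CY)
```

The normal does not need unit length here, because its scale cancels between the numerator and the denominator.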