Face Recovery from Stylized Portraits using Deep Learning
Abstract
Arguably, faces provide the most telling cues for nonverbal communication. Even in simple, everyday interactions, humans naturally gravitate towards the face, seeking the vital information revealed by facial expressions. The visual information obtained from faces is valuable in many applications, including content enhancement and forensics, which require significant magnification of photorealistic face images; computer analysis and photorealistic multimedia content editing are further examples. However, face perception at a reasonable level of accuracy is possible only if sufficient detail exists in these images. Artistic portraits present a further challenge, as they offer little scope to infer anything about the underlying subjects. This thesis addresses this deficiency by presenting approaches for reconstructing photorealistic images from their artistic counterparts. Specifically, we consider the problem of recovering faces from artistic portraits (also known as face destylization), including the recovery of fine detail and facial features from deteriorated portraits. To tackle this problem, we adopt an approach based on Deep Neural Networks (DNNs). Through three successive studies we demonstrate that face recovery is achievable and, moreover, that faces can be recovered with high accuracy. The method we develop over the course of these studies proves very powerful, and we further demonstrate this by applying it to the problem of generating high-resolution face images from very low-resolution inputs.
The main contribution of this thesis is the development of a generative-discriminative DNN, a class of network previously shown to generate realistic images efficiently. By successively improving the network, we show that significantly more accurate faces can be recovered from their respective portraits. In particular, we make cumulative improvements to the DNN in three stages corresponding to three studies, with each study demonstrating substantial gains. To further demonstrate the efficacy of our approach, we develop a DNN specifically for the task of joint face frontalization and hallucination. We show that this network can generate high-quality super-resolved and frontalized face images that are visually very close to their ground-truth counterparts, achieving superior face hallucination performance.
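To make the generative-discriminative pairing concrete, the following is a minimal sketch of the generic adversarial objective such a network trains against. The scores, batch size, and helper names below are illustrative assumptions for exposition, not the thesis's actual architecture or values.

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy on sigmoid outputs, averaged over the batch."""
    eps = 1e-12
    return float(-np.mean(target * np.log(pred + eps)
                          + (1 - target) * np.log(1 - pred + eps)))

# Toy discriminator scores for one batch: real photographs vs. faces
# generated from portraits (values are made up for illustration).
d_real = np.array([0.9, 0.8, 0.95])   # D(x): driven towards 1
d_fake = np.array([0.2, 0.1, 0.3])    # D(G(portrait)): driven towards 0

# Discriminative part: classify real images as 1 and generated ones as 0.
d_loss = bce(d_real, np.ones(3)) + bce(d_fake, np.zeros(3))

# Generative part: fool the discriminator into scoring its outputs as real.
g_loss = bce(d_fake, np.ones(3))

print(d_loss, g_loss)
```

Training alternates between the two updates, so the generator's destylized faces must become progressively harder to distinguish from real photographs.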
In summary, this thesis presents approaches based on DNNs to recover realistic faces from portraits. Our first face destylization architecture uses a pixel-wise loss in the generative part of the network. Although effective at recovering faces from aligned portraits, it fails when portraits are misaligned or exhibit a variety of rotations and viewpoint variations. To handle this, we extend our approach in two ways: (a) by using Spatial Transformer Networks (STNs) as intermediate layers to compensate for misalignments of input portraits, and (b) by incorporating an identity-preserving loss into the generative part of the network to recover the underlying identity accurately. As a third extension, to recover high-frequency facial details, we incorporate auxiliary facial attributes into the extracted feature maps, fusing visual and semantic information for the best visual results. This also allows us to manipulate appearance details such as hair color, facial expressions, etc. Finally, we introduce the TANN, which jointly upsamples and frontalizes very low-resolution unaligned face images in an end-to-end fashion. For each extension of the proposed DNNs, we conduct an extensive experimental analysis demonstrating the superiority of our methods over the current state of the art.
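The composite objective described above (a pixel-wise reconstruction term plus an identity-preserving term measured in a feature space) can be sketched as follows. The feature extractor here is a stand-in random projection, not the face-recognition network a real system would use, and the weight `lam` is an assumed value chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def pixel_loss(recovered, ground_truth):
    """Pixel-wise L2 loss between the recovered face and the ground truth."""
    return float(np.mean((recovered - ground_truth) ** 2))

# Stand-in "identity feature" extractor: a fixed linear projection.
# A real identity-preserving loss would compare deep face-recognition features.
W = rng.standard_normal((16, 64))

def identity_loss(recovered, ground_truth):
    """L2 distance between identity features of the two images."""
    f_rec = W @ recovered.ravel()
    f_gt = W @ ground_truth.ravel()
    return float(np.mean((f_rec - f_gt) ** 2))

gt = rng.random((8, 8))                          # toy 8x8 ground-truth face
rec = gt + 0.05 * rng.standard_normal((8, 8))    # imperfect recovery

lam = 0.01    # assumed weighting of the identity term
total = pixel_loss(rec, gt) + lam * identity_loss(rec, gt)
print(total)
```

The pixel term drives low-frequency fidelity, while the identity term penalizes recoveries that look plausible but drift away from the subject, which is exactly the failure mode a pixel-wise loss alone cannot catch.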