Deep learning model for simultaneous recognition of quantitative and qualitative emotion using visual and bio-sensing data

dc.contributor.author: Hosseini, Iman
dc.contributor.author: Hossain, Md Zakir
dc.contributor.author: Zhang, Yuhao
dc.contributor.author: Rahman, Shafin
dc.date.accessioned: 2025-05-23T12:25:04Z
dc.date.available: 2025-05-23T12:25:04Z
dc.date.issued: 2024
dc.description.abstract: Emotion recognition relies heavily on cues such as human facial expressions and physiological signals, including the electroencephalogram (EEG) and electrocardiogram (ECG). In the literature, emotion recognition is investigated quantitatively (by estimating valence, arousal, and dominance) and qualitatively (by predicting discrete emotions such as happiness, sadness, anger, and surprise). Current methods combine visual data with bio-sensing information to build multi-modal recognition systems, but they require extensive domain expertise and intricate preprocessing, and consequently cannot fully leverage the inherent advantages of end-to-end deep learning. Moreover, existing methods usually aim to recognize either qualitative or quantitative emotions; although the two are strongly correlated, previous methods do not recognize both simultaneously. In this paper, we introduce DeepVADNet, a novel deep end-to-end framework for multi-modal emotion recognition. The proposed framework extracts salient face appearance features as well as bio-sensing features, predicting both qualitative and quantitative emotions in a single forward pass. We employ a CRNN architecture to extract face appearance features, while a ConvLSTM model captures spatio-temporal information from visual data (videos). Additionally, we process the physiological signals (EEG, EOG, ECG, and GSR) with a Conv1D model, departing from conventional hand-crafted time- and frequency-domain feature extraction. After enhancing feature quality by fusing both modalities, we use a novel method that employs the quantitative emotion estimate to predict qualitative emotions accurately. We perform extensive experiments on the DEAP and MAHNOB-HCI datasets, achieving state-of-the-art quantitative emotion recognition results of 98.93%/6e-4 and 89.08%/0.97 (mean classification accuracy/MSE) on the two datasets, respectively. For the qualitative emotion recognition task, we achieve 82.71% mean classification accuracy on the MAHNOB-HCI dataset. The code and evaluation can be accessed at: https://github.com/I-Man-H/DeepVADNet.git
dc.description.status: Peer-reviewed
dc.identifier.issn: 1077-3142
dc.identifier.other: ORCID:/0000-0003-1892-831X/work/203091545
dc.identifier.scopus: 85202294018
dc.identifier.uri: http://www.scopus.com/inward/record.url?scp=85202294018&partnerID=8YFLogxK
dc.identifier.uri: https://hdl.handle.net/1885/733752273
dc.language.iso: en
dc.rights: Publisher Copyright: © 2024 Elsevier Inc.
dc.source: Computer Vision and Image Understanding
dc.subject: Deep learning
dc.subject: Emotion recognition
dc.subject: End-to-end learning
dc.subject: Human facial expressions
dc.subject: Physiological signals
dc.title: Deep learning model for simultaneous recognition of quantitative and qualitative emotion using visual and bio-sensing data
dc.type: Journal article
dspace.entity.type: Publication
local.contributor.affiliation: Hosseini, Iman; Australian National University
local.contributor.affiliation: Hossain, Md Zakir; Biological Data Science Institute, ANU College of Science and Medicine, The Australian National University
local.contributor.affiliation: Zhang, Yuhao; School of Computing, ANU College of Systems and Society, The Australian National University
local.contributor.affiliation: Rahman, Shafin; North South University
local.identifier.citationvolume: 248
local.identifier.doi: 10.1016/j.cviu.2024.104121
local.identifier.pure: 0e90e189-a87a-4d67-97ef-13aebd5eb4d0
local.identifier.url: https://www.scopus.com/pages/publications/85202294018
local.type.status: Published
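
As a rough illustration of the dual-output pipeline the abstract describes, here is a minimal PyTorch sketch: a CRNN-style visual branch (a per-frame CNN followed by an LSTM, standing in for the paper's CRNN/ConvLSTM), a Conv1D branch over raw physiological channels, feature fusion, and a qualitative head conditioned on the quantitative (VAD) estimate. All names, layer sizes, channel counts, and the nine-class output are illustrative assumptions, not the published configuration; the authors' actual implementation is at the GitHub link above.

# Illustrative sketch only: layer sizes, channel counts, and class count are
# assumptions, not the authors' released configuration
# (see https://github.com/I-Man-H/DeepVADNet.git for the actual model).
import torch
import torch.nn as nn


class DualEmotionNet(nn.Module):
    """Fuses a visual stream and a physiological stream, then predicts
    continuous VAD values and feeds them into a discrete-emotion head."""

    def __init__(self, n_phys_channels=8, n_classes=9):
        super().__init__()
        # Visual branch: per-frame CNN followed by an LSTM over time,
        # standing in for the paper's CRNN/ConvLSTM feature extractor.
        self.frame_cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),          # -> (B*T, 32*16)
        )
        self.temporal = nn.LSTM(32 * 16, 128, batch_first=True)
        # Physiological branch: 1-D convolutions over raw EEG/EOG/ECG/GSR,
        # replacing hand-crafted time/frequency-domain features.
        self.phys_cnn = nn.Sequential(
            nn.Conv1d(n_phys_channels, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),          # -> (B, 64)
        )
        self.vad_head = nn.Linear(128 + 64, 3)              # valence/arousal/dominance
        # Qualitative head sees the fused features *and* the VAD estimate.
        self.cls_head = nn.Linear(128 + 64 + 3, n_classes)

    def forward(self, video, phys):
        # video: (B, T, 3, H, W); phys: (B, n_phys_channels, L)
        b, t = video.shape[:2]
        frames = self.frame_cnn(video.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.temporal(frames)
        fused = torch.cat([h[-1], self.phys_cnn(phys)], dim=1)
        vad = self.vad_head(fused)                               # quantitative output
        logits = self.cls_head(torch.cat([fused, vad], dim=1))   # qualitative output
        return vad, logits


# A single forward pass yields both quantitative and qualitative predictions.
model = DualEmotionNet()
vad, logits = model(torch.randn(2, 8, 3, 64, 64), torch.randn(2, 8, 512))
print(vad.shape, logits.shape)  # torch.Size([2, 3]) torch.Size([2, 9])

The key design point mirrored from the abstract is that the discrete-emotion classifier takes the predicted VAD values as additional input, exploiting the correlation between quantitative and qualitative emotion.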
