Deep learning model for simultaneous recognition of quantitative and qualitative emotion using visual and bio-sensing data

dc.contributor.author: Hosseini, Iman
dc.contributor.author: Hossain, Md Zakir
dc.contributor.author: Zhang, Yuhao
dc.contributor.author: Rahman, Shafin
dc.date.accessioned: 2025-05-23T12:25:04Z
dc.date.available: 2025-05-23T12:25:04Z
dc.date.issued: 2024
dc.description.abstract: Emotion recognition relies heavily on cues such as human facial expressions and physiological signals, including the electroencephalogram (EEG) and electrocardiogram (ECG). In the literature, emotion recognition is investigated quantitatively (by estimating valence, arousal, and dominance) and qualitatively (by predicting discrete emotions such as happiness, sadness, anger, and surprise). Current methods combine visual data with bio-sensing information to build multi-modal recognition systems, but they require extensive domain expertise and intricate preprocessing, and consequently cannot fully leverage the inherent advantages of end-to-end deep learning. Moreover, existing methods usually aim to recognize either qualitative or quantitative emotions; although the two are strongly correlated, previous methods do not recognize both simultaneously. In this paper, we introduce DeepVADNet, a novel deep end-to-end framework for multi-modal emotion recognition. The proposed framework extracts salient face appearance features as well as bio-sensing features, predicting both qualitative and quantitative emotions in a single forward pass. We employ a CRNN architecture to extract face appearance features, while a ConvLSTM model captures spatio-temporal information from visual data (videos). Additionally, we process the physiological signals (EEG, EOG, ECG, and GSR) with a Conv1D model, departing from conventional hand-crafted time- and frequency-domain feature extraction. After enhancing feature quality by fusing both modalities, we use a novel method that employs the quantitative emotion estimate to predict qualitative emotions accurately. We perform extensive experiments on the DEAP and MAHNOB-HCI datasets, achieving state-of-the-art quantitative emotion recognition results of 98.93%/6e-4 and 89.08%/0.97 (mean classification accuracy/MSE) on the two datasets, respectively. For the qualitative emotion recognition task, we achieve 82.71% mean classification accuracy on the MAHNOB-HCI dataset. The code and evaluation can be accessed at: https://github.com/I-Man-H/DeepVADNet.git
dc.description.status: Peer-reviewed
dc.identifier.issn: 1077-3142
dc.identifier.other: ORCID:/0000-0003-1892-831X/work/203091545
dc.identifier.scopus: 85202294018
dc.identifier.uri: http://www.scopus.com/inward/record.url?scp=85202294018&partnerID=8YFLogxK
dc.identifier.uri: https://hdl.handle.net/1885/733752273
dc.language.iso: en
dc.rights: Publisher Copyright: © 2024 Elsevier Inc.
dc.source: Computer Vision and Image Understanding
dc.subject: Deep learning
dc.subject: Emotion recognition
dc.subject: End-to-end learning
dc.subject: Human facial expressions
dc.subject: Physiological signals
dc.title: Deep learning model for simultaneous recognition of quantitative and qualitative emotion using visual and bio-sensing data
dc.type: Journal article
dspace.entity.type: Publication
local.contributor.affiliation: Hosseini, Iman; Australian National University
local.contributor.affiliation: Hossain, Md Zakir; Biological Data Science Institute, ANU College of Science and Medicine, The Australian National University
local.contributor.affiliation: Zhang, Yuhao; School of Computing, ANU College of Systems and Society, The Australian National University
local.contributor.affiliation: Rahman, Shafin; North South University
local.identifier.citationvolume: 248
local.identifier.doi: 10.1016/j.cviu.2024.104121
local.identifier.pure: 0e90e189-a87a-4d67-97ef-13aebd5eb4d0
local.identifier.url: https://www.scopus.com/pages/publications/85202294018
local.type.status: Published
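
As a rough illustration of the dual-output pipeline the abstract describes, here is a minimal PyTorch sketch: a CRNN-style visual branch (a per-frame CNN followed by an LSTM, standing in for the paper's CRNN/ConvLSTM), a Conv1D branch over raw physiological channels, feature fusion, and a qualitative head conditioned on the quantitative (VAD) estimate. All names, layer sizes, channel counts, and the nine-class output are illustrative assumptions, not the published configuration; the authors' actual implementation is at the GitHub link above.

# Illustrative sketch only: layer sizes, channel counts, and class count are
# assumptions, not the authors' released configuration
# (see https://github.com/I-Man-H/DeepVADNet.git for the actual model).
import torch
import torch.nn as nn


class DualEmotionNet(nn.Module):
    """Fuses a visual stream and a physiological stream, then predicts
    continuous VAD values and feeds them into a discrete-emotion head."""

    def __init__(self, n_phys_channels=8, n_classes=9):
        super().__init__()
        # Visual branch: per-frame CNN followed by an LSTM over time,
        # standing in for the paper's CRNN/ConvLSTM feature extractor.
        self.frame_cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),          # -> (B*T, 32*16)
        )
        self.temporal = nn.LSTM(32 * 16, 128, batch_first=True)
        # Physiological branch: 1-D convolutions over raw EEG/EOG/ECG/GSR,
        # replacing hand-crafted time/frequency-domain features.
        self.phys_cnn = nn.Sequential(
            nn.Conv1d(n_phys_channels, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),          # -> (B, 64)
        )
        self.vad_head = nn.Linear(128 + 64, 3)              # valence/arousal/dominance
        # Qualitative head sees the fused features *and* the VAD estimate.
        self.cls_head = nn.Linear(128 + 64 + 3, n_classes)

    def forward(self, video, phys):
        # video: (B, T, 3, H, W); phys: (B, n_phys_channels, L)
        b, t = video.shape[:2]
        frames = self.frame_cnn(video.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.temporal(frames)
        fused = torch.cat([h[-1], self.phys_cnn(phys)], dim=1)
        vad = self.vad_head(fused)                               # quantitative output
        logits = self.cls_head(torch.cat([fused, vad], dim=1))   # qualitative output
        return vad, logits


# A single forward pass yields both quantitative and qualitative predictions.
model = DualEmotionNet()
vad, logits = model(torch.randn(2, 8, 3, 64, 64), torch.randn(2, 8, 512))
print(vad.shape, logits.shape)  # torch.Size([2, 3]) torch.Size([2, 9])

The key design point mirrored from the abstract is that the discrete-emotion classifier takes the predicted VAD values as additional input, exploiting the correlation between quantitative and qualitative emotion.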
