Multimodal analysis of verbal and nonverbal behaviour on the example of clinical depression
Date
2015
Authors
Alghowinem, Sharifa
Abstract
Clinical depression is a common mood disorder that may last for long periods, vary
in severity, and can impair an individual’s ability to cope with daily life. Depression
affects 350 million people worldwide and is therefore considered a burden not
only on a personal and social level, but also on an economic one. Depression is the
fourth most significant cause of suffering and disability worldwide and it is predicted
to be the leading cause in 2020.
Although treatment of depressive disorders has proven effective in most
cases, misdiagnosis of depressed patients is a common barrier. This is not only because
depression manifests itself in different ways, but also because clinical interviews and
self-reported history are currently the only means of diagnosis, which risks a range
of subjective biases arising from either the patient’s report or the clinician’s judgment. While
automatic affective state recognition has become an active research area in the past
decade, methods for mood disorder detection, such as depression, are still in their
infancy. Using the advancements of affective sensing techniques, the long-term goal
is to develop an objective multimodal system that supports clinicians during the
diagnosis and monitoring of clinical depression.
This dissertation aims to investigate the most promising characteristics of depression
that can be “heard” and “seen” by a computer system for the task of detecting
depression objectively. Using audio-video recordings of a clinically validated
Australian depression dataset, several experiments are conducted to characterise
depression-related patterns from verbal and nonverbal cues. Of particular interest in
this dissertation is the exploration of speech style, speech prosody, eye activity, and
head pose modalities. Statistical analysis and automatic classification of extracted
cues are investigated. In addition, multimodal fusion methods of these modalities
are examined to increase the accuracy and confidence level of detecting depression.
These investigations result in a proposed system that detects depression in a binary
manner (i.e. depressed vs. non-depressed) using temporal behavioural cues of
depression.
The proposed system: (1) uses audio-video recordings to investigate verbal and
nonverbal modalities, (2) extracts functional features from verbal and nonverbal
modalities over each subject’s entire segments, (3) pre- and post-normalises the extracted
features, (4) selects features using the T-test, (5) classifies depression in a
binary manner (i.e. severely depressed vs. healthy controls), and finally (6) fuses the
individual modalities.
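
As a rough illustration of steps (2) and (4) above, the sketch below computes a few functional (summary) features from a frame-level feature track and keeps those whose two-sample t statistic separates the groups. It is not the thesis implementation: the choice of functionals, the Welch variant of the t-test, the `t_threshold` value, and all function names and toy numbers are assumptions made for illustration only.

```python
import math
import statistics


def functionals(frames):
    """Summarise one low-level feature track (one value per frame) for a subject."""
    return {
        "mean": statistics.fmean(frames),
        "std": statistics.pstdev(frames),
        "min": min(frames),
        "max": max(frames),
        "range": max(frames) - min(frames),
    }


def welch_t(a, b):
    """Two-sample t statistic with unequal variances (Welch); an assumed variant."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    if se == 0.0:
        # Degenerate case (no spread in either group): treat as uninformative.
        return 0.0
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def select_features(depressed, controls, t_threshold=2.0):
    """Keep feature names whose |t| between groups exceeds a hypothetical threshold.

    depressed, controls: lists of per-subject functional dicts.
    """
    selected = []
    for name in depressed[0]:
        a = [subject[name] for subject in depressed]
        b = [subject[name] for subject in controls]
        if abs(welch_t(a, b)) > t_threshold:
            selected.append(name)
    return selected


# Made-up frame-level tracks, three subjects per group (illustration only).
depressed_tracks = [[0.1, 0.2, 0.15], [0.12, 0.18, 0.2], [0.1, 0.25, 0.2]]
control_tracks = [[0.5, 0.9, 0.7], [0.6, 1.0, 0.8], [0.55, 0.95, 0.7]]

depressed = [functionals(track) for track in depressed_tracks]
controls = [functionals(track) for track in control_tracks]
selected = select_features(depressed, controls)
print(selected)
```

In practice the selection would be fitted on training subjects only, so that the test subjects do not influence which features are kept.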
The proposed system was validated for scalability and usability using generalisation
experiments. Close studies were made of American and German depression
datasets individually, and then also in combination with the Australian one. Applying
the proposed system to the three datasets showed remarkably high classification
results, with up to 95% average recall for the individual sets and 86% for the three
combined. This strongly implies that the proposed system has the ability to generalise
to different datasets recorded under quite different conditions such as collection
procedure and task, depression diagnosis testing and scale, as well as cultural and
language background. High performance was found consistently in speech prosody
and eye activity in both individual and combined datasets, with head pose features
performing slightly less well. This strongly suggests that the extracted features are robust
to large variations in recording conditions. Furthermore, once the modalities were
combined, the classification results improved substantially. Therefore, the modalities
are shown both to correlate with and complement each other, working in tandem as an
innovative system for the diagnosis of depression across large variations of population
and procedure.
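
To make the reported metric and the idea of fusion concrete, the sketch below computes average recall and a simple majority-vote fusion of per-modality decisions. It assumes that "average recall" means the unweighted mean of the per-class recalls, and majority voting is only one hypothetical decision-level fusion scheme; the abstract does not specify which fusion methods were used. The labels and predictions are made-up toy values, not results from the thesis.

```python
def average_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed meaning of 'average recall')."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in indices if y_pred[i] == label)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


def majority_vote(*modality_preds):
    """Fuse binary (0/1) decisions from several modalities by strict majority."""
    fused = []
    for votes in zip(*modality_preds):
        fused.append(1 if sum(votes) * 2 > len(votes) else 0)
    return fused


# Toy labels: 1 = depressed, 0 = control (illustration only).
y_true = [1, 1, 1, 0, 0, 0]
prosody = [1, 1, 0, 0, 0, 1]
eye = [1, 0, 1, 0, 1, 0]
head = [0, 1, 1, 1, 0, 0]

fused = majority_vote(prosody, eye, head)
print(average_recall(y_true, prosody), average_recall(y_true, fused))
```

The toy predictions are deliberately arranged so that each modality errs on different subjects, which is the situation in which combining complementary modalities helps.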
Keywords
depression analysis, mood detection, computer vision application, speech analysis
Type
Thesis (PhD)