Lexical features in forensic speaker comparison
Abstract
In forensic speaker comparison (FSC), evidence is usually quantified by a set of measurements from continuous acoustic features. These are typically phonetically based (e.g. vowel formant frequencies) or derived from automatic techniques (e.g. Mel-frequency Cepstral Coefficients). Some speaker-specific characteristics, however, are defined by their presence or absence in speech, or by their frequency of occurrence. These characteristics tend to be reflected in longer-term patterns of speech and include a speaker's habitual lexical and syntactic choices, discourse patterns, conversational style and patterns of speech disfluency. Quantification of these feature types involves converting speech transcription data into sequences of tokens, the patterns of occurrence of which can be used to characterise speaker differences. High-level linguistic features such as these have been successfully employed in Automatic Speaker Recognition (ASR) systems. However, little attention has been given to evaluating these feature types for forensic applications within likelihood ratio-based FSC (LR-based FSC). This thesis makes both empirical and methodological contributions. Empirically, it evaluates the strength of evidence obtainable from lexical features and their robustness to forensic recording conditions. Methodologically, it evaluates the relative merits of score- vs. feature-based procedures for LR calculation from categorical data, and how feature selection can optimise performance and address the issue of dimensionality inherent to textual data. Using speech transcription data from Australian English speakers, this thesis demonstrates that effective speaker discrimination can be performed based on word n-gram models. Evaluation of five filter-based feature selection procedures demonstrates that ANOVA F-ratio yields the largest performance gains.
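The two steps named above, converting transcriptions into word n-gram counts and ranking features by ANOVA F-ratio, can be sketched as follows. This is a minimal illustration, not the thesis's actual pipeline: the toy transcriptions and the stdlib-only F-ratio computation are assumptions for demonstration.

```python
from collections import Counter

def word_ngrams(tokens, n=2):
    """Count each word n-gram obtained by sliding an n-token window
    over a tokenised transcription."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def f_ratio(groups):
    """One-way ANOVA F-ratio for a single feature: between-speaker
    variance over within-speaker variance of the feature's values,
    where each group holds one speaker's per-sample measurements."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    means = [sum(g) / len(g) for g in groups]
    k, n_total = len(groups), len(all_vals)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical mini-transcription (real data would be orthographic
# transcripts of conversational Australian English speech).
tokens = "i mean i think it was you know quite good i mean".split()
bigrams = word_ngrams(tokens)          # e.g. ('i', 'mean') occurs twice

# Per-sample counts of one candidate feature for two speakers:
# a high F-ratio means the feature separates the speakers well.
f = f_ratio([[1, 2], [3, 4]])
```

In a full system, the F-ratio would be computed per n-gram across all speakers' samples, and the highest-ranking n-grams retained as the feature set.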
However, it is also found that procedures based on statistical significance (like ANOVA F-ratio) favour rare terms, which tend to be topic- rather than speaker-dependent. In light of this, feature pruning to remove rare terms, or an explicitly defined feature set based on function words, is recommended. The latter, while demonstrating weaker performance relative to implicitly selected word n-grams, arguably circumvents the issue of topic bias in the speaker comparison. Function words are also found to be reasonably robust to reductions in the amount of speech available, but appear not to be robust to mismatches in speaking style. Finally, a feature-based procedure (two-level multinomial LR) for LR computation is shown to outperform a score-based procedure (cosine distance). The feature-based procedure is furthermore argued to be preferable since it incorporates measures of both the similarity and the typicality of the speech samples under comparison.
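The score-based baseline mentioned above can be illustrated with a short sketch: function-word relative-frequency vectors compared by cosine similarity. The function-word list and the toy token sequences are assumptions for illustration only; the thesis's actual feature set and scoring back-end are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical closed set of English function words (the thesis's
# actual explicitly defined set is not given here).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "i", "you"]

def fw_vector(tokens):
    """Relative frequency of each function word in a token sequence,
    normalised by total token count."""
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    total = len(tokens)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_score(u, v):
    """Cosine similarity between two feature vectors: 1.0 for identical
    directions, 0.0 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

questioned = fw_vector("it was the end of the day and i left".split())
known = fw_vector("i think it was a good day to go".split())
score = cosine_score(questioned, known)
```

A score like this measures only the similarity of the two samples; a feature-based LR additionally weighs how typical the observed feature values are in the relevant population, which is the argument made above for preferring the two-level multinomial procedure.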