A Likelihood-Ratio Based Forensic Voice Comparison in Standard Thai
Abstract
This research uses a likelihood ratio (LR) framework to assess
the discriminatory power of a range of acoustic parameters
extracted from speech samples produced by male speakers of
Standard Thai. The thesis aims to answer two main questions: 1)
how well the tested linguistic-phonetic segments of Standard Thai
perform in forensic voice comparison (FVC); and 2) how such
linguistic-phonetic segments can be profitably combined through
logistic regression using the FoCal Toolkit (Brümmer, 2007). The
study focuses on the four consonants /s, ʨh, n, m/ and the two
diphthongs [ɔi, ai].
First, two sets of features were compared on the alveolar
fricative /s/ in terms of their performance in FVC. The first
comprised four spectrum-based distributional features, namely the
spectral moments mean, variance, skew and kurtosis; the second
consisted of the coefficients of the Discrete Cosine Transform
(DCT) applied to the spectrum. As the DCT coefficients (DCTs)
were found to perform better, they were subsequently used to
model the spectra of the remaining consonants. The consonant
spectrum was extracted at the center point of each of the
consonants /s, ʨh, n, m/ with a Hamming window of 31.25 ms.
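The two feature sets compared above can be sketched roughly as follows. This is only an illustration: the 16 kHz sampling rate, the use of a magnitude spectrum for the moments and a dB spectrum for the DCT, the number of coefficients, and all function names are assumptions, not details taken from the thesis.

```python
import numpy as np

def centre_frame_spectrum(signal, sample_rate=16000, win_sec=0.03125):
    """Magnitude spectrum of a Hamming-windowed frame centred on the segment midpoint."""
    n = int(round(win_sec * sample_rate))        # 500 samples at the assumed 16 kHz
    mid = len(signal) // 2
    start = max(0, mid - n // 2)
    frame = np.asarray(signal[start:start + n], dtype=float)
    frame = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs, magnitude

def spectral_moments(freqs, magnitude):
    """Mean, variance, skew and (excess) kurtosis, treating the spectrum as a distribution."""
    p = magnitude / magnitude.sum()
    mean = np.sum(freqs * p)
    var = np.sum((freqs - mean) ** 2 * p)
    sd = np.sqrt(var)
    skew = np.sum((freqs - mean) ** 3 * p) / sd ** 3
    kurt = np.sum((freqs - mean) ** 4 * p) / sd ** 4 - 3.0
    return mean, var, skew, kurt

def dct_coefficients(magnitude, n_coef=5):
    """First n_coef unnormalised DCT-II coefficients of the dB spectrum (n_coef is assumed)."""
    log_spec = 20.0 * np.log10(magnitude + 1e-12)   # small offset avoids log(0)
    n_bins = len(log_spec)
    k = np.arange(n_coef)[:, None]
    n = np.arange(n_bins)[None, :]
    basis = np.cos(np.pi * (n + 0.5) * k / n_bins)
    return basis @ log_spec
```

The low-order DCT coefficients summarise the overall shape of the spectrum (level, tilt, curvature), which is one plausible reason they could carry more speaker information than four moments.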
For the diphthongs [ɔi] - [nɔi L] and [ai] - [mai HL], cubic
polynomials fitted to the F2 trajectory and to the F1-F3
trajectories were tested separately. Quadratic polynomials fitted
to the tonal F0 contours of [ɔi] - [nɔi L] and [ai] - [mai HL]
were tested as well, as was the long-term F0 distribution (LTF0).
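A minimal sketch of the trajectory parameterization described above, assuming each contour is sampled at equally spaced points and time is normalised to the interval [0, 1]. The function name and the sample values are made up for illustration; the thesis may have used a different time axis or fitting routine.

```python
import numpy as np

def trajectory_coefficients(values, degree):
    """Fit a polynomial to a contour over normalised time; highest power first."""
    values = np.asarray(values, dtype=float)
    t = np.linspace(0.0, 1.0, len(values))
    return np.polyfit(t, values, degree)

# Hypothetical measurements (Hz) for one token, not data from the thesis.
f1_hz = [600, 590, 560, 500, 430, 380, 350]
f2_hz = [1000, 1150, 1400, 1700, 1950, 2100, 2150]
f3_hz = [2500, 2520, 2560, 2620, 2700, 2760, 2800]
f0_hz = [110, 106, 103, 101, 100, 100, 101]

f2_features = trajectory_coefficients(f2_hz, degree=3)            # 4 coefficients
f1_f3_features = np.concatenate(
    [trajectory_coefficients(f, degree=3) for f in (f1_hz, f2_hz, f3_hz)]
)                                                                  # 12 coefficients
f0_features = trajectory_coefficients(f0_hz, degree=2)            # 3 coefficients
```

Each token is thus reduced to a short, fixed-length coefficient vector regardless of its duration.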
The results show that the Standard Thai acoustic features and
segments tested in this thesis have promising discriminatory
power. The main findings are as follows.
1. The fricative /s/ performed better with the DCTs (Cllr = 0.70)
than with the spectral moments (Cllr = 0.92).
2. The nasals /n, m/ (Cllr = 0.47) performed better than the
affricate /ʨh/ (Cllr = 0.54) and the fricative /s/ (Cllr =
0.70) when parameterized with DCT coefficients.
3. F1-F3 trajectories (Cllr = 0.42 and Cllr = 0.49) outperformed
the F2 trajectory (Cllr = 0.69 and Cllr = 0.67) for the
diphthongs [ɔi] and [ai], respectively.
4. F1-F3 trajectories of the diphthong [ɔi] (Cllr = 0.42)
outperformed those of [ai] (Cllr = 0.49).
5. Tonal F0 (Cllr = 0.52) outperformed LTF0 (Cllr = 0.74).
6. Overall, better results were obtained when DCTs of /n/ - [na:
HL] and /n/ - [nɔi L] were fused (Cllr = 0.40, with the largest
consistent-with-fact SSLog10LR = 2.53).
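The Cllr values quoted in these findings are log-likelihood-ratio costs. As a reminder of how the metric behaves, the sketch below implements its standard definition: the average of two penalty terms, one over same-speaker LRs and one over different-speaker LRs. The function and argument names are illustrative, and the input values in the usage note are made up.

```python
import math

def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood-ratio cost for lists of (non-log) likelihood ratios."""
    # Same-speaker trials are penalised when the LR is small.
    ss_cost = sum(math.log2(1.0 + 1.0 / lr) for lr in same_speaker_lrs)
    ss_cost /= len(same_speaker_lrs)
    # Different-speaker trials are penalised when the LR is large.
    ds_cost = sum(math.log2(1.0 + lr) for lr in different_speaker_lrs)
    ds_cost /= len(different_speaker_lrs)
    return 0.5 * (ss_cost + ds_cost)
```

A system that always outputs LR = 1 (no information) scores exactly 1, so values below 1, such as those listed above, indicate that the system provides useful information; for instance `cllr([100.0], [0.01])` is close to 0, while strongly misleading LRs push the value above 1.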
In light of the findings, we can conclude that Standard Thai is
generally amenable to FVC, especially when linguistic-phonetic
segments are combined; it is recommended that the latter
procedure be followed when dealing with forensically realistic
casework.
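The combination of segments recommended here amounts to logistic-regression fusion: the fused log LR is an offset plus a weighted sum of the per-segment scores. The sketch below is not the FoCal Toolkit; it is a plain gradient-descent stand-in written with NumPy, with class weighting so that same-speaker and different-speaker trials contribute equally. The function names, learning rate, and iteration count are all assumptions.

```python
import numpy as np

def train_fusion(scores, labels, learning_rate=0.1, n_iter=2000):
    """Learn fusion weights [offset, w_1, ..., w_k] by weighted logistic regression.

    scores: (n_trials, n_systems) array of per-segment log-LR-like scores.
    labels: 1 for same-speaker trials, 0 for different-speaker trials.
    """
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    X = np.column_stack([np.ones(len(scores)), scores])
    # Give each class a total weight of 0.5 so the output approximates a log LR
    # rather than log posterior odds tied to the training proportions.
    trial_weight = np.where(y == 1, 0.5 / np.sum(y == 1), 0.5 / np.sum(y == 0))
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= learning_rate * (X.T @ (trial_weight * (p - y)))
    return w

def fuse(scores, w):
    """Fused natural-log LR for each trial."""
    return w[0] + np.asarray(scores, dtype=float) @ w[1:]
```

In practice the weights would be trained on a development set and then applied to separate test comparisons, so that the reported Cllr is not optimistically biased.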