High level forensic voice comparison based on fused long-term fundamental frequency and word n-gram features

dc.contributor.author: Carne, Michael [en]
dc.contributor.author: Ishihara, Shunichi [en]
dc.contributor.author: Kinoshita, Yuko [en]
dc.date.accessioned: 2025-06-29T20:32:48Z
dc.date.available: 2025-06-29T20:32:48Z
dc.date.issued: 2022 [en]
dc.description.abstract: Feature robustness is particularly important in forensic applications of speaker recognition, where there are often significant differences in recording conditions between forensic samples. For this reason, high level features have previously been recommended for use in forensic systems, since they tend to be more robust to the acoustic variability introduced by recording conditions [1]. A drawback of high level features, though, is their poor performance relative to low-level cepstral features. We suggest, however, that it may be possible to improve the performance of high level feature systems by combining acoustic and idiolectal information, and that this may deliver a better trade-off between robustness, interpretability and discrimination performance. In this paper we evaluate a likelihood ratio (LR)-based forensic voice comparison (FVC) system fusing two high level feature subsystems: word n-grams and long-term fundamental frequency (LTF0). Preliminary experiments demonstrate some promising performance gains. We also examine how the duration of speech data affects the proposed system. [en]
dc.description.sponsorship: We would like to thank the reviewers for their valuable comments. The first author is supported by an Australian Government Research Training Scholarship and ANU Supplementary Scholarship. [en]
dc.description.status: Peer-reviewed [en]
dc.format.extent: 5 [en]
dc.identifier.issn: 2308-457X [en]
dc.identifier.other: ORCID:/0000-0001-6633-3316/work/162299446 [en]
dc.identifier.scopus: 85140059362 [en]
dc.identifier.uri: http://www.scopus.com/inward/record.url?scp=85140059362&partnerID=8YFLogxK [en]
dc.identifier.uri: https://hdl.handle.net/1885/733765477
dc.language.iso: en [en]
dc.relation.ispartofseries: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 [en]
dc.rights: Publisher Copyright: Copyright © 2022 ISCA. [en]
dc.source: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH [en]
dc.subject: forensic phonetics [en]
dc.subject: forensic voice comparison [en]
dc.subject: high level features [en]
dc.subject: likelihood ratios [en]
dc.subject: n-grams [en]
dc.title: High level forensic voice comparison based on fused long-term fundamental frequency and word n-gram features [en]
dc.type: Conference paper [en]
dspace.entity.type: Publication [en]
local.bibliographicCitation.lastpage: 5297 [en]
local.bibliographicCitation.startpage: 5293 [en]
local.contributor.affiliation: Carne, Michael; AGRTP Stipend Scholar - CAP, The Australian National University [en]
local.contributor.affiliation: Ishihara, Shunichi; Sch of Culture History & Lang, School of Culture, History & Language, ANU College of Asia & the Pacific, The Australian National University [en]
local.contributor.affiliation: Kinoshita, Yuko; Sch of Culture History & Lang, School of Culture, History & Language, ANU College of Asia & the Pacific, The Australian National University [en]
local.identifier.ariespublication: a383154xPUB37144 [en]
local.identifier.citationvolume: 2022-September [en]
local.identifier.doi: 10.21437/Interspeech.2022-11127 [en]
local.identifier.pure: 3c1711c4-d74a-432e-9d76-e91c9798eccd [en]
local.identifier.url: https://www.scopus.com/pages/publications/85140059362 [en]
local.type.status: Published [en]
