The influence of background data size on the performance of a score-based likelihood ratio system: A case of forensic text comparison
Date
2020
Authors
Ishihara, Shunichi
Publisher
Australasian Language Technology Association
Abstract
This study investigates the robustness and stability of a likelihood ratio–based (LR-based) forensic text comparison (FTC) system against the size of background population data. The focus is on a score-based approach for estimating authorship LRs. Each document is represented with a bag-of-words model, and the cosine distance is used as the score-generating function. A set of population data that differed in the number of scores was synthesised 20 times using the Monte Carlo simulation technique. The FTC system's performance with different population sizes was evaluated by a gradient metric of the log-LR cost (Cllr). The experimental results revealed two outcomes: 1) the score-based approach is rather robust against a small population size, in that, with the scores obtained from 40 to 60 authors in the database, the stability and the performance of the system become fairly comparable to those of the system with the maximum number of authors (720); and 2) poor performance in terms of Cllr, which occurred because of limited background population data, is largely due to poor calibration. The results also indicated that the score-based approach is more robust against data scarcity than the feature-based approach; however, this finding requires further study.
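As a rough illustration of two ingredients named in the abstract, the sketch below computes a cosine distance between two bag-of-words vectors and the log-LR cost (Cllr) for a set of likelihood ratios. It is not the paper's implementation: the whitespace tokeniser and the function names `bag_of_words`, `cosine_distance`, and `cllr` are assumptions made for this example, and the step that converts scores into LRs using background population data is not shown.

```python
import math
from collections import Counter


def bag_of_words(text):
    # Assumed tokenisation: lowercase and split on whitespace.
    # The abstract does not say how the paper tokenises documents.
    return Counter(text.lower().split())


def cosine_distance(a, b):
    # 1 - cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def cllr(same_author_lrs, diff_author_lrs):
    # Standard log-LR cost: penalises small LRs for same-author pairs
    # and large LRs for different-author pairs. Lower is better;
    # a system that always outputs LR = 1 scores exactly 1.
    same_term = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    diff_term = sum(math.log2(1 + lr) for lr in diff_author_lrs)
    return 0.5 * (same_term / len(same_author_lrs)
                  + diff_term / len(diff_author_lrs))


if __name__ == "__main__":
    doc_a = bag_of_words("the cat sat on the mat")
    doc_b = bag_of_words("the dog sat on the log")
    print("score:", cosine_distance(doc_a, doc_b))
    # Hypothetical LR values, for illustration only.
    print("Cllr:", cllr([8.0, 20.0, 0.5], [0.1, 0.02, 3.0]))
```

A Cllr below 1 indicates that the LRs carry useful information; values at or above 1 indicate a system that is uninformative or miscalibrated, which is the kind of degradation the abstract attributes to limited background data.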
Source
The influence of background data size on the performance of a score-based likelihood ratio system: A case of forensic text comparison
Type
Conference paper
Access Statement
Open Access
License Rights
Creative Commons Attribution 4.0 International License