The influence of background data size on the performance of a score-based likelihood ratio system: A case of forensic text comparison

Ishihara, Shunichi

Date

2020

Authors

Ishihara, Shunichi

Publisher

Australasian Language Technology Association

Abstract

This study investigates the robustness and stability of a likelihood ratio–based (LR-based) forensic text comparison (FTC) system against the size of background population data. The focus is on a score-based approach for estimating authorship LRs. Each document is represented with a bag-of-words model, and the cosine distance is used as the score-generating function. A set of population data that differed in the number of scores was synthesised 20 times using the Monte Carlo simulation technique. The FTC system’s performance with different population sizes was evaluated by a gradient metric of the log–LR cost (Cllr). The experimental results revealed two outcomes: 1) that the score-based approach is rather robust against a small population size, in that, with the scores obtained from 40 to 60 authors in the database, the stability and the performance of the system become fairly comparable to those of the system with the maximum number of authors (720); and 2) that poor performance in terms of Cllr, which occurred because of limited background population data, is largely due to poor calibration. The results also indicated that the score-based approach is more robust against data scarcity than the feature-based approach; however, this finding requires further study.
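
As a rough illustration of two quantities named in the abstract, the sketch below computes a cosine distance between bag-of-words vectors and the log-LR cost (Cllr) in its standard form. It is not taken from the paper: the whitespace tokenisation, the function names `cosine_distance` and `cllr`, and the convention of returning 1.0 for an empty document are all assumptions made for this example. The step that converts scores into LRs using background population data is omitted.

```python
import math
from collections import Counter


def cosine_distance(doc_a: str, doc_b: str) -> float:
    """Cosine distance between bag-of-words count vectors.

    Assumption: lowercased whitespace tokenisation; the paper's actual
    preprocessing may differ.
    """
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    # Only words shared by both documents contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumed convention for an empty document
    return 1.0 - dot / (norm_a * norm_b)


def cllr(same_author_lrs, diff_author_lrs) -> float:
    """Log-LR cost from LRs of same-author and different-author comparisons.

    Both inputs are non-empty sequences of positive LRs (not log LRs).
    """
    # Same-author comparisons are penalised when their LRs are small.
    same_term = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_term /= len(same_author_lrs)
    # Different-author comparisons are penalised when their LRs are large.
    diff_term = sum(math.log2(1 + lr) for lr in diff_author_lrs)
    diff_term /= len(diff_author_lrs)
    return 0.5 * (same_term + diff_term)
```

A system that always outputs LR = 1 carries no information and has a Cllr of exactly 1 under this definition; values below 1 indicate that the LRs are, on average, informative.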

URI

http://hdl.handle.net/1885/317053

Collections

ANU Research Publications

Source

The influence of background data size on the performance of a score-based likelihood ratio system: A case of forensic text comparison

Type

Conference paper

Access Statement

Open Access

License Rights

Creative Commons Attribution 4.0 International License

Downloads

File

Description

2020.alta-1.3.pdf (1.07 MB)
