The influence of background data size on the performance of a score-based likelihood ratio system: A case of forensic text comparison
dc.contributor.author | Ishihara, Shunichi | |
dc.coverage.spatial | Virtual | |
dc.date.accessioned | 2024-04-23T23:11:43Z | |
dc.date.available | 2024-04-23T23:11:43Z | |
dc.date.created | 2020 | |
dc.date.issued | 2020 | |
dc.date.updated | 2022-12-25T07:16:58Z | |
dc.description.abstract | This study investigates the robustness and stability of a likelihood ratio–based (LR-based) forensic text comparison (FTC) system against the size of background population data. The focus is on a score-based approach for estimating authorship LRs. Each document is represented with a bag-of-words model, and the cosine distance is used as the score-generating function. A set of population data that differed in the number of scores was synthesised 20 times using the Monte Carlo simulation technique. The FTC system's performance with different population sizes was evaluated by a gradient metric, the log-LR cost (Cllr). The experimental results revealed two outcomes: 1) the score-based approach is rather robust against a small population size, in that, with the scores obtained from 40–60 authors in the database, the stability and performance of the system become fairly comparable to those of the system with the maximum number of authors (720); and 2) poor performance in terms of Cllr, which occurred because of limited background population data, is largely due to poor calibration. The results also indicated that the score-based approach is more robust against data scarcity than the feature-based approach; however, this finding warrants further study. | en_AU |
dc.format.mimetype | application/pdf | en_AU |
dc.identifier.uri | http://hdl.handle.net/1885/317053 | |
dc.language.iso | en_AU | en_AU |
dc.provenance | Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. | en_AU |
dc.publisher | Australasian Language Technology Association | en_AU |
dc.relation.ispartofseries | The Australasian Language Technology Association Workshop 2020 | en_AU |
dc.rights | ACL materials are Copyright © 1963–2024 ACL; other materials are copyrighted by their respective copyright holders. | en_AU |
dc.rights.license | Creative Commons Attribution 4.0 International License | en_AU |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_AU |
dc.source | The influence of background data size on the performance of a score-based likelihood ratio system: A case of forensic text comparison | en_AU |
dc.title | The influence of background data size on the performance of a score-based likelihood ratio system: A case of forensic text comparison | en_AU |
dc.type | Conference paper | en_AU |
dcterms.accessRights | Open Access | en_AU |
local.bibliographicCitation.lastpage | 11 | en_AU |
local.bibliographicCitation.startpage | 1 | en_AU |
local.contributor.affiliation | Ishihara, Shunichi, College of Asia and the Pacific, ANU | en_AU |
local.contributor.authoremail | u9504440@anu.edu.au | en_AU |
local.contributor.authoruid | Ishihara, Shunichi, u9504440 | en_AU |
local.description.notes | Imported from ARIES | en_AU |
local.description.refereed | Yes | |
local.identifier.absfor | 460404 - Digital forensics | en_AU |
local.identifier.absfor | 470403 - Computational linguistics | en_AU |
local.identifier.absfor | 460208 - Natural language processing | en_AU |
local.identifier.absseo | 220301 - Digital humanities | en_AU |
local.identifier.absseo | 220402 - Applied computing | en_AU |
local.identifier.absseo | 130202 - Languages and linguistics | en_AU |
local.identifier.ariespublication | u3391657xPUB216 | en_AU |
local.identifier.uidSubmittedBy | u3391657 | en_AU |
local.publisher.url | https://aclanthology.org/ | en_AU |
local.type.status | Published Version | en_AU |
Downloads
Original bundle
- Name: 2020.alta-1.3.pdf
- Size: 1.07 MB
- Format: Adobe Portable Document Format