Mahalanobis distance with an adapted within-author covariance matrix: An authorship verification experiment
| dc.contributor.author | Ishihara, Shunichi | |
| dc.date.accessioned | 2024-04-22T01:55:19Z | |
| dc.date.available | 2024-04-22T01:55:19Z | |
| dc.date.issued | 2022 | |
| dc.date.updated | 2022-12-25T07:16:16Z | |
| dc.description.abstract | The rotated delta, which is argued to be a theoretically better-grounded distance measure, has failed to receive any empirical support for its superiority. This study revisits the rotated delta (more commonly known as the Mahalanobis distance in other areas) with two different covariance matrices estimated from training data. The first covariance matrix represents the between-author variability, and the second the within-author variability. A series of likelihood ratio-based authorship verification experiments was carried out with several distance measures. The experiments used documents compiled from a large database of text messages, allowing for a total of 2,160 same-author and 4,663,440 different-author comparisons. The Mahalanobis distance with the between-author covariance matrix performed far worse than the other distance measures, whereas the Mahalanobis distance with the within-author covariance matrix performed better than the other measures. However, its superior performance relative to the cosine distance depends on word lengths and/or the order of the feature vector. The results of follow-up experiments further illustrated that the covariance matrix representing the within-author variability needs to be trained on a sufficient amount of data to perform better than the cosine distance: the higher the order of the vector, the more data are required for training. The quantitative results also imply that the two sources of variability, namely within- and between-author variability, are independent of each other to the extent that the latter cannot accurately approximate the former. | en_AU |
| dc.format.mimetype | application/pdf | en_AU |
| dc.identifier.issn | 2055-7671 | en_AU |
| dc.identifier.uri | http://hdl.handle.net/1885/316957 | |
| dc.language.iso | en_AU | en_AU |
| dc.publisher | Oxford University Press | en_AU |
| dc.rights | © 2022 The authors | en_AU |
| dc.source | Digital Scholarship in the Humanities | en_AU |
| dc.title | Mahalanobis distance with an adapted within-author covariance matrix: An authorship verification experiment | en_AU |
| dc.type | Journal article | en_AU |
| local.bibliographicCitation.issue | 4 | en_AU |
| local.bibliographicCitation.lastpage | 1072 | en_AU |
| local.bibliographicCitation.startpage | 1051 | en_AU |
| local.contributor.affiliation | Ishihara, Shunichi, College of Asia and the Pacific, ANU | en_AU |
| local.contributor.authoruid | Ishihara, Shunichi, u9504440 | en_AU |
| local.description.notes | Imported from ARIES | en_AU |
| local.identifier.absfor | 460404 - Digital forensics | en_AU |
| local.identifier.absfor | 470403 - Computational linguistics | en_AU |
| local.identifier.absfor | 460208 - Natural language processing | en_AU |
| local.identifier.absseo | 220301 - Digital humanities | en_AU |
| local.identifier.absseo | 220402 - Applied computing | en_AU |
| local.identifier.absseo | 130202 - Languages and linguistics | en_AU |
| local.identifier.ariespublication | a383154xPUB27802 | en_AU |
| local.identifier.citationvolume | 37 | en_AU |
| local.identifier.doi | 10.1093/llc/fqac008 | en_AU |
| local.identifier.thomsonID | 000768378800001 | |
| local.publisher.url | https://academic.oup.com/ | en_AU |
| local.type.status | Published Version | en_AU |
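
The abstract describes a Mahalanobis distance whose covariance matrix is estimated from within-author variability. The following is a minimal sketch of that general idea, not the article's implementation: the function names `within_author_covariance` and `mahalanobis`, the pooled estimator, and the `ridge` regularisation term are all assumptions made for illustration, and the record does not specify how the article's "adapted" matrix is obtained.

```python
import numpy as np


def within_author_covariance(vectors, authors):
    """Pooled within-author covariance (illustrative assumption).

    Each document vector is centred on the mean of its own author, so the
    resulting matrix reflects variation inside authors, not between them.
    """
    vectors = np.asarray(vectors, dtype=float)
    authors = np.asarray(authors)
    labels = np.unique(authors)
    dim = vectors.shape[1]

    scatter = np.zeros((dim, dim))
    for label in labels:
        docs = vectors[authors == label]
        centred = docs - docs.mean(axis=0)
        scatter += centred.T @ centred

    # Degrees of freedom: one is lost per author mean that was estimated.
    dof = len(vectors) - len(labels)
    if dof <= 0:
        raise ValueError("need more documents than authors")
    return scatter / dof


def mahalanobis(x, y, cov, ridge=1e-6):
    """Mahalanobis distance between two feature vectors.

    `ridge` (an assumption, not from the article) adds a small value to the
    diagonal so the system stays solvable when `cov` is near-singular.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    regularised = cov + ridge * np.eye(cov.shape[0])
    return float(np.sqrt(diff @ np.linalg.solve(regularised, diff)))
```

With an identity covariance matrix and `ridge=0`, the measure reduces to the Euclidean distance, which gives a simple sanity check. The abstract's observation that higher-order vectors need more training data is consistent with the number of covariance entries to estimate growing quadratically with the vector's order.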