Mahalanobis distance with an adapted within-author covariance matrix: An authorship verification experiment

dc.contributor.author: Ishihara, Shunichi
dc.date.accessioned: 2024-04-22T01:55:19Z
dc.date.available: 2024-04-22T01:55:19Z
dc.date.issued: 2022
dc.date.updated: 2022-12-25T07:16:16Z
dc.description.abstract: The rotated delta, which is argued to be a theoretically better-grounded distance measure, has failed to receive any empirical support for its superiority. This study revisits the rotated delta (more commonly known as the Mahalanobis distance in other areas) with two different covariance matrices that are estimated from training data. The first covariance matrix represents the between-author variability, and the second the within-author variability. A series of likelihood ratio-based authorship verification experiments was carried out with several distance measures. The experiments made use of documents arranged from a large database of text messages that allowed for a total of 2,160 same-author and 4,663,440 different-author comparisons. The Mahalanobis distance with the between-author covariance matrix performed far worse than the other distance measures, whereas the Mahalanobis distance with the within-author covariance matrix performed better than the other measures. However, its superior performance relative to the cosine distance depends on word lengths and/or the order of the feature vector. The results of follow-up experiments further illustrated that the covariance matrix representing the within-author variability needs to be trained using a good amount of data to perform better than the cosine distance: the higher the order of the vector, the more data are required for training. The quantitative results also imply that the two sources of variability, namely within- and between-author variability, are independent of each other to the extent that the latter cannot accurately approximate the former. (en_AU)
dc.format.mimetype: application/pdf (en_AU)
dc.identifier.issn: 2055-7671 (en_AU)
dc.identifier.uri: http://hdl.handle.net/1885/316957
dc.language.iso: en_AU (en_AU)
dc.publisher: Oxford University Press (en_AU)
dc.rights: © 2022 The authors (en_AU)
dc.source: Digital Scholarship in the Humanities (en_AU)
dc.title: Mahalanobis distance with an adapted within-author covariance matrix: An authorship verification experiment (en_AU)
dc.type: Journal article (en_AU)
local.bibliographicCitation.issue: 4 (en_AU)
local.bibliographicCitation.lastpage: 1072 (en_AU)
local.bibliographicCitation.startpage: 1051 (en_AU)
local.contributor.affiliation: Ishihara, Shunichi, College of Asia and the Pacific, ANU (en_AU)
local.contributor.authoruid: Ishihara, Shunichi, u9504440 (en_AU)
local.description.notes: Imported from ARIES (en_AU)
local.identifier.absfor: 460404 - Digital forensics (en_AU)
local.identifier.absfor: 470403 - Computational linguistics (en_AU)
local.identifier.absfor: 460208 - Natural language processing (en_AU)
local.identifier.absseo: 220301 - Digital humanities (en_AU)
local.identifier.absseo: 220402 - Applied computing (en_AU)
local.identifier.absseo: 130202 - Languages and linguistics (en_AU)
local.identifier.ariespublication: a383154xPUB27802 (en_AU)
local.identifier.citationvolume: 37 (en_AU)
local.identifier.doi: 10.1093/llc/fqac008 (en_AU)
local.identifier.thomsonID: 000768378800001
local.publisher.url: https://academic.oup.com/ (en_AU)
local.type.status: Published Version (en_AU)
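
Illustrative sketch (not part of the record): the abstract contrasts a Mahalanobis distance built on a within-author covariance matrix with one built on a between-author covariance matrix, and with the cosine distance. The Python below is a minimal sketch of those three quantities under stated assumptions. The function names, the pooled within-author estimator, the covariance-of-author-means between-author estimator, and the small ridge term are all assumptions made for illustration; the abstract does not specify the article's estimation or adaptation procedure, and this is not a reproduction of it.

```python
import numpy as np


def within_author_cov(X, authors):
    """Pooled within-author covariance (assumed estimator).

    Each document vector is centred on the mean of its own author,
    so differences between authors do not contribute.
    """
    X = np.asarray(X, dtype=float)
    authors = np.asarray(authors)
    labels = np.unique(authors)
    centered = np.empty_like(X)
    for a in labels:
        mask = authors == a
        centered[mask] = X[mask] - X[mask].mean(axis=0)
    dof = X.shape[0] - len(labels)  # documents minus number of authors
    return centered.T @ centered / dof


def between_author_cov(X, authors):
    """Covariance of the per-author mean vectors (assumed estimator)."""
    X = np.asarray(X, dtype=float)
    authors = np.asarray(authors)
    means = np.stack([X[authors == a].mean(axis=0) for a in np.unique(authors)])
    return np.cov(means, rowvar=False)


def mahalanobis(x, y, cov, ridge=1e-6):
    """Mahalanobis distance sqrt((x - y)^T cov^{-1} (x - y)).

    The ridge term is a hypothetical safeguard against a singular
    covariance estimate; it is not taken from the article.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    cov_reg = np.asarray(cov, dtype=float) + ridge * np.eye(d.shape[0])
    return float(np.sqrt(d @ np.linalg.solve(cov_reg, d)))


def cosine_distance(x, y):
    """One minus the cosine similarity of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))


if __name__ == "__main__":
    # Synthetic stand-in data: 5 hypothetical authors, 4 documents each,
    # 3 features per document. Not the article's data.
    rng = np.random.default_rng(0)
    authors = np.repeat(np.arange(5), 4)
    X = rng.normal(size=(20, 3))

    S_within = within_author_cov(X, authors)
    S_between = between_author_cov(X, authors)

    print("within-author Mahalanobis :", mahalanobis(X[0], X[1], S_within))
    print("between-author Mahalanobis:", mahalanobis(X[0], X[1], S_between))
    print("cosine distance           :", cosine_distance(X[0], X[1]))
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse. A covariance matrix of order p has p(p+1)/2 free parameters, which is consistent with the abstract's remark that higher-order feature vectors need more training data.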

Downloads

Original bundle

Name: fqac008.pdf
Size: 870.51 KB
Format: Adobe Portable Document Format