On authorship attribution via Markov chains and sequence kernels

dc.contributor.author: Sanderson, Conrad
dc.contributor.author: Guenter, Simon
dc.coverage.spatial: Hong Kong
dc.date.accessioned: 2015-12-07T22:53:56Z
dc.date.created: August 20-24 2006
dc.date.issued: 2006
dc.date.updated: 2015-12-07T12:45:03Z
dc.description.abstract: We investigate the use of recently proposed character and word sequence kernels for the task of authorship attribution and compare their performance with two probabilistic approaches based on Markov chains of characters and words. Several configurations of the sequence kernels are studied using a relatively large dataset, where each author covered several topics. Utilising Moffat smoothing, the two probabilistic approaches obtain similar performance, which in turn is comparable to that of character sequence kernels and is better than that of word sequence kernels. The results further suggest that when using a realistic setup that takes into account the case of texts which are not written by any hypothesised authors, about 5000 reference words are required to obtain good discrimination performance.
dc.identifier.isbn: 0769525210
dc.identifier.uri: http://hdl.handle.net/1885/27940
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE Inc)
dc.relation.ispartofseries: International Conference on Pattern Recognition (ICPR 2006)
dc.source: Proceedings of the 18th International Conference on Pattern Recognition
dc.source.uri: http://ieeexplore.ieee.org/iel5/11159/35817/01698811.pdf?isnumber=35817&prod=CNF&http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=35817&isYear=2006
dc.subject: Keywords: Database systems; Optical character recognition; Probabilistic logics; Word processing; Character sequence kernels; Datasets; Word sequence kernels; Markov processes
dc.title: On authorship attribution via Markov chains and sequence kernels
dc.type: Conference paper
local.bibliographicCitation.lastpage: 440
local.bibliographicCitation.startpage: 437
local.contributor.affiliation: Sanderson, Conrad, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Guenter, Simon, College of Engineering and Computer Science, ANU
local.contributor.authoruid: Sanderson, Conrad, a193340
local.contributor.authoruid: Guenter, Simon, a235706
local.description.embargo: 2037-12-31
local.description.notes: Imported from ARIES
local.description.refereed: Yes
local.identifier.absfor: 080109 - Pattern Recognition and Data Mining
local.identifier.ariespublication: u8803936xPUB54
local.identifier.doi: 10.1109/ICPR.2006.899
local.identifier.scopusID: 2-s2.0-34147123127
local.type.status: Published Version

Downloads

Original bundle

Name: 01_Sanderson_On_authorship_attribution_via_2006.pdf
Size: 97.58 KB
Format: Adobe Portable Document Format

Name: 02_Sanderson_On_authorship_attribution_via_2006.pdf
Size: 166.17 KB
Format: Adobe Portable Document Format

Name: 03_Sanderson_On_authorship_attribution_via_2006.pdf
Size: 128.74 KB
Format: Adobe Portable Document Format