On authorship attribution via Markov chains and sequence kernels
| dc.contributor.author | Sanderson, Conrad |
| dc.contributor.author | Guenter, Simon |
| dc.coverage.spatial | Hong Kong |
| dc.date.accessioned | 2015-12-07T22:53:56Z |
| dc.date.created | August 20-24 2006 |
| dc.date.issued | 2006 |
| dc.date.updated | 2015-12-07T12:45:03Z |
| dc.description.abstract | We investigate the use of recently proposed character and word sequence kernels for the task of authorship attribution and compare their performance with two probabilistic approaches based on Markov chains of characters and words. Several configurations of the sequence kernels are studied using a relatively large dataset, where each author covered several topics. Utilising Moffat smoothing, the two probabilistic approaches obtain similar performance, which in turn is comparable to that of character sequence kernels and is better than that of word sequence kernels. The results further suggest that when using a realistic setup that takes into account the case of texts which are not written by any hypothesised authors, about 5000 reference words are required to obtain good discrimination performance. |
| dc.identifier.isbn | 0769525210 | |
| dc.identifier.uri | http://hdl.handle.net/1885/27940 | |
| dc.publisher | Institute of Electrical and Electronics Engineers (IEEE Inc) | |
| dc.relation.ispartofseries | International Conference on Pattern Recognition (ICPR 2006) | |
| dc.source | Proceedings of the 18th International Conference on Pattern Recognition | |
| dc.source.uri | http://ieeexplore.ieee.org/iel5/11159/35817/01698811.pdf?isnumber=35817&prod=CNF&http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=35817&isYear=2006 | |
| dc.subject | Keywords: Database systems; Optical character recognition; Probabilistic logics; Word processing; Character sequence kernels; Datasets; Word sequence kernels; Markov processes | |
| dc.title | On authorship attribution via Markov chains and sequence kernels | |
| dc.type | Conference paper | |
| local.bibliographicCitation.lastpage | 440 | |
| local.bibliographicCitation.startpage | 437 | |
| local.contributor.affiliation | Sanderson, Conrad, College of Engineering and Computer Science, ANU | |
| local.contributor.affiliation | Guenter, Simon, College of Engineering and Computer Science, ANU |
| local.contributor.authoruid | Sanderson, Conrad, a193340 | |
| local.contributor.authoruid | Guenter, Simon, a235706 |
| local.description.embargo | 2037-12-31 | |
| local.description.notes | Imported from ARIES | |
| local.description.refereed | Yes | |
| local.identifier.absfor | 080109 - Pattern Recognition and Data Mining | |
| local.identifier.ariespublication | u8803936xPUB54 | |
| local.identifier.doi | 10.1109/ICPR.2006.899 | |
| local.identifier.scopusID | 2-s2.0-34147123127 | |
| local.type.status | Published Version |
Downloads
Original bundle
- Name: 01_Sanderson_On_authorship_attribution_via_2006.pdf
  - Size: 97.58 KB
  - Format: Adobe Portable Document Format
- Name: 02_Sanderson_On_authorship_attribution_via_2006.pdf
  - Size: 166.17 KB
  - Format: Adobe Portable Document Format
- Name: 03_Sanderson_On_authorship_attribution_via_2006.pdf
  - Size: 128.74 KB
  - Format: Adobe Portable Document Format