Alignment-free sequence comparison for biologically realistic sequences of moderate length
The D2 statistic, defined as the number of matches of words of some pre-specified length k, is a computationally fast alignment-free measure of biological sequence similarity. However, there is some debate about its suitability for this purpose, as the variability in D2 may be dominated by the terms that reflect the noise in each of the single sequences only. We examine the extent of the problem and the effectiveness of overcoming it by using two mean-centred variants of this statistic, D2* and...
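As a rough illustration of the definition above, the following minimal sketch counts k-word matches between two sequences. It assumes that D2 counts every pair of identical words, one taken from each sequence, with overlapping words included. The function names `kmer_counts` and `d2` are hypothetical and are not taken from the paper, and the mean-centred variants are not covered.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Number of k-word matches between seq_a and seq_b.

    Assumption: a match is any pair of positions (one in each
    sequence) where the same k-word starts, so D2 is the sum over
    words of count_in_a * count_in_b.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words that do not occur in seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGTACGT", "CGTA", 2))
```

Working from word counts rather than comparing every pair of positions keeps the cost close to linear in the total sequence length, which is consistent with the abstract's description of D2 as computationally fast.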
Collections: ANU Research Publications
Source: Statistical Applications in Genetics and Molecular Biology 11.1 (2012): 1-28
File: Burden et al Alignment-free sequence 2012.pdf (1.46 MB, Adobe PDF)