Statistical considerations underpinning an alignment-free sequence comparison method
Jing, Junmei; Burden, Conrad; Foret, Sylvain; Wilson, Susan
Description
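The D2 statistic is defined as the number of word matches of prespecified length k, with up to t mismatches, shared between two given sequences. This statistic finds its application in alignment-free comparisons of biological sequences. It has two main advantages over alignment-based methods for nucleotide and amino-acid sequence comparisons, such as BLAST (basic local alignment search tool). These are (i) D2 does not assume that homologous segments are contiguous, and (ii) the algorithm is computationally extremely fast, the runtime being proportional to the size of the sequences in the case of exact matches. This review article summarises results to date on determining the distributional properties of the D2 statistic for a range of biologically relevant parameters, describes existing applications of the method, and outlines future research directions.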
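To make the definition of D2 concrete, here is a minimal Python sketch of the word-match count. It is an illustration only, not the authors' implementation: the function names `kmers`, `d2_exact`, and `d2_mismatch` are hypothetical, words are assumed to overlap, and every pair of positions (one in each sequence) is assumed to count separately. The exact-match case uses hashed word counts, so its cost grows roughly linearly with sequence length; the mismatch case is shown as a plain brute-force comparison for clarity rather than speed.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping words of length k in seq (hypothetical helper)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def d2_exact(a, b, k):
    """Count exact k-word matches between a and b (the t = 0 case)."""
    counts_a = Counter(kmers(a, k))
    counts_b = Counter(kmers(b, k))
    # A word w contributes (occurrences in a) * (occurrences in b) matching pairs.
    return sum(n * counts_b[w] for w, n in counts_a.items())


def d2_mismatch(a, b, k, t):
    """Count k-word pairs that differ in at most t positions (brute force)."""
    total = 0
    for u in kmers(a, k):
        for v in kmers(b, k):
            mismatches = sum(x != y for x, y in zip(u, v))
            if mismatches <= t:
                total += 1
    return total
```

For example, `d2_exact("AAAA", "AAA", 2)` returns 6, since the word `AA` occurs three times in the first sequence and twice in the second. With `t = 0`, `d2_mismatch` gives the same counts as `d2_exact`.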
Field | Value | |
---|---|---|
dc.contributor.author | Jing, Junmei | |
dc.contributor.author | Burden, Conrad | |
dc.contributor.author | Foret, Sylvain | |
dc.contributor.author | Wilson, Susan | |
dc.date.accessioned | 2015-12-08T22:44:41Z | |
dc.identifier.issn | 1226-3192 | |
dc.identifier.uri | http://hdl.handle.net/1885/37507 | |
dc.description.abstract | The D2 statistic is defined as the number of word matches of prespecified length k, with up to t mismatches, shared between two given sequences. This statistic finds its application in alignment-free comparisons of biological sequences. It has two main advantages over alignment-based methods for nucleotide and amino-acid sequence comparisons, such as BLAST (basic local alignment search tool). These are (i) D2 does not assume that homologous segments are contiguous, and (ii) the algorithm is computationally extremely fast, the runtime being proportional to the size of the sequences in the case of exact matches. This review article summarises results to date on determining the distributional properties of the D2 statistic for a range of biologically relevant parameters, describes existing applications of the method, and outlines future research directions. | |
dc.publisher | Elsevier | |
dc.source | Journal of the Korean Statistical Society | |
dc.subject | Keywords: D2 distribution; D2 statistics; K-word match; Sequence comparison | |
dc.title | Statistical considerations underpinning an alignment-free sequence comparison method | |
dc.type | Journal article | |
local.description.notes | Imported from ARIES | |
local.identifier.citationvolume | 39 | |
dc.date.issued | 2010 | |
local.identifier.absfor | 010405 - Statistical Theory | |
local.identifier.absfor | 060404 - Epigenetics (incl. Genome Methylation and Epigenomics) | |
local.identifier.absfor | 060408 - Genomics | |
local.identifier.ariespublication | f2965xPUB150 | |
local.identifier.ariespublication | u9511635xPUB1219 | |
local.type.status | Published Version | |
local.contributor.affiliation | Jing, Junmei, College of Physical and Mathematical Sciences, ANU | |
local.contributor.affiliation | Burden, Conrad, College of Physical and Mathematical Sciences, ANU | |
local.contributor.affiliation | Foret, Sylvain, James Cook University | |
local.contributor.affiliation | Wilson, Susan, College of Physical and Mathematical Sciences, ANU | |
local.description.embargo | 2037-12-31 | |
local.bibliographicCitation.issue | 3 | |
local.bibliographicCitation.startpage | 325 | |
local.bibliographicCitation.lastpage | 335 | |
local.identifier.doi | 10.1016/j.jkss.2010.02.009 | |
local.identifier.absseo | 970106 - Expanding Knowledge in the Biological Sciences | |
dc.date.updated | 2016-02-24T08:14:36Z | |
local.identifier.scopusID | 2-s2.0-77955272209 | |
local.identifier.thomsonID | 000281293600008 | |
Collections | ANU Research Publications |
Download
File | Description | Size | Format | Image |
---|---|---|---|---|
01_Jing_Statistical_considerations_2010.pdf | | 827.58 kB | Adobe PDF | Request a copy |