Statistical considerations underpinning an alignment-free sequence comparison method

Jing, Junmei; Burden, Conrad; Foret, Sylvain; Wilson, Susan

Date

2010

Authors

Jing, Junmei

Burden, Conrad

Foret, Sylvain

Wilson, Susan

Publisher

Elsevier

Abstract

The D2 statistic is defined as the number of word matches of prespecified length k, with up to t mismatches, shared between two given sequences. This statistic finds its application in alignment-free comparisons of biological sequences. It has two main advantages over alignment-based methods for nucleotide and amino-acid sequence comparisons, such as BLAST (basic local alignment search tool). These are (i) D2 does not assume that homologous segments are contiguous, and (ii) the algorithm is computationally extremely fast, the runtime being proportional to the size of the sequences in the case of exact matches. This review article summarises results to date on determining the distributional properties of the D2 statistic for a range of biologically relevant parameters, describes existing applications of the method, and outlines future research directions.
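The definition in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `kmer_counts`, `d2_exact`, and `d2_mismatch` are hypothetical, and the sketch assumes that words are overlapping and that D2 counts every pair of positions (i, j) whose k-words agree. For exact matches (t = 0), hashing the words of each sequence gives a runtime roughly proportional to the combined sequence length for fixed k; the mismatch version is a deliberately naive quadratic check, included only to make the definition concrete.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_exact(a, b, k):
    """Number of position pairs (i, j) whose k-words match exactly (t = 0)."""
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # A word seen n times in a and m times in b contributes n * m matches.
    return sum(n * counts_b[word] for word, n in counts_a.items())


def d2_mismatch(a, b, k, t):
    """Brute-force count of pairs (i, j) whose k-words differ in at most t positions."""
    total = 0
    for i in range(len(a) - k + 1):
        for j in range(len(b) - k + 1):
            mismatches = sum(x != y for x, y in zip(a[i:i + k], b[j:j + k]))
            if mismatches <= t:
                total += 1
    return total


if __name__ == "__main__":
    print(d2_exact("ACGTACGT", "ACGT", 2))
    print(d2_mismatch("ACGTACGT", "ACGT", 2, 1))
```

With t = 0 the two functions should agree, which gives a simple consistency check on any pair of sequences.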

Keywords

D2 distribution; D2 statistics; K-word match; Sequence comparison

URI

http://hdl.handle.net/1885/37507

Collections

ANU Research Publications

Source

Journal of the Korean Statistical Society

Type

Journal article

DOI

10.1016/j.jkss.2010.02.009

Restricted until

2037-12-31

Downloads

File

Description

01_Jing_Statistical_considerations_2010.pdf (827.58 KB)
