
Statistical considerations underpinning an alignment-free sequence comparison method

Jing, Junmei; Burden, Conrad; Foret, Sylvain; Wilson, Susan

Description

The D2 statistic is defined as the number of word matches of prespecified length k, with up to t mismatches, shared between two given sequences. This statistic finds its application in alignment-free comparisons of biological sequences. It has two main advantages over alignment-based methods for nucleotide and amino-acid sequence comparisons, such as BLAST (basic local alignment search tool). These are (i) D2 does not assume that homologous segments are contiguous, and (ii) the algorithm is...
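To make the definition concrete, here is a minimal Python sketch of the exact-match case (t = 0) only. It is an illustration, not the authors' implementation: the function names `kmer_counts` and `d2_exact` are hypothetical, and counting words in a hash table is just one plausible way to do it. The sketch relies on the fact that the number of matching word pairs equals the sum, over every distinct word, of its count in the first sequence multiplied by its count in the second.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_exact(seq_a: str, seq_b: str, k: int) -> int:
    """Number of exact k-word matches between seq_a and seq_b (t = 0 only).

    Every pair (position in seq_a, position in seq_b) whose k-words are
    identical contributes 1, so the total is the sum over words of
    count_in_a * count_in_b.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words that never occur in seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(d2_exact("ACGT", "ACGA", 2))
```

For a fixed k, each sequence is scanned once, so the work in this sketch grows with the combined length of the two sequences. Allowing up to t mismatches would need a different matching step and is not covered here.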

dc.contributor.author: Jing, Junmei
dc.contributor.author: Burden, Conrad
dc.contributor.author: Foret, Sylvain
dc.contributor.author: Wilson, Susan
dc.date.accessioned: 2015-12-08T22:44:41Z
dc.identifier.issn: 1226-3192
dc.identifier.uri: http://hdl.handle.net/1885/37507
dc.description.abstract: The D2 statistic is defined as the number of word matches of prespecified length k, with up to t mismatches, shared between two given sequences. This statistic finds its application in alignment-free comparisons of biological sequences. It has two main advantages over alignment-based methods for nucleotide and amino-acid sequence comparisons, such as BLAST (basic local alignment search tool). These are (i) D2 does not assume that homologous segments are contiguous, and (ii) the algorithm is computationally extremely fast, the runtime being proportional to the size of the sequences in the case of exact matches. This review article summarises results to date on determining the distributional properties of the D2 statistic for a range of biologically relevant parameters, describes existing applications of the method, and outlines future research directions.
dc.publisher: Elsevier
dc.source: Journal of the Korean statistical society
dc.subject: Keywords: D2 distribution; D2 statistics; K-word match; Sequence comparison
dc.title: Statistical considerations underpinning an alignment-free sequence comparison method
dc.type: Journal article
local.description.notes: Imported from ARIES
local.identifier.citationvolume: 39
dc.date.issued: 2010
local.identifier.absfor: 010405 - Statistical Theory
local.identifier.absfor: 060404 - Epigenetics (incl. Genome Methylation and Epigenomics)
local.identifier.absfor: 060408 - Genomics
local.identifier.ariespublication: f2965xPUB150
local.identifier.ariespublication: u9511635xPUB1219
local.type.status: Published Version
local.contributor.affiliation: Jing, Junmei, College of Physical and Mathematical Sciences, ANU
local.contributor.affiliation: Burden, Conrad, College of Physical and Mathematical Sciences, ANU
local.contributor.affiliation: Foret, Sylvain, James Cook University
local.contributor.affiliation: Wilson, Susan, College of Physical and Mathematical Sciences, ANU
local.description.embargo: 2037-12-31
local.bibliographicCitation.issue: 3
local.bibliographicCitation.startpage: 325
local.bibliographicCitation.lastpage: 335
local.identifier.doi: 10.1016/j.jkss.2010.02.009
local.identifier.absseo: 970106 - Expanding Knowledge in the Biological Sciences
dc.date.updated: 2016-02-24T08:14:36Z
local.identifier.scopusID: 2-s2.0-77955272209
local.identifier.thomsonID: 000281293600008
Collections: ANU Research Publications

Download

File: 01_Jing_Statistical_considerations_2010.pdf
Size: 827.58 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
