
Word match counts between Markovian biological sequences

Burden, Conrad; Leopardi, Paul; Foret, Sylvain

Description

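The D2 statistic, which counts the number of word matches between two given sequences, has long been proposed as a measure of similarity for biological sequences. Much of the mathematically rigorous work carried out to date on the properties of the D2 statistic has been restricted to the case of ‘Bernoulli’ sequences composed of identically and independently distributed letters. Here the properties of the distribution of this statistic for the biologically more realistic case of Markovian sequences are studied. The approach is novel in that Markovian dependency is defined for sequences with periodic boundary conditions, and this enables exact analytic formulae for the mean and variance to be derived. The formulae are confirmed using numerical simulations, and asymptotic approximations to the full distribution are tested.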
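As a rough illustration of the statistic described above, the following Python sketch counts word matches between two sequences. It is not taken from the paper: the function names `word_counts` and `d2`, the word length parameter `k`, and the `periodic` flag are all assumptions made for this example. It relies on the fact that the number of matching word pairs equals the sum, over every distinct word, of the product of that word's counts in the two sequences. The `periodic` option treats each sequence as circular, which is one simple reading of the periodic boundary conditions mentioned in the description.

```python
from collections import Counter


def word_counts(seq, k, periodic=False):
    """Count every length-k word in seq (hypothetical helper, assumes len(seq) >= k)."""
    if periodic:
        # Treat the sequence as a circle: append the first k-1 letters
        # so that words wrapping around the end are also counted.
        extended = seq + seq[:k - 1]
    else:
        extended = seq
    return Counter(extended[i:i + k] for i in range(len(extended) - k + 1))


def d2(seq_a, seq_b, k, periodic=False):
    """Number of (i, j) pairs where the k-word at i in seq_a equals the k-word at j in seq_b."""
    counts_a = word_counts(seq_a, k, periodic)
    counts_b = word_counts(seq_b, k, periodic)
    # Each occurrence of a word in seq_a matches every occurrence in seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGT", "ACGA", k=2))
    print(d2("AAAA", "AAA", k=2, periodic=True))
```

Counting words once per sequence keeps the cost roughly linear in the sequence lengths, rather than comparing every pair of positions directly.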

dc.contributor.author: Burden, Conrad
dc.contributor.author: Leopardi, Paul
dc.contributor.author: Foret, Sylvain
dc.date.accessioned: 2015-12-10T22:30:37Z
dc.identifier.issn: 9876-5432
dc.identifier.uri: http://hdl.handle.net/1885/55170
dc.description.abstract: The D2 statistic, which counts the number of word matches between two given sequences, has long been proposed as a measure of similarity for biological sequences. Much of the mathematically rigorous work carried out to date on the properties of the D2 statistic has been restricted to the case of ‘Bernoulli’ sequences composed of identically and independently distributed letters. Here the properties of the distribution of this statistic for the biologically more realistic case of Markovian sequences are studied. The approach is novel in that Markovian dependency is defined for sequences with periodic boundary conditions, and this enables exact analytic formulae for the mean and variance to be derived. The formulae are confirmed using numerical simulations, and asymptotic approximations to the full distribution are tested.
dc.publisher: ZZZ Scopus Publisher
dc.source: Communications in Computer and Information Science
dc.title: Word match counts between Markovian biological sequences
dc.type: Journal article
local.description.notes: Imported from ARIES
local.description.refereed: Yes
dc.date.issued: 2014
local.identifier.absfor: 010300 - NUMERICAL AND COMPUTATIONAL MATHEMATICS
local.identifier.absfor: 010400 - STATISTICS
local.identifier.absfor: 060409 - Molecular Evolution
local.identifier.ariespublication: a383154xPUB321
local.type.status: Published Version
local.contributor.affiliation: Burden, Conrad, College of Physical and Mathematical Sciences, ANU
local.contributor.affiliation: Leopardi, Paul, College of Physical and Mathematical Sciences, ANU
local.contributor.affiliation: Foret, Sylvain, College of Medicine, Biology and Environment, ANU
local.description.embargo: 2037-12-31
local.bibliographicCitation.startpage: 147
local.bibliographicCitation.lastpage: 161
local.identifier.doi: 10.1007/978-3-662-44485-6_11
dc.date.updated: 2016-02-24T08:07:12Z
local.identifier.scopusID: 2-s2.0-84916241399
Collections: ANU Research Publications

Download

File: 01_Burden_Word_match_counts_between_2014.pdf
Size: 643.13 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
