Word match counts between Markovian biological sequences

Authors

Burden, Conrad
Leopardi, Paul
Forêt, Sylvain

Publisher

Springer Verlag

Abstract

The D2 statistic, which counts the number of word matches between two given sequences, has long been proposed as a measure of similarity for biological sequences. Much of the mathematically rigorous work carried out to date on the properties of the D2 statistic has been restricted to the case of ‘Bernoulli’ sequences composed of independently and identically distributed letters. Here the properties of the distribution of this statistic are studied for the biologically more realistic case of Markovian sequences. The approach is novel in that Markovian dependency is defined for sequences with periodic boundary conditions, which enables exact analytic formulae for the mean and variance to be derived. The formulae are confirmed using numerical simulations, and asymptotic approximations to the full distribution are tested.
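
As an illustration of the quantity the abstract describes, the following minimal sketch counts word matches between two sequences. It is not taken from the paper: the function name `d2`, the word length parameter `k`, and the `periodic` flag are assumptions made for this example. With `periodic=True`, each sequence is treated as circular, so that words may wrap around from the end to the start, which is one plausible reading of "periodic boundary conditions".

```python
from collections import Counter


def d2(seq_a, seq_b, k, periodic=False):
    """Count k-word matches between seq_a and seq_b (overlaps included).

    Illustrative sketch only; names and the `periodic` option are
    assumptions, not the paper's notation. Assumes 1 <= k <= len(seq).
    """

    def word_counts(seq):
        if periodic:
            # Treat the sequence as circular: append the first k-1 letters
            # so that every position starts a full k-word.
            extended = seq + seq[: k - 1]
            n_words = len(seq)
        else:
            extended = seq
            n_words = len(seq) - k + 1
        return Counter(extended[i : i + k] for i in range(max(n_words, 0)))

    counts_a = word_counts(seq_a)
    counts_b = word_counts(seq_b)
    # Each occurrence of a word in seq_a matches each occurrence of the
    # same word in seq_b, so the total is a sum of count products.
    return sum(c * counts_b[w] for w, c in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGT", "ACGT", 2))                 # non-periodic word matches
    print(d2("ACGT", "ACGT", 2, periodic=True))  # circular word matches
```

Counting via per-word tallies avoids comparing every pair of positions directly, while giving the same total as the pairwise definition.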

Book Title

Biomedical Engineering Systems and Technologies - 6th International Joint Conference, BIOSTEC 2013, Revised Selected Papers

Entity type

Publication
