Approximate word matches between two random sequences
Authors
Burden, Conrad J
Kantorovitz, Miriam R
Wilson, Susan R
Publisher
Institute of Mathematical Statistics
Abstract
Given two sequences over a finite alphabet L, the D₂ statistic is the
number of m-letter word matches between the two sequences. This statistic
is used in bioinformatics for expressed sequence tag database searches.
Here we study a generalization of the D₂ statistic in the context of DNA sequences,
under the assumption of strand symmetric Bernoulli text. For k<m,
we look at the count of m-letter word matches with up to k mismatches. For
this statistic, we compute the expectation, give upper and lower bounds for
the variance and prove its distribution is asymptotically normal.
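
To make the definition concrete, here is a minimal brute-force sketch of the statistic described above. It counts pairs of m-letter words, one from each sequence, that differ in at most k positions. The function names `d2k` and `expected_d2k`, the non-overlapping (non-periodic) treatment of sequence ends, and the `letter_probs` argument are illustrative assumptions, not taken from the paper. The expectation shown follows from linearity of expectation when both sequences are i.i.d. (Bernoulli) text with the same letter distribution. It may differ from the paper's formula if the paper uses other boundary conditions.

```python
from math import comb


def d2k(seq_a, seq_b, m, k=0):
    """Count pairs (i, j) whose m-letter words differ in at most k positions.

    With k = 0 this reduces to the exact word-match count (the D2 statistic).
    Brute force: O(len_a * len_b * m) time.
    """
    count = 0
    for i in range(len(seq_a) - m + 1):
        word_a = seq_a[i:i + m]
        for j in range(len(seq_b) - m + 1):
            word_b = seq_b[j:j + m]
            mismatches = sum(x != y for x, y in zip(word_a, word_b))
            if mismatches <= k:
                count += 1
    return count


def expected_d2k(len_a, len_b, m, k, letter_probs):
    """Expected count under i.i.d. letters (assumption: same distribution in
    both sequences, no wrap-around at the sequence ends).

    Two independent letters agree with probability p2 = sum of p_a squared,
    so the number of mismatches in a word pair is Binomial(m, 1 - p2).
    """
    p2 = sum(p * p for p in letter_probs.values())
    # Probability that a given word pair has at most k mismatches.
    tail = sum(comb(m, t) * p2 ** (m - t) * (1 - p2) ** t for t in range(k + 1))
    return (len_a - m + 1) * (len_b - m + 1) * tail


if __name__ == "__main__":
    # Strand-symmetric letter distribution: P(A) = P(T) and P(C) = P(G).
    probs = {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2}
    print(d2k("ACGT", "ACGA", m=2, k=0))
    print(d2k("ACGT", "ACGA", m=2, k=1))
    print(expected_d2k(4, 4, m=2, k=1, letter_probs=probs))
```

The double loop is meant only to mirror the definition. For realistic sequence lengths, an indexed or hashed word lookup would be the usual choice.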
Source
The Annals of Applied Probability
Access Statement
Open Access