Approximate word matches between two random sequences
Given two sequences over a finite alphabet L, the D₂ statistic is the number of m-letter word matches between the two sequences. This statistic is used in bioinformatics for expressed sequence tag database searches. Here we study a generalization of the D₂ statistic in the context of DNA sequences, under the assumption of strand symmetric Bernoulli text. For k<m, we look at the count of m-letter word matches with up to k mismatches. For this statistic, we compute the expectation, give...
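The count described in the abstract can be illustrated with a minimal brute-force sketch. It is not the paper's method: the function name `approx_word_matches` and the direct pairwise comparison are assumptions made here for illustration only. It counts every pair of positions (i, j) at which the m-letter words of the two sequences differ in at most k letters, and with k = 0 it reduces to the plain D₂ count of exact word matches.

```python
def approx_word_matches(a: str, b: str, m: int, k: int = 0) -> int:
    """Count pairs (i, j) such that the m-letter words a[i:i+m] and b[j:j+m]
    differ in at most k positions. Hypothetical helper for illustration."""
    if m <= 0 or not 0 <= k < m:
        raise ValueError("require m > 0 and 0 <= k < m")
    count = 0
    # Every starting position of an m-letter word in each sequence.
    for i in range(len(a) - m + 1):
        for j in range(len(b) - m + 1):
            # Hamming distance between the two m-letter words.
            mismatches = sum(x != y for x, y in zip(a[i:i + m], b[j:j + m]))
            if mismatches <= k:
                count += 1
    return count


if __name__ == "__main__":
    print(approx_word_matches("ACGT", "ACGA", m=2, k=0))  # exact matches only
    print(approx_word_matches("ACGT", "ACGA", m=2, k=1))  # one mismatch allowed
```

The sketch runs in time proportional to the product of the two sequence lengths times m, which is adequate for small examples but not for database-scale searches.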
Collections: ANU Research Publications
Source: The Annals of Applied Probability
Access Rights: Open Access
File: 01_Burden_Approximate_word_matches_2008.pdf (Published Version, 411.81 kB, Adobe PDF)