
Approximate word matches between two random sequences

Burden, Conrad J.; Kantorovitz, Miriam R.; Wilson, Susan R.


Given two sequences over a finite alphabet L, the D₂ statistic is the number of m-letter word matches between the two sequences. This statistic is used in bioinformatics for expressed sequence tag database searches. Here we study a generalization of the D₂ statistic in the context of DNA sequences, under the assumption of strand symmetric Bernoulli text. For k<m, we look at the count of m-letter word matches with up to k mismatches. For this statistic, we compute the expectation, give...
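As an illustration of the statistic described in the abstract, the following is a minimal Python sketch of a brute-force count of m-letter word matches with up to k mismatches; k = 0 gives the plain D₂ count. The function names `approx_word_matches` and `expected_matches` are hypothetical and not taken from the paper. The expectation shown is only the elementary value obtained by linearity of expectation for independent sequences of i.i.d. letters; it is an assumption of this sketch that this simple setting is adequate for illustration, and the paper's own derivation and its variance bounds are not reproduced here.

```python
from math import comb


def approx_word_matches(a: str, b: str, m: int, k: int = 0) -> int:
    """Count pairs (i, j) such that the m-letter words a[i:i+m] and b[j:j+m]
    differ in at most k positions. With k = 0 this is the D2 count."""
    if m <= 0 or not 0 <= k < m:
        raise ValueError("require m > 0 and 0 <= k < m")
    count = 0
    for i in range(len(a) - m + 1):
        for j in range(len(b) - m + 1):
            # Number of positions at which the two words disagree.
            mismatches = sum(x != y for x, y in zip(a[i:i + m], b[j:j + m]))
            if mismatches <= k:
                count += 1
    return count


def expected_matches(n1: int, n2: int, m: int, k: int, probs: dict) -> float:
    """Expected count for two independent sequences of lengths n1 and n2 whose
    letters are i.i.d. with the given probabilities (illustrative assumption).
    Each word pair matches with up to k mismatches with a binomial tail
    probability, and linearity of expectation sums over all word pairs."""
    p2 = sum(p * p for p in probs.values())  # chance that two letters agree
    per_pair = sum(comb(m, t) * p2 ** (m - t) * (1 - p2) ** t
                   for t in range(k + 1))
    return (n1 - m + 1) * (n2 - m + 1) * per_pair


if __name__ == "__main__":
    print(approx_word_matches("ACGT", "ACGA", m=2, k=0))
    print(approx_word_matches("ACGT", "ACGA", m=2, k=1))
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    print(expected_matches(4, 4, 2, 0, uniform))
```

The double loop costs on the order of n₁·n₂·m letter comparisons, which is fine for short examples but not for database-scale searches.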

Collections: ANU Research Publications
Date published: 2008
Type: Journal article
Source: The Annals of Applied Probability
DOI: 10.1214/07-AAP452


File: 01_Burden_Approximate_word_matches_2008.pdf
Description: Published Version
Size: 411.81 kB
Format: Adobe PDF
