Asymptotic Behaviour of k-Word Matches Between Two Uniformly Distributed Sequences

Kantorovitz, Miriam; Booth, Hilary; Burden, Conrad; Wilson, Susan
Published: 2007
DOI: 10.1239/jap/1189717545
ISSN: 0021-9002
Handle: http://hdl.handle.net/1885/22075
Record date: 2015-12-07

Given two sequences of length n over a finite alphabet A of size |A| = d, the D2 statistic is the number of k-letter word matches between the two sequences. This statistic is used in bioinformatics for EST sequence database searches. Under the assumption

Keywords: Count vector; k-word matches; Sequence comparison; Stein's method
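To make the definition of D2 concrete, the sketch below computes it from the k-word count vectors of the two sequences: if X_w and Y_w count the occurrences of word w in each sequence, then D2 = Σ_w X_w Y_w, which equals the number of position pairs (i, j) whose k-words coincide. This is a minimal illustration, not the authors' code; it assumes overlapping k-word windows and sequences given as Python strings, and the function names are hypothetical. The expectation helper additionally assumes (as an illustration only, since the abstract's stated assumption is cut off) that both sequences are independent with i.i.d. uniform letters, so each of the (n - k + 1)^2 position pairs matches with probability d^(-k).

```python
from collections import Counter


def kword_counts(seq: str, k: int) -> Counter:
    """Count vector: occurrences of each k-letter word, using overlapping windows."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a: str, b: str, k: int) -> int:
    """D2 = sum over words w of X_w * Y_w (hypothetical helper name)."""
    xa, yb = kword_counts(a, k), kword_counts(b, k)
    # Counter returns 0 for words absent from b, so unmatched words add nothing.
    return sum(count * yb[w] for w, count in xa.items())


def d2_bruteforce(a: str, b: str, k: int) -> int:
    """Same statistic, counted directly as matching position pairs (i, j)."""
    return sum(
        1
        for i in range(len(a) - k + 1)
        for j in range(len(b) - k + 1)
        if a[i:i + k] == b[j:j + k]
    )


def expected_d2_uniform(n: int, k: int, d: int) -> float:
    """E[D2] assuming two independent length-n sequences with i.i.d. uniform letters."""
    return (n - k + 1) ** 2 / d ** k


if __name__ == "__main__":
    a, b = "ACGTAC", "ACGACG"
    print(d2(a, b, 2), d2_bruteforce(a, b, 2))  # both count the same matches
    print(expected_d2_uniform(6, 2, 4))
```

Here "ACGTAC" contains AC twice plus CG, GT and TA once each, while "ACGACG" contains AC and CG twice each plus GA once, so D2 = 2·2 + 1·2 = 6. The count-vector form runs in time roughly linear in the sequence lengths, whereas the brute-force pair count is quadratic and serves only as a cross-check.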