Approximate word matches between two random sequences

Authors

Burden, Conrad J
Kantorovitz, Miriam R
Wilson, Susan R

Publisher

Institute of Mathematical Statistics

Abstract

Given two sequences over a finite alphabet L, the D₂ statistic is the number of m-letter word matches between the two sequences. This statistic is used in bioinformatics for expressed sequence tag (EST) database searches. Here we study a generalization of the D₂ statistic in the context of DNA sequences, under the assumption of strand-symmetric Bernoulli text. For k < m, we look at the count of m-letter word matches with up to k mismatches. For this statistic, we compute the expectation, give upper and lower bounds for the variance, and prove that its distribution is asymptotically normal.
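As an illustration only, the sketch below counts approximate word matches by brute force, reading "up to k mismatches" as a Hamming distance of at most k between two m-letter words (k = 0 recovers D₂). It also computes a simple expectation under the assumption that both sequences are independent with i.i.d. letters: two aligned letters agree with probability p₂ = Σ p_a², so a pair of words matches with at most k mismatches with binomial probability. The function names and the parameters n, nbar and probs are hypothetical; this is not the paper's derivation, and the variance bounds and asymptotic normality result are not reproduced.

```python
from math import comb


def d2_approx(x: str, y: str, m: int, k: int = 0) -> int:
    """Count pairs (i, j) such that the m-letter words x[i:i+m] and y[j:j+m]
    differ in at most k positions. k = 0 gives the exact-match D2 count."""
    if m > len(x) or m > len(y):
        return 0
    count = 0
    for i in range(len(x) - m + 1):
        u = x[i:i + m]
        for j in range(len(y) - m + 1):
            v = y[j:j + m]
            mismatches = sum(a != b for a, b in zip(u, v))  # Hamming distance
            if mismatches <= k:
                count += 1
    return count


def expected_d2_approx(n: int, nbar: int, m: int, k: int, probs: dict) -> float:
    """Expected count for independent sequences of lengths n and nbar whose
    letters are i.i.d. with probabilities `probs` (an assumed model)."""
    p2 = sum(p * p for p in probs.values())  # P(two independent letters agree)
    # P(at most k of the m aligned letter pairs disagree): binomial CDF at k.
    per_pair = sum(comb(m, j) * p2 ** (m - j) * (1 - p2) ** j for j in range(k + 1))
    return (n - m + 1) * (nbar - m + 1) * per_pair


# Strand-symmetric example frequencies: p_A = p_T and p_C = p_G.
dna = {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2}
print(d2_approx("ACGTAC", "ACGAAC", m=3, k=1))
print(expected_d2_approx(100, 100, m=5, k=1, probs=dna))
```

Strand symmetry does not change this first-moment calculation, since only p₂ enters; it matters for the dependence structure the paper analyses when bounding the variance.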

Source

The Annals of Applied Probability

Access Statement

Open Access
