# An efficient Z-score algorithm for assessing sequence alignments.

## Date

2004

## Authors

Booth, Hilary

Maindonald, John

Wilson, Susan

Gready, Jill

## Publisher

Mary Ann Liebert Inc.

## Abstract

We describe an alternative method for scoring the pairwise alignment of two biological sequences. Designed to overcome the bias due to the composition of the alignment, it measures the distance (in standard deviations) between the score of the given alignment and the mean score of all other alignments that can be obtained by a permutation of either sequence. We demonstrate that the standard deviation can be calculated efficiently. In the ungapped case, on which we concentrate, the mean and standard deviation can be calculated exactly and in two steps: the first takes O(N) time, where N is the length of the sequence, and the second takes a fixed number of calculations, i.e., O(1) time. We argue that this statistic is a more consistent measure than a similarity score based upon a standard scoring matrix. Even in the ungapped case, the statistic proves in many cases to be more accurate than the commonly used gapped Z-score of FASTA (Pearson and Lipman, 1988), in which the sequence is matched against a random sample of the database. We demonstrate the use of the POZ-score as a secondary filter that screens out several well-known types of false positive, reducing the amount of manual screening to be done by the biologist.
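
The two-step calculation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes an ungapped alignment of two equal-length sequences, uses the standard closed-form mean and variance of a sum over a uniformly random permutation, and relies on a hypothetical `score(x, y)` function and a toy `match_score` scheme introduced here only for illustration. The first step makes one pass over the sequences; the second works only over the residue alphabet, so its cost does not grow with N.

```python
from collections import Counter
from math import sqrt


def permutation_z_score(a, b, score):
    """Z-score of the ungapped alignment of a and b against all permutations of b.

    `score(x, y)` is a hypothetical pairwise scoring function (for example a
    lookup into a substitution matrix); it is an assumption of this sketch.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    n = len(a)

    # Step 1, O(N): observed score and residue counts of each sequence.
    observed = sum(score(x, y) for x, y in zip(a, b))
    count_a, count_b = Counter(a), Counter(b)

    # Step 2, independent of N: sums run over the bounded alphabet only.
    row_mean = {x: sum(m * score(x, y) for y, m in count_b.items()) / n
                for x in count_a}
    col_mean = {y: sum(k * score(x, y) for x, k in count_a.items()) / n
                for y in count_b}
    grand_mean = sum(k * row_mean[x] for x, k in count_a.items()) / n

    # Exact mean of the score over all permutations of b.
    mean = n * grand_mean
    # Exact variance of the score under a uniformly random permutation of b.
    variance = sum(
        k * m * (score(x, y) - row_mean[x] - col_mean[y] + grand_mean) ** 2
        for x, k in count_a.items()
        for y, m in count_b.items()
    ) / (n - 1)

    if variance == 0:
        raise ValueError("all permutations score the same; Z-score undefined")
    return (observed - mean) / sqrt(variance)


def match_score(x, y):
    # Toy scoring scheme (assumption): +1 for a match, -1 for a mismatch.
    return 1 if x == y else -1
```

A call such as `permutation_z_score("ACGTAC", "ACGGTC", match_score)` returns the number of standard deviations by which the given alignment exceeds the permutation mean. In practice `score` would wrap a real substitution matrix, and the statistic would be applied to alignments already found by a similarity search.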

## Keywords

protein; algorithm; amino acid sequence; analytic method; article; calculation; intermethod comparison; mathematical genetics; priority journal; protein structure; scoring system; sequence alignment; sequence analysis; statistical analysis; Algorithms; Am Dynamic programming; Sequence alignment; Sequence composition; Similarity search; Z-score

## Source

Journal of Computational Biology

## Type

Journal article

## DOI

10.1089/cmb.2004.11.616