Mind the gaps: evidence of bias in estimates of multiple sequence alignments

Golubchik, Tanya; Wise, Michael J; Easteal, Simon; Jermiin, Lars Sommer

dc.contributor.author: Golubchik, Tanya
dc.contributor.author: Wise, Michael J
dc.contributor.author: Easteal, Simon
dc.contributor.author: Jermiin, Lars Sommer
dc.date.accessioned: 2015-12-07T22:53:42Z
dc.identifier.issn: 0737-4038
dc.identifier.uri: http://hdl.handle.net/1885/27845
dc.description.abstract: Multiple sequence alignment (MSA) is a crucial first step in the analysis of genomic and proteomic data. Commonly occurring sequence features, such as deletions and insertions, are known to affect the accuracy of MSA programs, but the extent to which alignment accuracy is affected by the positions of insertions and deletions has not been examined independently of other sources of sequence variation. We assessed the performance of 6 popular MSA programs (ClustalW, DIALIGN-T, MAFFT, MUSCLE, PROBCONS, and T-COFFEE) and one experimental program, PRANK, on amino acid sequences that differed only by short regions of deleted residues. The analysis showed that the absence of residues often led to an incorrect placement of gaps in the alignments, even though the sequences were otherwise identical. In data sets containing sequences with partially overlapping deletions, most MSA programs preferentially aligned the gaps vertically at the expense of incorrectly aligning residues in the flanking regions. Of the programs assessed, only DIALIGN-T was able to place overlapping gaps correctly relative to one another, but this was usually context dependent and was observed only in some of the data sets. In data sets containing sequences with non-overlapping deletions, both DIALIGN-T and MAFFT (G-INS-I) were able to align gaps with near-perfect accuracy, but only MAFFT produced the correct alignment consistently. The same was true for data sets that comprised isoforms of alternatively spliced gene products: both DIALIGN-T and MAFFT produced highly accurate alignments, with MAFFT being the more consistent of the 2 programs. Other programs, notably T-COFFEE and ClustalW, were less accurate. For all data sets, alignments produced by different MSA programs differed markedly, indicating that reliance on a single MSA program may give misleading results. It is therefore advisable to use more than one MSA program when dealing with sequences that may contain deletions or insertions, particularly for high-throughput and pipeline applications where manual refinement of each alignment is not practicable.
dc.publisher: Society for Molecular Biology and Evolution
dc.source: Molecular Biology and Evolution
dc.subject: Keywords: gene product; accuracy; amino acid sequence; article; DNA flanking region; gene deletion; gene insertion; genetic analysis; human; multiple sequence alignment; nonhuman; nucleotide sequence; Amino Acid Sequence; Computational Biology; Computer Simulation; ClustalW; DIALIGN-T; MAFFT; Multiple sequence alignment; MUSCLE; PROBCONS; T-COFFEE
dc.title: Mind the gaps: evidence of bias in estimates of multiple sequence alignments
dc.type: Journal article
local.description.notes: Imported from ARIES
local.identifier.citationvolume: 24
dc.date.issued: 2007
local.identifier.absfor: 060409 - Molecular Evolution
local.identifier.ariespublication: u4020362xPUB54
local.type.status: Published Version
local.contributor.affiliation: Golubchik, Tanya, University of Sydney
local.contributor.affiliation: Wise, Michael J, University of Western Australia
local.contributor.affiliation: Easteal, Simon, College of Medicine, Biology and Environment, ANU
local.contributor.affiliation: Jermiin, Lars Sommer, University of Sydney
local.description.embargo: 2037-12-31
local.bibliographicCitation.issue: 11
local.bibliographicCitation.startpage: 2433
local.bibliographicCitation.lastpage: 2442
local.identifier.doi: 10.1093/molbev/msm176
dc.date.updated: 2015-12-07T12:42:00Z
local.identifier.scopusID: 2-s2.0-35848948397
Collections: ANU Research Publications
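
The abstract's recommendation to use more than one MSA program implies some way of measuring how far two alignments of the same sequences disagree. The following is a minimal Python sketch of one common approach, comparing the sets of aligned residue pairs. It is not the scoring procedure used in the paper; the function names (residue_pairs, sum_of_pairs_score, agreement) and the toy sequences are hypothetical and chosen only to mimic the "partially overlapping deletions" case described above.

```python
from itertools import combinations


def residue_pairs(alignment):
    """Return the set of residue pairs that an alignment places in one column.

    `alignment` is a list of equal-length gapped strings ('-' marks a gap),
    one row per sequence, with rows in the same order for every alignment
    being compared. A residue is identified as (row index, ungapped position).
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    counters = [0] * len(alignment)  # next ungapped position in each row
    pairs = set()
    for column in zip(*alignment):
        present = []
        for row_index, symbol in enumerate(column):
            if symbol != "-":
                present.append((row_index, counters[row_index]))
                counters[row_index] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of the reference's aligned residue pairs recovered by `test`."""
    ref = residue_pairs(reference)
    if not ref:
        return 1.0
    return len(residue_pairs(test) & ref) / len(ref)


def agreement(first, second):
    """Symmetric (Jaccard) agreement between two alignments, for the case
    where no reference alignment is available."""
    a, b = residue_pairs(first), residue_pairs(second)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy data (an assumption, not taken from the paper): one full-length
# sequence and two copies with partially overlapping deletions.
reference = [
    "ACDEFGHIKL",
    "ACD---HIKL",
    "ACDEF---KL",
]
# The same sequences with the gaps stacked vertically, which misaligns
# residues in the flanking region of the third row.
stacked = [
    "ACDEFGHIKL",
    "ACD---HIKL",
    "ACD---EFKL",
]

print(f"sum-of-pairs vs reference: {sum_of_pairs_score(stacked, reference):.3f}")
print(f"symmetric agreement:       {agreement(stacked, reference):.3f}")
```

In a pipeline setting, a low agreement value between the outputs of two programs could serve as a flag that an alignment needs manual inspection; any threshold for "low" would have to be chosen for the data at hand.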

Download

File: 01_Golubchik_Mind_the_gaps:_evidence_of_2007.pdf
Size: 939.84 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.