Pitfalls of the most commonly used models of context dependent substitution

dc.contributor.author: Lindsay, Helen
dc.contributor.author: Ying, Hua
dc.contributor.author: Huttley, Gavin Austin
dc.contributor.author: Yap, Von Bing
dc.date.accessioned: 2009-04-21T06:26:25Z
dc.date.accessioned: 2010-12-20T06:03:48Z
dc.date.available: 2009-04-21T06:26:25Z
dc.date.available: 2010-12-20T06:03:48Z
dc.date.issued: 2008-12-16
dc.date.updated: 2016-02-24T10:26:42Z
dc.description.abstract: BACKGROUND: Neighboring nucleotides exert a striking influence on mutation, with the hypermutability of CpG dinucleotides in many genomes being an exemplar. Among the approaches employed to measure the relative importance of sequence neighbors on molecular evolution have been continuous-time Markov process models for substitutions that treat sequences as a series of independent tuples. The most widely used examples are the codon substitution models. We evaluated the suitability of derivatives of the nucleotide frequency weighted (hereafter NF) and tuple frequency weighted (hereafter TF) models for measuring sequence context dependent substitution. Critical properties we address are their relationships to an independent nucleotide process and the robustness of parameter estimation to changes in sequence composition. We then consider the impact on inference concerning dinucleotide substitution processes from application of these two forms to intron sequence alignments from primates. RESULTS: We prove that the NF form always nests the independent nucleotide process and that this is not true for the TF form. As a consequence, using TF to study context effects can be misleading, which is shown by both theoretical calculations and simulations. We describe a simple example where a context parameter estimated under TF is confounded with composition terms unless all sequence states are equi-frequent. We illustrate this for the dinucleotide case by simulation under a nucleotide model, showing that the TF form identifies a CpG effect when none exists. Our analysis of primate introns revealed that the effect of nucleotide neighbors is over-estimated under TF compared with NF. Parameter estimates for a number of contexts are also strikingly discordant between the two model forms. CONCLUSION: Our results establish that the NF form should be used for analysis of independent-tuple context dependent processes. Although neighboring effects in general are still important, prominent influences such as the elevated CpG transversion rate previously identified using the TF form are an artifact. Our results further suggest as few as 5 parameters may account for ~85% of neighboring nucleotide influence.
dc.format: 17 pages
dc.identifier.citation: Biology Direct 3.52 (2008)
dc.identifier.issn: 1745-6150
dc.identifier.uri: http://hdl.handle.net/10440/105
dc.identifier.uri: http://digitalcollections.anu.edu.au/handle/10440/105
dc.publisher: BioMed Central
dc.rights: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
dc.source: Biology Direct
dc.source.uri: http://www.biology-direct.com/content/pdf/1745-6150-3-52.pdf
dc.source.uri: http://www.biology-direct.com/content/3/1/52
dc.subject: Keywords: Primates; nucleotide; amino acid substitution; animal; article; biological model; CpG island; genetics; intron; primate; sequence alignment; statistical model; Amino Acid Substitution; Animals; CpG Islands; Introns; Likelihood Functions; Models, Genetic;
dc.title: Pitfalls of the most commonly used models of context dependent substitution
dc.type: Journal article
dcterms.dateAccepted: 2008-12-16
local.bibliographicCitation.issue: 52
local.bibliographicCitation.lastpage: 17
local.bibliographicCitation.startpage: 1
local.contributor.affiliation: Lindsay, Helen, Faculty of Science
local.contributor.affiliation: Ying, Hua, John Curtin School of Medical Research, Division of Molecular Bioscience
local.contributor.affiliation: Huttley, Gavin Austin, John Curtin School of Medical Research, Division of Molecular Bioscience
local.contributor.affiliation: Yap, Von Bing, National University of Singapore
local.contributor.authoruid: u4105983
local.contributor.authoruid: u4281770
local.contributor.authoruid: u9800703
local.contributor.authoruid: E32799
local.identifier.absfor: 060409
local.identifier.ariespublication: u4020362xPUB129
local.identifier.citationvolume: 3
local.identifier.doi: 10.1186/1745-6150-3-52
local.identifier.scopusID: 2-s2.0-61349175941
local.identifier.thomsonID: 000262594300001
local.type.status: Published Version
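
The abstract's statement that the TF form does not nest the independent nucleotide process can be illustrated with a small numerical sketch. It assumes the usual definitions, which this record does not spell out: under NF the rate of a single-position change is weighted by the frequency of the target nucleotide, and under TF by the frequency of the target tuple. The frequencies, the function names, and the omission of every other rate parameter are illustrative assumptions, not the authors' code or data.

```python
# Hypothetical nucleotide frequencies (illustrative, not taken from the paper).
PI = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}


def changed_position(src, dst):
    """Return the index of the single differing position, or None."""
    diffs = [k for k, (a, b) in enumerate(zip(src, dst)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def nf_rate(src, dst, pi=PI):
    """NF form (assumed): weight by the frequency of the target nucleotide."""
    k = changed_position(src, dst)
    return 0.0 if k is None else pi[dst[k]]


def tf_rate(src, dst, pi=PI):
    """TF form (assumed): weight by the frequency of the target tuple.

    Tuple frequencies are products of nucleotide frequencies here, so the
    composition itself carries no context effect.
    """
    if changed_position(src, dst) is None:
        return 0.0
    freq = 1.0
    for base in dst:
        freq *= pi[base]
    return freq


# The same C -> T change at the first position, next to two different neighbors.
nf_ratio = nf_rate("CG", "TG") / nf_rate("CA", "TA")
tf_ratio = tf_rate("CG", "TG") / tf_rate("CA", "TA")

print(f"NF rate ratio (G vs A neighbor): {nf_ratio:.3f}")
print(f"TF rate ratio (G vs A neighbor): {tf_ratio:.3f}")
```

In this sketch the NF ratio is 1 for any composition, while the TF ratio equals the ratio of the two neighbor frequencies, so it is 1 only when those frequencies are equal. That is one way to read the abstract's remark that a TF context parameter is confounded with composition terms unless all states are equi-frequent.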

Downloads

Original bundle

Name: Lindsay_Pitfalls2008.pdf
Size: 1.49 MB
Format: Adobe Portable Document Format