PyEvolve: a toolkit for statistical modelling of molecular evolution

dc.contributor.author: Butterfield, Andrew
dc.contributor.author: Lang, Edward
dc.contributor.author: Lawrence, Catherine
dc.contributor.author: Wakefield, Matthew
dc.contributor.author: Isaev, Alexander
dc.contributor.author: Huttley, Gavin Austin
dc.contributor.author: Vedagiri, Vivek
dc.date.accessioned: 2009-04-23T00:49:43Z [en_US]
dc.date.accessioned: 2010-12-20T06:05:43Z
dc.date.available: 2009-04-23T00:49:43Z [en_US]
dc.date.available: 2010-12-20T06:05:43Z
dc.date.issued: 2004-01-05 [en_US]
dc.date.updated: 2015-12-12T08:10:04Z
dc.description.abstract: BACKGROUND: Examining the distribution of variation has proven an extremely fruitful technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences, ignoring the biological significance of sequence differences. A suite of sophisticated likelihood-based statistical models from the field of molecular evolution provides the basis for extracting information from the full distribution of sequence variation. The number of different problems to which phylogeny-based maximum likelihood calculations can be applied is extensive. Available software packages that can perform likelihood calculations suffer from a lack of flexibility and scalability, or employ error-prone approaches to model parameterisation. RESULTS: Here we describe the implementation of PyEvolve, a toolkit for the application of existing, and development of new, statistical methods for molecular evolution. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide substitution model that includes a parameter for mutation of methylated CpGs and required 8 lines of standard Python code to define. Benchmarking was performed using either a dinucleotide or a codon substitution model, applied to an alignment of BRCA1 sequences from 20 mammals or to a 10-species subset. Parallel execution gave performance gains of up to five-fold over serial execution. Compared to leading alternative software, PyEvolve exhibited significantly better real-world performance for parameter-rich models on a large data set, reducing the time required for optimisation from ~10 days to ~6 hours.
CONCLUSION: PyEvolve provides flexible functionality that can be used either for statistical modelling of molecular evolution or for the development of new methods in the field. The toolkit can be used interactively or by writing and executing scripts. It uses efficient processes for specifying the parameterisation of statistical models, and implements numerous optimisations that make highly parameter-rich likelihood functions solvable within hours on multi-CPU hardware. PyEvolve can be readily adapted to changing computational demands and hardware configurations to maximise performance. PyEvolve is released under the GPL and can be downloaded from http://cbis.anu.edu.au/software.
dc.format: 12 pages
dc.identifier.citation: BMC Bioinformatics 5:1 (2004)
dc.identifier.issn: 1471-2105 [en_US]
dc.identifier.uri: http://hdl.handle.net/10440/126 [en_US]
dc.identifier.uri: http://digitalcollections.anu.edu.au/handle/10440/126
dc.publisher: BioMed Central
dc.rights: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
dc.source: BMC Bioinformatics
dc.source.uri: http://www.biomedcentral.com/content/pdf/1471-2105-5-1.pdf [en_US]
dc.source.uri: http://www.biomedcentral.com/1471-2105/5/1 [en_US]
dc.subject: Keywords: Biological significance; Codon substitutions; Computational demands; Hardware configurations; Likelihood functions; Parallel performance; Real-world performance; Statistical modelling; Hardware; Nucleotides; Optimization; Statistical methods; Molecular bi
dc.title: PyEvolve: a toolkit for statistical modelling of molecular evolution
dc.type: Journal article
dcterms.dateAccepted: 2004-01-05 [en_US]
local.bibliographicCitation.issue: 1
local.bibliographicCitation.lastpage: 12
local.bibliographicCitation.startpage: 1
local.contributor.affiliation: Butterfield, Andrew, Research School of Pacific and Asian Studies, Department of Archaeology and Natural History [en_US]
local.contributor.affiliation: Lang, Edward, Faculty of Science [en_US]
local.contributor.affiliation: Lawrence, Catherine, John Curtin School of Medical Research [en_US]
local.contributor.affiliation: Wakefield, Matthew, Research School of Biological Sciences, Comparative Genomics Research Group [en_US]
local.contributor.affiliation: Isaev, Alexander, Faculty of Science, Department of Mathematics (Division of MSI) [en_US]
local.contributor.affiliation: Huttley, Gavin Austin, John Curtin School of Medical Research, Division of Molecular Bioscience [en_US]
local.contributor.affiliation: Vedagiri, Vivek, HeliXense Pty Ltd [en_US]
local.contributor.authoruid: U9315096 [en_US]
local.contributor.authoruid: U4027355 [en_US]
local.contributor.authoruid: U4044780 [en_US]
local.contributor.authoruid: U4021820 [en_US]
local.contributor.authoruid: U9208582 [en_US]
local.contributor.authoruid: U9800703 [en_US]
local.contributor.authoruid: E19363 [en_US]
local.description.refereed: Yes
local.identifier.absfor: 060409 [en_US]
local.identifier.ariespublication: MigratedxPub15124 [en_US]
local.identifier.citationvolume: 5
local.identifier.doi: 10.1186/1471-2105-5-1
local.identifier.scopusID: 2-s2.0-2942571419
local.type.status: Published Version [en_US]

Downloads

Original bundle

Name: Butterfield_PyEvolve2004.pdf
Size: 332.27 KB
Format: Adobe Portable Document Format