
Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates

Frandsen, Paul B.; Calcott, Brett; Mayer, Christoph; Lanfear, Robert

Description

BACKGROUND: Model selection is a vital part of most phylogenetic analyses, and accounting for the heterogeneity in evolutionary patterns across sites is particularly important. Mixture models and partitioning are commonly used to account for this variation, and partitioning is the most popular approach. Most current partitioning methods require some a priori partitioning scheme to be defined, typically guided by known structural features of the sequences, such as gene boundaries or codon...
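
As a rough illustration of the idea named in the title and summarised in the abstract (dc.description.abstract in the record below), the following Python sketch repeatedly splits a set of per-site rates in two with k-means (k = 2) and keeps a split only when it lowers an information-criterion score. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the min_size parameter and, in particular, toy_bic are hypothetical. toy_bic scores subsets with a normal distribution fitted to the rates themselves, standing in for the substitution-model likelihood that a real phylogenetic analysis would compute. Only the AICc and BIC formulas are standard.

```python
import math


def aicc(log_lik, k, n):
    """Corrected Akaike information criterion (lower is better)."""
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian information criterion (lower is better)."""
    return k * math.log(n) - 2 * log_lik


def two_means_1d(rates, iters=100):
    """Split a non-empty list of 1-D values into two clusters (Lloyd's k-means, k=2)."""
    lo, hi = min(rates), max(rates)
    if lo == hi:  # all values identical: nothing to split
        return list(rates), []
    c0, c1 = lo, hi  # start the two centroids at the extremes
    for _ in range(iters):
        a = [r for r in rates if abs(r - c0) <= abs(r - c1)]
        b = [r for r in rates if abs(r - c0) > abs(r - c1)]
        n0, n1 = sum(a) / len(a), sum(b) / len(b)
        if (n0, n1) == (c0, c1):  # centroids stopped moving
            break
        c0, c1 = n0, n1
    return a, b


def toy_bic(scheme):
    """ASSUMPTION: stand-in score, NOT a phylogenetic likelihood.

    Fits one normal distribution (mean, variance) to the rates in each
    subset and returns the BIC of the whole scheme.
    """
    n = sum(len(s) for s in scheme)
    log_lik = 0.0
    for s in scheme:
        mu = sum(s) / len(s)
        var = max(sum((x - mu) ** 2 for x in s) / len(s), 1e-6)  # floor avoids log(0)
        log_lik += -0.5 * len(s) * (math.log(2 * math.pi * var) + 1)
    return bic(log_lik, k=2 * len(scheme), n=n)


def iterative_split(rates, scheme_score, min_size=2):
    """Keep splitting subsets in two while the scheme's score improves."""
    final, pending = [], [list(rates)]
    while pending:
        subset = pending.pop()
        a, b = two_means_1d(subset)
        if min(len(a), len(b)) >= min_size:
            current = final + pending + [subset]
            proposed = final + pending + [a, b]
            if scheme_score(proposed) < scheme_score(current):
                pending += [a, b]  # accept the split; try to split further
                continue
        final.append(subset)  # no acceptable split: subset is final
    return final


if __name__ == "__main__":
    # Hypothetical per-site rates: five slow sites and five fast sites.
    rates = [0.1, 0.12, 0.09, 0.11, 0.1, 2.0, 2.1, 1.9, 2.05, 1.95]
    print(iterative_split(rates, toy_bic, min_size=5))
```

With these made-up rates and min_size=5, the call separates the five slow sites from the five fast sites and then stops, because any further split would produce subsets smaller than min_size. Passing the score as a function keeps the splitting loop independent of how a scheme is evaluated, so aicc could be used in place of bic inside a scoring function of the same shape.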

dc.contributor.author: Frandsen, Paul B.
dc.contributor.author: Calcott, Brett
dc.contributor.author: Mayer, Christoph
dc.contributor.author: Lanfear, Robert
dc.date.accessioned: 2015-04-16T02:04:34Z
dc.date.available: 2015-04-16T02:04:34Z
dc.identifier.issn: 1471-2148
dc.identifier.uri: http://hdl.handle.net/1885/13258
dc.description.abstract: BACKGROUND: Model selection is a vital part of most phylogenetic analyses, and accounting for the heterogeneity in evolutionary patterns across sites is particularly important. Mixture models and partitioning are commonly used to account for this variation, and partitioning is the most popular approach. Most current partitioning methods require some a priori partitioning scheme to be defined, typically guided by known structural features of the sequences, such as gene boundaries or codon positions. Recent evidence suggests that these a priori boundaries often fail to adequately account for variation in rates and patterns of evolution among sites. Furthermore, new phylogenomic datasets such as those assembled from ultra-conserved elements lack obvious structural features on which to define a priori partitioning schemes. The upshot is that, for many phylogenetic datasets, partitioned models of molecular evolution may be inadequate, thus limiting the accuracy of downstream phylogenetic analyses. RESULTS: We present a new algorithm that automatically selects a partitioning scheme via the iterative division of the alignment into subsets of similar sites based on their rates of evolution. We compare this method to existing approaches using a wide range of empirical datasets, and show that it consistently leads to large increases in the fit of partitioned models of molecular evolution when measured using AICc and BIC scores. In doing so, we demonstrate that some related approaches to solving this problem may have been associated with a small but important bias. CONCLUSIONS: Our method provides an alternative to traditional approaches to partitioning, such as dividing alignments by gene and codon position. Because our method is data-driven, it can be used to estimate partitioned models for all types of alignments, including those that are not amenable to traditional approaches to partitioning.
dc.description.sponsorship: PF is supported by (i) Google and the National Evolutionary Synthesis Center (NESCent) through the Google Summer of Code/NESCent Phyloinformatics Summer of Code, (ii) the Department of Entomology at Rutgers University through the Thomas J. Headlee fellowship, (iii) the DAAD (German Academic Exchange Service) and (iv) NSF DEB 0816865. RL is supported by the Australian Research Council, and a short-term visitor grant to the National Evolutionary Synthesis Center (NESCent) provided by the NSF.
dc.publisher: BioMed Central
dc.rights: © 2015 Frandsen et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
dc.source: BMC Evolutionary Biology
dc.subject: Model selection
dc.subject: Partitioning
dc.subject: Partitionfinder
dc.subject: Phylogenetics
dc.subject: Phylogenomics
dc.subject: K-means
dc.subject: Clustering
dc.subject: Ultra-conserved elements
dc.subject: UCE’s
dc.title: Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates
dc.type: Journal article
local.identifier.citationvolume: 15
dc.date.issued: 2015-02-10
local.identifier.absfor: 060299 - Ecology not elsewhere classified
local.identifier.absfor: 060300 - EVOLUTIONARY BIOLOGY
local.identifier.ariespublication: a383154xPUB1207
local.publisher.url: http://www.biomedcentral.com/
local.type.status: Published Version
local.contributor.affiliation: Lanfear, R., Ecology Evolution and Genetics, Research School of Biology, The Australian National University
local.bibliographicCitation.issue: 13
local.bibliographicCitation.startpage: 1
local.bibliographicCitation.lastpage: 17
local.identifier.doi: 10.1186/s12862-015-0283-7
local.identifier.absseo: 970106 - Expanding Knowledge in the Biological Sciences
dc.date.updated: 2015-12-10T10:17:39Z
local.identifier.scopusID: 2-s2.0-84924110372
Collections: ANU Research Publications

Download

File: Frandsen et al Automatic Selection of Partitioning Schemes 2015.pdf
Size: 2.17 MB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
