Evaluating, Accelerating and Extending the Multispecies Coalescent Model of Evolution
Date
2017
Authors
Ogilvie, Huw Alexander
Abstract
Much research builds on the evolutionary histories of species and
genes. They are used in genomics to infer synteny, in ecology to
describe and predict biodiversity, and in molecular biology to
transfer knowledge acquired in model organisms to humans and
crops. Beyond downstream applications, expanding our knowledge of
life on Earth is important in its own right. From Naturalis
Historia to On the Origin of Species, the acquisition of this
knowledge has been a part of human development.
Evolutionary histories are commonly represented as trees, where a
common ancestor progressively splits into descendant species or
alleles. Time trees add more information by using height to
represent genetic distance or elapsed time. Species and gene
trees can be inferred from molecular sequences using methods
which are explicitly model-based, or implicitly assume or are
statistically consistent with a particular model of evolution.
One such model, the multispecies coalescent (MSC), is the topic
of my thesis. Under this model, separate trees are inferred for
the species history and for each gene’s history. Gene trees are
embedded within the species tree according to a coalescent
process.
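
As an illustrative sketch of the coalescent process within one branch of a species tree (not code from the thesis), the function below draws coalescence times for a set of gene lineages. It assumes a constant population size measured in gene copies and time measured in generations. The name `coalesce_in_branch` and its parameters are hypothetical. Lineages that have not coalesced by the top of the branch are counted and would pass into the ancestral branch.

```python
import random


def coalesce_in_branch(n_lineages, pop_size, branch_length, rng):
    """Simulate coalescent events among lineages inside a single branch.

    Assumptions (illustrative only): constant population size `pop_size`
    in gene copies, time in generations, so each pair of lineages
    coalesces at rate 1 / pop_size.
    Returns (event_times, lineages_remaining_at_top_of_branch).
    """
    times = []
    t = 0.0
    k = n_lineages
    while k > 1:
        # With k lineages there are k * (k - 1) / 2 pairs that could coalesce.
        rate = k * (k - 1) / (2.0 * pop_size)
        t += rng.expovariate(rate)
        if t > branch_length:
            # The next event would fall beyond this branch; the remaining
            # lineages continue into the ancestral branch.
            break
        times.append(t)
        k -= 1
    return times, k


rng = random.Random(2017)
events, remaining = coalesce_in_branch(5, 1000.0, 800.0, rng)
print(events, remaining)
```

When the branch is short relative to the population size, several lineages typically survive to the ancestral branch, which is how gene trees come to differ from the species tree under this model.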
Researchers often avoid the MSC when reconstructing time trees
because of claims that available implementations are too
computationally demanding. Instead, the species history is
inferred using a single tree by concatenating the sequences from
each gene. I began my thesis research by evaluating the effect of
this approximation. In a realistic simulation based on parameters
inferred from empirical data, concatenation was grossly
inaccurate, especially when estimating recent species divergence
times. In a later simulation study I demonstrated that when using
concatenation, credible intervals often excluded the true
values.
To address reluctance towards using the MSC, I developed a faster
implementation of the model. StarBEAST2 is a Markov chain Monte
Carlo (MCMC) method, meaning it characterizes the probability
distribution over trees by randomly walking the parameter space.
I improved computational performance by developing more efficient
proposals used to traverse the space, and reducing the number of
parameters in the model through analytical integration of
population sizes.
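
The random walk idea behind MCMC can be sketched with a minimal Metropolis sampler. This is a generic one-dimensional illustration, not StarBEAST2's algorithm or its tree proposals: the function `metropolis`, the Gaussian proposal, and the standard normal target are all assumptions chosen for brevity.

```python
import math
import random


def metropolis(log_density, start, step, n_steps, rng):
    """Random-walk Metropolis sampler for a one-dimensional target.

    `log_density` returns the log of the (unnormalized) target density.
    Proposals are symmetric Gaussian perturbations with std. dev. `step`.
    """
    x = start
    logp = log_density(x)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
        samples.append(x)
    return samples


def log_std_normal(x):
    # Hypothetical target: standard normal, up to an additive constant.
    return -0.5 * x * x


rng = random.Random(42)
draws = metropolis(log_std_normal, start=0.0, step=1.0, n_steps=20000, rng=rng)
print(sum(draws) / len(draws))
```

In a sampler of this kind the proposal distribution governs how quickly the walk explores the space, which is why more efficient proposals and a smaller parameter space can both reduce run time.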
Despite its sophistication, the MSC has theoretical limitations.
One is that the substitution rate is assumed to stay constant, or
uncorrelated between lineages of different genes. However,
substitution rates do vary and are associated with species traits
like body size. I addressed this assumption in StarBEAST2 by
extending the MSC to estimate substitution rates for each
species. Another assumption is that genetic material cannot be
transferred horizontally, but a more general model called the
multispecies network coalescent (MSNC) permits introgression of
alleles across species boundaries. My collaborators and I have
developed and evaluated an MCMC implementation of the MSNC.
My final thesis project was to combine the MSC with the
fossilized birth-death (FBD) process, which models how species
are fossilized and sampled through time. To demonstrate the
utility of the FBD-MSC model, I used it to reconstruct the
evolutionary history of Caninae (dogs and foxes) using fossil
data and molecular sequences.
Keywords
Caninae, species trees, phylogenetics, evolution, biology, Markov chain Monte Carlo, Bayesian models
Type
Thesis (PhD)