Simon, Helmut
Description
Prior to the advent of DNA sequencing technology, the progress of population genetics was constrained by the unavailability of molecular-level data to test competing hypotheses and models. Genomic data obtained from DNA sequencing has revealed that evolutionary processes in real populations diverge substantially from the simplifying assumptions of earlier models. In this thesis, I have applied more recent advances in data availability, statistical methods and computational technology to explore the consequences of relaxing the assumptions of earlier models to better reflect what has been learned about populations and the genome. I have focused on two main areas arising from the study of mutation: characterising intragenomic mutational heterogeneity and making inferences about the history of a population from variant data.
In the first instance, I address two primary causes of mutation rate heterogeneity: sequence context and recombination. Mutation rates have been shown to vary not only between different bases and mutation directions, but also with genomic location at scales ranging from individual nucleotides to multi-megabase regions. The patterns in the heterogeneity of mutation rates at varying scales can be used to increase understanding of the factors influencing mutation and of the relative magnitude of their effects. I examined the variance in the probability of polymorphism when conditioned on contexts of various sizes and found that when the 12 point mutation directions are considered separately, variance due to context increases significantly as we move from 3-mer to 5-mer and from 5-mer to 7-mer contexts. However, when all mutations are considered in aggregate, these differences are outweighed by the effect of interaction between the central base and its immediate neighbours. I then calculated the variance due to recombination and the probability that a recombination event causes a mutation, employing statistical procedures used in the analysis of time series to take account of the spatial auto-correlation of recombination and mutation rates along the genome. My results support the view that genomic diversity in recombination hotspots arises largely from a direct effect of recombination on mutation rather than predominantly from the effect of selective sweeps.
I also investigated how polymorphism data can be used to make inferences about historical evolutionary influences without the assumptions on the demographic history of a population that have previously been required. I develop a method of testing for selection that can use null models incorporating such a demographic history and that benefits from the power of using the full likelihood of the null model. I compare this method to the well-known statistic Tajima's D and also use it to investigate some regions of the human genome that are candidates for the operation of natural selection. Methods for inferring the genealogical history of a sample from a population have also relied on the assumption that the population has maintained a constant size or some related constraint. In Chapter 4, I present a method of making such inferences without relying on assumptions of this type. This method makes use of Bayesian MCMC techniques with a novel approach to the selection of prior distributions.
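For readers unfamiliar with the comparison statistic named above, the following is a minimal sketch of the standard Tajima's D calculation. It is not the method developed in the thesis. The function names (`site_summaries`, `tajimas_d`) and the toy haplotypes are illustrative assumptions, and the input is assumed to be a list of aligned, equal-length haplotype strings with no missing data.

```python
import math
from collections import Counter


def site_summaries(haplotypes):
    """Return (S, pi): the number of segregating sites and the mean
    number of pairwise differences for aligned haplotype strings."""
    n = len(haplotypes)
    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    total_diffs = 0.0
    for column in zip(*haplotypes):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # Number of pairs that differ at this site.
            total_diffs += (n * n - sum(c * c for c in counts.values())) / 2
    return seg_sites, total_diffs / n_pairs


def tajimas_d(haplotypes):
    """Standard Tajima's D (Tajima 1989) for a sample of haplotypes.

    Returns NaN when there are no segregating sites. Requires at
    least 4 haplotypes, because the variance term is zero for n < 4.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 haplotypes")
    s, pi = site_summaries(haplotypes)
    if s == 0:
        return math.nan

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # Difference between the two estimators of theta, scaled by the
    # estimated standard deviation of that difference.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy data (invented for illustration only).
toy_sample = ["AAAA", "AAAT", "AATT", "ATTT"]
singleton_sample = ["TAAA", "ATAA", "AATA", "AAAT"]
```

Calling `tajimas_d(toy_sample)` gives a positive value, while `tajimas_d(singleton_sample)`, in which every variant is carried by a single haplotype, gives a negative one. The critical values conventionally used with D assume a constant-size neutral population, which is the kind of demographic assumption the thesis aims to relax.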