Investigation into the genome evolution of the genus Eucalyptus, and assessing and improving real accuracies of Oxford Nanopore long-read sequencing
Abstract
Genomes have a highly organised architecture (a non-random organisation of functional and non-functional genetic elements within chromosomes) that is essential for many biological functions. However, despite the need to conserve genome architecture, a high level of structural variation has been observed within species, and genome architecture diverges increasingly with divergence time. Little is known about the mechanisms that generate this high level of structural variability, why it is tolerated, or its implications for adaptation and speciation. Studying genome evolution within Eucalyptus can provide insights into the mechanisms driving genetic diversity, adaptation, and speciation. Eucalyptus is diverse and widespread and has evolved with low prezygotic reproductive barriers, which makes it a suitable group for studies of genome evolution. The species chosen in this thesis were selected to be widely dispersed across the established phylogeny, with an additional speciose section highly represented to provide a number of closely related genomes. During my PhD, I investigated the conservation and divergence of genome architecture through comparative genome analysis of 33 diverse species, spanning 1 to 50 million years of divergent genome evolution. Conservation and divergence of genome architecture were measured to describe the pattern of genome evolution among these species. It was hypothesised, and shown, that immediately following lineage divergence, genome architecture is highly fragmented by rearrangements. As lineage divergence continues, the accumulation of small mutations within rearrangements becomes the primary driver of genome divergence. The loss of syntenic regions also contributes to genome divergence, but at a slower pace than rearrangements. This mode of genome evolution is consistent with the established theory of gene duplication followed by pseudogenisation or function acquisition.
Further examination suggests that rearrangements may be altering the phenotypes of Eucalyptus species. Unequal recombination in transposon-rich regions is likely the key rearrangement mechanism, as evidenced by the frequent occurrence of rearrangements in these regions of closely related Eucalyptus genomes. This work also suggests that relying on a single reference genome could introduce reference bias, especially when examining loci that have significantly diverged or have been deleted, duplicated, or rearranged. Finally, this work provides an unbiased framework for investigating potential speciation and adaptive loci in a rapidly radiating group of foundation woodland tree species.
To better enable genomic studies, affordable and accurate long-read sequencing is required. However, despite the advances made by ONT (Oxford Nanopore Technologies) and PacBio, the two major long-read sequencing providers, acquiring long reads remains prohibitive for many laboratories. PacBio is currently too expensive for large-scale use, and ONT suffers from lower accuracy, which necessitates higher coverage and therefore increases costs. Here, I focused on improving ONT sequence accuracy by addressing the crucial step of basecalling: the conversion of raw electrical signals into nucleotide sequences. Basecalling is hindered by models trained on mixed-species DNA, which may reduce accuracy because of conflicting base-pair chemistry. Two plant species, Phebalium stellatum and Xanthorrhoea johnsonii, were sequenced to train species-specific basecaller models, with the aim of improving per-base accuracy. The sequencing accuracy achieved by ONT basecallers was assessed, evaluating the gains from species-specific models, improved flowcells (R10.4, FLO-PRO112), and sequencing kits (SQK-LSK112). Oxford Nanopore Technologies Guppy 6 super-accurate basecalling yields read accuracies of 91.96% and 94.15%. Species-specific basecalling models improve accuracy to 93.24% and 95.16%. Sequencing with R10.4 flowcells and kits further enhances accuracy to 95.46% (super-accurate) and 96.87% (species-specific).
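The abstract reports read accuracies as percentages without stating how they are computed. One common convention, assumed here and not necessarily the definition used in this thesis, is alignment identity: matching bases divided by the sum of matched, mismatched, inserted and deleted bases, read from an extended CIGAR string (for example, one produced by minimap2 with its `--eqx` option). The minimal sketch below also converts an accuracy to a Phred-scaled quality, Q = -10 log10(1 - accuracy), which makes the reported gains easier to compare. The function names are hypothetical.

```python
import math
import re

# One CIGAR operation: a run length followed by an operator character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_accuracy_from_cigar(cigar: str) -> float:
    """Alignment identity = matches / (matches + mismatches + insertions + deletions).

    Assumes an extended CIGAR that uses '=' and 'X' instead of the ambiguous 'M'.
    Soft/hard clips (S, H), skips (N) and padding (P) are ignored.
    """
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for length, op in CIGAR_RE.findall(cigar):
        if op == "M":
            # 'M' mixes matches and mismatches, so identity cannot be recovered.
            raise ValueError("CIGAR uses 'M'; an '='/'X' CIGAR is required")
        if op in counts:
            counts[op] += int(length)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("CIGAR contains no aligned bases")
    return counts["="] / total


def phred_q(accuracy: float) -> float:
    """Convert a per-base accuracy (0 < accuracy < 1) to a Phred quality score."""
    return -10 * math.log10(1 - accuracy)
```

Under this conversion, 91.96% corresponds to roughly Q10.9 and 96.87% to roughly Q15.0, meaning the per-base error rate falls from about 8.0% to about 3.1%.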