Cultural advice

The Australian National University acknowledges, celebrates and pays our respects to the Ngunnawal and Ngambri people of the Canberra region and to all First Nations Australians on whose traditional lands we meet and work, and whose cultures are among the oldest continuing cultures in human history.

Aboriginal and Torres Strait Islander peoples are advised that ANU Library collections may include images, names, voices, and other representations of deceased persons.

Material in the collection may contain terms, language or views that reflect the period in which the item was created and may be considered inappropriate today.

Investigation into the genome evolution of the genus Eucalyptus, and assessing and improving real accuracies of Oxford Nanopore long-read sequencing

Authors

Ferguson, Scott

Abstract

Genomes have a highly organised architecture (a non-random arrangement of functional and non-functional genetic elements within chromosomes) that is essential for many biological functions. However, despite the need to conserve genome architecture, a high level of structural variation has been observed within species, and genome architecture diverges increasingly with divergence time. Little is known about the mechanisms that produce this structural variability, why it is tolerated, or its implications for adaptation and speciation. Studying genome evolution within Eucalyptus can provide insights into the mechanisms driving genetic diversity, adaptation, and speciation. Eucalyptus, a diverse and widespread genus that has evolved with low prezygotic reproductive barriers, is well suited to studies of genome evolution. The species in this thesis were selected to be widely dispersed across the established phylogeny, with one additional speciose section heavily represented to provide a number of closely related genomes. During my PhD, I investigated the conservation and divergence of genome architecture through comparative analysis of the genomes of 33 diverse species, spanning 1 to 50 million years of divergent genome evolution. Conservation and divergence of genome architecture were measured to describe the pattern of genome evolution among these species. It was hypothesised, and shown, that immediately following lineage divergence, genome architecture is highly fragmented by rearrangements. As lineage divergence continues, the accumulation of small mutations within rearrangements becomes the primary driver of genome divergence. The loss of syntenic regions also contributes to genome divergence, but at a slower pace than rearrangements. This mode of genome evolution is consistent with the established theory of gene duplication followed by pseudogenisation or the acquisition of new function.
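The finding that genome architecture is fragmented by rearrangements soon after lineage divergence can be illustrated by counting breakpoints between the gene orders of two genomes. The following is a minimal, hypothetical sketch using the classic signed-permutation definition of a breakpoint; it assumes single-copy orthologous genes whose order and strand are known in both genomes, and it is not the comparative pipeline used in this thesis. The function names (signed_order, breakpoints, syntenic_blocks) and the toy gene IDs are illustrative only.

```python
from typing import List, Tuple


def signed_order(ref: List[str], query: List[Tuple[str, int]]) -> List[int]:
    """Express the query gene order as a signed permutation of reference ranks.

    ref:   gene IDs in reference-genome order (assumed single-copy).
    query: (gene ID, strand) pairs in query-genome order, where strand is +1
           if the gene has the same orientation as in the reference, else -1.
    Genes absent from the reference are dropped before ranking.
    """
    pos = {gene: i for i, gene in enumerate(ref)}
    shared = [(gene, strand) for gene, strand in query if gene in pos]
    # Rank shared genes 1..n by their position in the reference.
    by_ref = sorted((gene for gene, _ in shared), key=lambda g: pos[g])
    rank = {gene: r + 1 for r, gene in enumerate(by_ref)}
    return [strand * rank[gene] for gene, strand in shared]


def breakpoints(perm: List[int]) -> int:
    """Count breakpoints in a signed permutation framed by 0 and n + 1.

    Consecutive elements a, b form an adjacency only if b == a + 1, which
    also holds inside an inverted block (e.g. -4 followed by -3).
    """
    framed = [0] + perm + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def syntenic_blocks(perm: List[int]) -> List[List[int]]:
    """Split the permutation into maximal collinear runs (same order and strand)."""
    blocks: List[List[int]] = []
    for x in perm:
        if blocks and x - blocks[-1][-1] == 1:
            blocks[-1].append(x)
        else:
            blocks.append([x])
    return blocks


if __name__ == "__main__":
    # Hypothetical toy data: genes g3 and g4 are inverted in the query genome.
    ref = ["g1", "g2", "g3", "g4", "g5", "g6"]
    query = [("g1", 1), ("g2", 1), ("g4", -1), ("g3", -1), ("g5", 1), ("g6", 1)]
    perm = signed_order(ref, query)
    print(perm, breakpoints(perm), syntenic_blocks(perm))
```

On the toy data the single inversion yields the permutation [1, 2, -4, -3, 5, 6], two breakpoints and three syntenic blocks; each additional rearrangement splits the architecture into more, shorter collinear runs, which is the kind of fragmentation the comparison above describes.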
Further examination suggests that rearrangements may be altering the phenotypes of Eucalyptus species. Unequal recombination in transposon-rich regions is likely the key rearrangement mechanism, as evidenced by the frequent occurrence of rearrangements in these regions of closely related Eucalyptus genomes. This work also suggests that relying on a single reference genome could introduce reference bias, especially when examining loci that have significantly diverged or have been deleted, duplicated, or rearranged. Finally, this work provides an unbiased framework for investigating potential speciation and adaptive loci among rapidly radiating foundation species of woodland trees.

To better enable genomic studies, affordable and accurate long-read sequencing is required. However, despite the advances made by Oxford Nanopore Technologies (ONT) and PacBio, the two major long-read sequencing providers, acquiring long reads remains prohibitive for many laboratories. PacBio sequencing is currently too costly for large-scale use, and ONT suffers from lower accuracy, which necessitates higher coverage and therefore increases costs. Here, I focus on improving ONT sequence accuracy by addressing the crucial step of basecalling: the conversion of raw electrical signals into nucleotide sequences. Basecalling is hindered by models trained on mixed-species DNA, which may reduce accuracy because of conflicting base-pair chemistry. Two plant species, Phebalium stellatum and Xanthorrhoea johnsonii, were sequenced to train species-specific basecalling models, with the aim of improving per-base accuracy. The sequencing accuracy achieved by ONT basecallers was assessed, evaluating gains from species-specific models, improved flow cells (R10.4, FLO-PRO112), and sequencing kits (SQK-LSK112). ONT's Guppy 6 super-accurate basecalling yields read accuracies of 91.96% and 94.15%. Species-specific basecalling models improve accuracy to 93.24% and 95.16%.
R10.4 flow cells with the SQK-LSK112 sequencing kit further enhance accuracy to 95.46% (super-accurate) and 96.87% (species-specific).
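The abstract reports read accuracies without stating how they were computed. A common convention, assumed here rather than taken from the thesis, is alignment identity: matching bases divided by all aligned columns (matches, mismatches, insertions and deletions). The sketch below computes that from an extended CIGAR string and converts accuracies to Phred-scaled quality, Q = -10 log10(1 - accuracy), the log scale on which sequencing accuracy is often quoted. The function names read_accuracy and phred_q are hypothetical.

```python
import math
import re
from typing import Dict

# One CIGAR operation: a length followed by an operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_accuracy(cigar: str) -> float:
    """Alignment-based read accuracy from an extended CIGAR string.

    accuracy = matches / (matches + mismatches + insertions + deletions)

    Assumes the aligner emitted '=' and 'X' instead of 'M' (for example
    minimap2 run with --eqx). Clipped bases (S, H) are not counted.
    """
    counts: Dict[str, int] = {}
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    if "M" in counts:
        # 'M' does not distinguish matches from mismatches.
        raise ValueError("use =/X CIGAR operators, not M")
    matches = counts.get("=", 0)
    columns = matches + sum(counts.get(op, 0) for op in "XID")
    if columns == 0:
        raise ValueError("CIGAR contains no aligned columns")
    return matches / columns


def phred_q(accuracy: float) -> float:
    """Convert an accuracy in (0, 1) to a Phred-scaled quality score."""
    return -10 * math.log10(1 - accuracy)


if __name__ == "__main__":
    # Read accuracies as reported in this abstract.
    reported = [
        ("Guppy 6 super-accurate", 0.9196),
        ("Guppy 6 super-accurate", 0.9415),
        ("species-specific model", 0.9324),
        ("species-specific model", 0.9516),
        ("R10.4, super-accurate", 0.9546),
        ("R10.4, species-specific", 0.9687),
    ]
    for label, acc in reported:
        print(f"{label}: {acc:.2%} -> Q{phred_q(acc):.1f}")
```

On this scale, 91.96% corresponds to roughly Q11 and 96.87% to roughly Q15, which makes the successive gains from species-specific models and newer flow cells easier to compare.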