Stability of SARS-CoV-2 phylogenies

dc.contributor.author: Turakhia, Yatish
dc.contributor.author: de Maio, Nicola
dc.contributor.author: Thornlow, Bryan
dc.contributor.author: Gozashti, Landen
dc.contributor.author: Lanfear, Robert
dc.contributor.author: Walker, Conor R.
dc.contributor.author: Hinrichs, Angie S.
dc.contributor.author: Fernandes, Jason D.
dc.contributor.author: Borges, Rui
dc.contributor.author: Slodkowicz, Greg
dc.contributor.author: Weilguny, Lukas
dc.contributor.author: Haussler, David
dc.contributor.author: Goldman, Nick
dc.contributor.author: Corbett-Detig, Russell
dc.date.accessioned: 2022-10-11T04:56:19Z
dc.date.available: 2022-10-11T04:56:19Z
dc.date.issued: 2020
dc.date.updated: 2021-11-28T07:22:30Z
dc.description.abstract: The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes.
Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse. [en_AU]
dc.description.sponsorship: The UCSC Human Genome Browser software, quality control, and training is funded by NHGRI, currently with grant 5U41HG002371-19. The SARS-CoV-2 genome browser and data annotation tracks are funded by generous individual donors including Pat & Rowland Rebele and a University of California Office of the President Emergency COVID-19 Research Seed Funding Grant R00RG2456. R.C.-D. and B.T. were funded in part by R35GM128932 and by an Alfred P. Sloan Foundation fellowship to R.C.-D. N.D.M., L.W. and N.G. are funded by the European Molecular Biology Laboratory (EMBL); C.R.W. is funded by the National Institute of Health Research (NIHR) Cambridge Biomedical Research Centre and EMBL. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. [en_AU]
dc.format.mimetype: application/pdf [en_AU]
dc.identifier.issn: 1553-7390 [en_AU]
dc.identifier.uri: http://hdl.handle.net/1885/274451
dc.language.iso: en_AU [en_AU]
dc.provenance: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. [en_AU]
dc.publisher: Public Library of Science [en_AU]
dc.rights: © 2020 The authors [en_AU]
dc.rights.license: Creative Commons Attribution licence [en_AU]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [en_AU]
dc.source: PLoS Genetics [en_AU]
dc.title: Stability of SARS-CoV-2 phylogenies [en_AU]
dc.type: Journal article [en_AU]
dcterms.accessRights: Open Access [en_AU]
local.bibliographicCitation.issue: 11 [en_AU]
local.bibliographicCitation.startpage: e1009175 [en_AU]
local.contributor.affiliation: Turakhia, Yatish, University of California Santa Cruz [en_AU]
local.contributor.affiliation: de Maio, Nicola, European Bioinformatics Institute [en_AU]
local.contributor.affiliation: Thornlow, Bryan, University of California Santa Cruz [en_AU]
local.contributor.affiliation: Gozashti, Landen, University of California Santa Cruz [en_AU]
local.contributor.affiliation: Lanfear, Robert, College of Science, ANU [en_AU]
local.contributor.affiliation: Walker, Conor R., European Bioinformatics Institute [en_AU]
local.contributor.affiliation: Hinrichs, Angie S., University of California Santa Cruz [en_AU]
local.contributor.affiliation: Fernandes, Jason D., University of California Santa Cruz [en_AU]
local.contributor.affiliation: Borges, Rui, Institut fur Populationsgenetik Vetmeduni Vienna [en_AU]
local.contributor.affiliation: Slodkowicz, Greg, MRC Laboratory of Molecular Biology [en_AU]
local.contributor.affiliation: Weilguny, Lukas, European Bioinformatics Institute [en_AU]
local.contributor.affiliation: Haussler, David, University of California [en_AU]
local.contributor.affiliation: Goldman, Nick, European Bioinformatics Institute [en_AU]
local.contributor.affiliation: Corbett-Detig, Russell, University of California Santa Cruz [en_AU]
local.contributor.authoruid: Lanfear, Robert, u4595144 [en_AU]
local.description.notes: Imported from ARIES [en_AU]
local.identifier.absfor: 420200 - Epidemiology [en_AU]
local.identifier.ariespublication: a383154xPUB16050 [en_AU]
local.identifier.citationvolume: 16 [en_AU]
local.identifier.doi: 10.1371/journal.pgen.1009175 [en_AU]
local.identifier.scopusID: 2-s2.0-85097185950
local.publisher.url: https://journals.plos.org/ [en_AU]
local.type.status: Published Version [en_AU]

Downloads

Original bundle

Name: Stability.pdf
Size: 3.88 MB
Format: Adobe Portable Document Format