smartsnp, an r package for fast multivariate analyses of big genomic data

dc.contributor.author: Herrando-Perez, Salvador
dc.contributor.author: Tobler, Raymond
dc.contributor.author: Huber, Christian D
dc.date.accessioned: 2024-01-02T01:59:09Z
dc.date.available: 2024-01-02T01:59:09Z
dc.date.issued: 2021
dc.date.updated: 2022-09-18T08:16:47Z
dc.description.abstract: Principal component analysis (PCA) is a powerful tool for the analysis of population structure, a genetic property that is essential to understand the evolutionary processes driving biological diversification and (pre)historical colonizations, migrations and extinctions. In the current era of high-throughput sequencing technologies, population structure can be quantified from scores of genetic markers across hundreds to thousands of genomes. However, these big genomic datasets pose substantial computing and analytical challenges. We present the r package smartsnp for fast and user-friendly computation of PCA on single-nucleotide polymorphism (SNP) data. Inspired by the current field-standard software EIGENSOFT, smartsnp includes appropriate SNP scaling for genetic drift and allows projection of ancient samples onto a modern genetic space while also providing permutation-based multivariate tests for population differences in genetic diversity (both location and dispersion). Our extensive benchmarks show that smartsnp's PCA is 2-4 times faster than EIGENSOFT's SMARTPCA algorithm across a wide range of sample and SNP sizes. All four smartsnp functions (smart_pca, smart_permanova, smart_permdisp and smart_mva) process datasets with up to 100 samples and 1 million simulated SNPs in less than 30 s and accurately recreate previously published SMARTPCA of ancient-human and wolf genotypes. The package smartsnp provides fast and robust multivariate ordination and hypothesis testing for big genomic data that is also suitable for ancient and low-coverage modern DNA. The simple implementation should appeal to biological conservation, evolutionary, ecological and (palaeo)genomic researchers, and be useful for phenotype, ancestry and lineage studies. [en_AU]
dc.description.sponsorship: European Union's LIFE18 NAT/ES/000121 LIFE DIVAQUA; [en_AU]
dc.format.mimetype: application/pdf [en_AU]
dc.identifier.issn: 2041-210X [en_AU]
dc.identifier.uri: http://hdl.handle.net/1885/311117
dc.language.iso: en_AU [en_AU]
dc.provenance: https://v2.sherpa.ac.uk/id/publication/16031/..."published version can be archived in institutional repository" from SHERPA/RoMEO site as at 02/01/2024 [en_AU]
dc.publisher: Wiley-Blackwell [en_AU]
dc.relation: http://purl.org/au-research/grants/arc/CE170100015 [en_AU]
dc.relation: http://purl.org/au-research/grants/arc/DE180100883 [en_AU]
dc.relation: http://purl.org/au-research/grants/arc/DE190101069 [en_AU]
dc.rights: © 2021 The authors [en_AU]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [en_AU]
dc.source: Methods in Ecology and Evolution [en_AU]
dc.subject: ancient DNA [en_AU]
dc.subject: genetic drift [en_AU]
dc.subject: population structure [en_AU]
dc.subject: single nucleotide polymorphism [en_AU]
dc.title: smartsnp, an r package for fast multivariate analyses of big genomic data [en_AU]
dc.type: Journal article [en_AU]
dcterms.accessRights: Open Access [en_AU]
local.bibliographicCitation.issue: 11 [en_AU]
local.bibliographicCitation.lastpage: 2093 [en_AU]
local.bibliographicCitation.startpage: 2084 [en_AU]
local.contributor.affiliation: Herrando-Perez, Salvador, The University of Adelaide [en_AU]
local.contributor.affiliation: Tobler, Raymond, College of Asia and the Pacific, ANU [en_AU]
local.contributor.affiliation: Huber, Christian D, University of Adelaide [en_AU]
local.contributor.authoruid: Tobler, Raymond, u1114444 [en_AU]
local.description.notes: Imported from ARIES [en_AU]
local.identifier.absfor: 310203 - Computational ecology and phylogenetics [en_AU]
local.identifier.absfor: 310509 - Genomics [en_AU]
local.identifier.absfor: 310510 - Molecular evolution [en_AU]
local.identifier.ariespublication: a383154xPUB21035 [en_AU]
local.identifier.citationvolume: 12 [en_AU]
local.identifier.doi: 10.1111/2041-210X.13684 [en_AU]
local.identifier.thomsonID: WOS:000682871000001
local.publisher.url: https://besjournals.onlinelibrary.wiley.com/ [en_AU]
local.type.status: Published Version [en_AU]

Downloads

Original bundle

Name: Methods Ecol Evol - 2021 - Herrando‐Pérez - smartsnp an r package for fast multivariate analyses of big genomic data.pdf
Size: 2.04 MB
Format: Adobe Portable Document Format