
DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations

Andrews, T Daniel; Jeelall, Yogesh; Talaulikar, Dipti; Goodnow, Christopher C; Field, Matthew A

Description


dc.contributor.author: Andrews, T Daniel
dc.contributor.author: Jeelall, Yogesh
dc.contributor.author: Talaulikar, Dipti
dc.contributor.author: Goodnow, Christopher C
dc.contributor.author: Field, Matthew A
dc.date.accessioned: 2016-09-14T03:59:05Z
dc.date.available: 2016-09-14T03:59:05Z
dc.identifier.issn: 2167-8359
dc.identifier.uri: http://hdl.handle.net/1885/108871
dc.description.abstract: Background. Massively parallel sequencing technology is being used to sequence highly diverse populations of DNA, such as those derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of the molecule tagging techniques used for this purpose is the tagging and identification of sequence reads derived from individual input DNA molecules. These reads must first be computationally disambiguated to generate read groups sharing common sequence tags, with each read group representing a single input DNA molecule. This disambiguation typically generates huge numbers of read groups, each of which requires its own variant detection analysis steps, which represents a significant computational challenge. While the sequencing technologies for producing these data are approaching maturity, the lack of computational tools for analysing such heterogeneous sequence data is an obstacle to the widespread adoption of this technology. Results. Using synthetic data, we successfully detect unique variants at dilution levels of 1 in 1,000,000 molecules, and find that DeepSNVMiner obtains significantly lower false positive and false negative rates than the popular variant callers GATK, SAMtools, FreeBayes and LoFreq, particularly as variant concentration decreases. In a dilution series with genomic DNA from two cell lines, we find that DeepSNVMiner identifies a known somatic variant when it is present at a concentration of only 1 in 1,000 molecules in the input material, the lowest concentration detected by any of the variant callers tested. Conclusions. Here we present DeepSNVMiner, a tool to disambiguate tagged sequence groups and robustly identify sequence variants specific to subsets of starting DNA molecules that may indicate the presence of a disease.
DeepSNVMiner is an automated workflow of custom sequence analysis utilities and open source tools able to differentiate somatic DNA variants from artefactual sequence variants that likely arose during DNA amplification. The workflow is flexible, so it can be customised to variants of the data production protocol used, and it supports reproducible analysis through detailed logging and reporting of results. DeepSNVMiner is available for academic non-commercial research purposes at https://github.com/mattmattmattmatt/DeepSNVMiner.
dc.description.sponsorship: National Institutes of Health Grant U19 AI100627, NHMRC Australian Fellowship 585490, and Bioplatforms Australia supported this work. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
dc.publisher: PeerJ
dc.rights: Copyright 2016 Andrews et al. Distributed under Creative Commons CC-BY 4.0
dc.source: PeerJ
dc.subject: deep sequencing
dc.subject: ngs
dc.subject: rare mutations
dc.subject: variant detection
dc.title: DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations
dc.type: Journal article
local.identifier.citationvolume: 4
dc.date.issued: 2016
local.publisher.url: https://peerj.com/
local.type.status: Published Version
local.contributor.affiliation: Andrews, T. D., Department of Immunology, John Curtin School of Medical Research, The Australian National University
local.contributor.affiliation: Jeelall, Y., Department of Immunology, John Curtin School of Medical Research, The Australian National University
local.contributor.affiliation: Talaulikar, D., Department of Immunology, John Curtin School of Medical Research, The Australian National University
local.contributor.affiliation: Goodnow, C. C., Department of Immunology, John Curtin School of Medical Research, The Australian National University
local.contributor.affiliation: Field, M. A., Department of Immunology, John Curtin School of Medical Research, The Australian National University
dc.relation: http://purl.org/au-research/grants/nhmrc/585490
local.identifier.essn: 2167-8359
local.bibliographicCitation.startpage: e2074
local.identifier.doi: 10.7717/peerj.2074
dcterms.accessRights: Open Access
Collections: ANU Research Publications
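
The abstract describes grouping reads by their molecule tag, treating each read group as one input DNA molecule, and separating true variants from artefacts introduced during amplification. The Python sketch below illustrates that general tag-and-consensus idea under simplifying assumptions; it is not DeepSNVMiner's implementation. It assumes reads arrive as hypothetical (tag, sequence) pairs already aligned without indels to a short reference window, and the function names and thresholds (`min_group_size`, `min_agreement`) are illustrative choices, not parameters taken from the tool.

```python
from collections import Counter, defaultdict


def group_reads_by_tag(reads):
    """Group (tag, sequence) pairs into read groups, one per molecule tag."""
    groups = defaultdict(list)
    for tag, seq in reads:
        groups[tag].append(seq)
    return dict(groups)


def group_consensus_variants(seqs, reference, min_agreement=0.9):
    """Return the set of (position, alt_base) variants carried by one read group.

    A non-reference base counts only if it is the most common base at that
    position and at least `min_agreement` of the group's reads share it.
    Assumes every sequence is aligned to `reference` with no indels.
    """
    variants = set()
    n = len(seqs)
    for pos, ref_base in enumerate(reference):
        base, count = Counter(seq[pos] for seq in seqs).most_common(1)[0]
        if base != ref_base and count / n >= min_agreement:
            variants.add((pos, base))
    return variants


def call_molecule_variants(reads, reference, min_group_size=3, min_agreement=0.9):
    """Count the read groups (input molecules) supporting each consensus variant.

    Returns (support, usable_groups): `support` maps (position, alt_base) to
    the number of groups carrying it; groups smaller than `min_group_size`
    are skipped because too few reads cannot separate errors from variants.
    """
    support = Counter()
    usable_groups = 0
    for seqs in group_reads_by_tag(reads).values():
        if len(seqs) < min_group_size:
            continue
        usable_groups += 1
        support.update(group_consensus_variants(seqs, reference, min_agreement))
    return support, usable_groups


if __name__ == "__main__":
    reference = "ACGTACGT"
    reads = (
        # TAG1: one read has a one-off error at position 3 (likely an artefact)
        [("TAG1", "ACGTACGT")] * 2 + [("TAG1", "ACGAACGT")]
        # TAG2: every read carries T at position 4 (a variant in the input molecule)
        + [("TAG2", "ACGTTCGT")] * 3
        # TAG3: too few reads to be used
        + [("TAG3", "ACGTACGT")] * 2
    )
    support, usable = call_molecule_variants(reads, reference)
    for (pos, alt), n_molecules in support.items():
        print(f"pos {pos} {reference[pos]}>{alt}: {n_molecules}/{usable} molecules")
    # prints "pos 4 A>T: 1/2 molecules"
```

Requiring agreement within a read group is what lets a variant present in the original molecule be told apart from an error that appears in only some of its amplified copies. In this sketch the per-molecule variant frequency is the support count divided by the number of usable groups; a real pipeline would also need to handle errors in the tags themselves, indels and alignment.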

Download

File: 01_Andrews_DeepSNVMiner_2016.pdf | Size: 1.66 MB | Format: Adobe PDF


