Kmer2SNP: Reference-free SNP calling from raw reads based on matching
Date
2021
Authors
Li, Yanbo
Patel, Hardip
Lin, Yu
Publisher
IEEE
Abstract
SNP calling is a fundamental problem in genetic analysis and has many applications, such as gene-disease diagnosis, drug design, and ancestry inference. Prior approaches either require a high-quality reference genome or suffer from low recall/precision or high runtime. We develop Kmer2SNP, a reference-free algorithm that calls SNPs directly from raw reads by modeling SNP calling as a maximum-weight matching problem. We benchmark Kmer2SNP against reference-free methods, including hybrid (assembly-based) and assembly-free methods, on both simulated and real datasets. Experimental results show that Kmer2SNP achieves better SNP calling quality while being an order of magnitude faster than state-of-the-art methods. Kmer2SNP shows the potential of calling SNPs using only k-mers from raw reads, without assembly. The source code is freely available at https://github.com/yanboANU/Kmer2SNP.
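To illustrate the idea of casting SNP calling as a matching problem, the following is a minimal sketch and not the authors' implementation. It assumes that candidate SNPs appear as pairs of k-mers differing only at the middle base, and it uses a hypothetical edge weight (the ratio of the two k-mer counts) to favor pairs with balanced coverage. The function names, the weight, and the toy reads are all assumptions made for illustration. The matching is solved by brute force, which is exact but only practical for tiny inputs, and reverse complements are ignored.

```python
from collections import Counter
from itertools import combinations


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def candidate_edges(counts, k):
    """Connect k-mers that are identical except for the middle base."""
    mid = k // 2
    groups = {}
    for kmer in counts:
        groups.setdefault((kmer[:mid], kmer[mid + 1:]), []).append(kmer)
    edges = []
    for kmers in groups.values():
        for a, b in combinations(sorted(kmers), 2):
            # Hypothetical weight: close to 1 when both k-mers have similar counts.
            weight = min(counts[a], counts[b]) / max(counts[a], counts[b])
            edges.append((weight, a, b))
    return edges


def max_weight_matching(edges):
    """Exact maximum-weight matching by exhaustive search (exponential time)."""
    if not edges:
        return 0.0, []
    (w, a, b), rest = edges[0], edges[1:]
    # Option 1: leave the first edge out.
    skip_w, skip_m = max_weight_matching(rest)
    # Option 2: take it, dropping every edge that shares an endpoint with it.
    compatible = [e for e in rest if a not in e[1:] and b not in e[1:]]
    take_w, take_m = max_weight_matching(compatible)
    take_w += w
    if take_w > skip_w:
        return take_w, [(a, b)] + take_m
    return skip_w, skip_m


# Toy data: two haplotypes differing at one position, plus one erroneous read.
hap1 = "ACGTACGGTCA"
hap2 = "ACGTATGGTCA"
reads = [hap1] * 3 + [hap2] * 3 + ["GTAAGGT"]

counts = count_kmers(reads, 5)
edges = candidate_edges(counts, 5)
total, pairs = max_weight_matching(edges)
print(pairs)
```

In this toy input, the erroneous k-mer TAAGG competes with the two true alleles for the same partner. Because each k-mer can be matched at most once, the matching keeps the well-supported pair and leaves the low-coverage k-mer unmatched.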
Keywords
SNP calling, Reference-free, K-mer analysis, Maximum-weight matching
Type
Conference paper
Restricted until
2099-12-31