Kmer2SNP: Reference-free SNP calling from raw reads based on matching

Date

2021

Authors

Li, Yanbo
Patel, Hardip
Lin, Yu

Publisher

IEEE

Abstract

SNP calling is a fundamental problem in genetic analysis and has many applications, such as gene-disease diagnosis, drug design, and ancestry inference. Prior approaches either require a high-quality reference genome or suffer from low recall, low precision, or high runtime. We develop Kmer2SNP, a reference-free algorithm that calls SNPs directly from raw reads by modeling SNP calling as a maximum-weight matching problem. We benchmark Kmer2SNP against reference-free methods, including hybrid (assembly-based) and assembly-free methods, on both simulated and real datasets. Experimental results show that Kmer2SNP achieves better SNP calling quality while being an order of magnitude faster than the state-of-the-art methods. Kmer2SNP shows the potential of calling SNPs using only k-mers from raw reads, without assembly. The source code is freely available at https://github.com/yanboANU/Kmer2SNP.
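
The idea summarized in the abstract (k-mers taken from raw reads, then a maximum-weight matching over candidate k-mer pairs) can be illustrated with a toy sketch. This is not the Kmer2SNP implementation: the function names, the rule that two k-mers are candidates when they differ only at the middle base, and the edge weight (the smaller of the two k-mer counts) are all assumptions made for illustration, and reverse complements and sequencing errors are ignored. The matching is found by exhaustive search, which is only practical for a handful of edges.

```python
from collections import Counter
from itertools import combinations


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def candidate_edges(counts, k):
    """Pair k-mers that are identical except at the middle position.

    Assumption for this sketch: the edge weight is the smaller of the
    two k-mer counts. The paper's actual weighting is not given here.
    """
    mid = k // 2
    groups = {}
    for kmer in counts:
        key = (kmer[:mid], kmer[mid + 1:])
        groups.setdefault(key, []).append(kmer)
    edges = []
    for kmers in groups.values():
        for a, b in combinations(sorted(kmers), 2):
            edges.append((a, b, min(counts[a], counts[b])))
    return edges


def max_weight_matching(edges):
    """Exact maximum-weight matching by exhaustive search (toy sizes only)."""
    def solve(i, used):
        if i == len(edges):
            return 0, []
        best = solve(i + 1, used)  # option 1: skip edge i
        a, b, w = edges[i]
        if a not in used and b not in used:  # option 2: take edge i
            sub_w, sub_pairs = solve(i + 1, used | {a, b})
            if sub_w + w > best[0]:
                best = (sub_w + w, [(a, b)] + sub_pairs)
        return best

    return solve(0, frozenset())


# Two hypothetical haplotypes that differ at a single base (C vs T).
reads = ["ACGTACGGTCA", "ACGTATGGTCA"]
counts = count_kmers(reads, k=5)
edges = candidate_edges(counts, k=5)
total, pairs = max_weight_matching(edges)
print(total, pairs)
```

Each matched pair is a candidate SNP whose middle bases are the two alleles. A matching is used rather than simply reporting every candidate edge because a k-mer can have several possible partners, and each k-mer should be assigned to at most one of them.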

Keywords

SNP calling, Reference-free, K-mer analysis, Maximum weight matching

Type

Conference paper

Restricted until

2099-12-31