High Activity Target-Site Identification Using Phenotypic Independent CRISPR-Cas9 Core Functionality

Date

2018

Authors

Wilson, Laurence
Reti, Daniel
O'Brien, Aidan
Dunne, Robert A.
Bauer, Denis C.

Publisher

Mary Ann Liebert, Inc.

Abstract

The activity of CRISPR-Cas9 target sites can be measured experimentally through phenotypic assays or mutation rate, and these measurements can be used to build computational models that predict the activity of novel target sites. However, currently published models have been reported to perform poorly under conditions other than those they were trained on. In this study, we therefore investigate how different sources of data influence predictive power and identify the data set that yields the most robust predictive model. We use the activity of 28,606 target sites and a machine learning approach to train a predictive model of CRISPR-Cas9 activity, outperforming other published methods by an average increase in accuracy of 80% for prediction of the degree of activity and 13% for classification into active and inactive categories. We find that data sets that measure CRISPR-Cas9 activity through sequencing provide more accurate predictions of activity. Our model, dubbed TUSCAN, is highly scalable, predicting the activity of 5,000 target sites in under 7 s, making it suitable for genome-wide screens. We conclude that sophisticated machine learning methods can classify binary CRISPR-Cas9 activity; however, predicting fine-scale activity scores will require larger data sets that directly measure indel insertion rate.

Source

The CRISPR Journal

Type

Journal article

DOI

10.1089/crispr.2017.0021

Restricted until

2099-12-31