vignettes/CRISPRdemo.Rmd
Zhu LJ, Holmes BR, Aronin N and Brodsky MH (2014). “CRISPRseek: A Bioconductor Package to Identify Target-Specific Guide RNAs for CRISPR-Cas9 Genome-Editing Systems.” PLoS one, 9(9). http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4172692/.
Zhu LJ (2015). “Overview of guide RNA design tools for CRISPR-Cas9 genome editing technology.” Front. Biol., 10(4).
We will use a human sequence as input; it is included as a FASTA file in the CRISPRseek package.
To perform off-target analysis, we need to load the human BSgenome package.
To annotate the target and off-targets, we need to load the human transcript (TxDb) and gene identifier mapping (org.Hs.eg.db) packages.
In addition, you need to specify the output directory, which is where all the output files will be written.
In the current release, you no longer need to specify the file containing all restriction enzyme (RE) cut patterns. You can still supply your own RE pattern file in place of the default one shipped with the CRISPRseek package.
library(CRISPRseek)
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which.max, which.min
## Loading required package: Biostrings
## Loading required package: S4Vectors
## Loading required package: stats4
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
##
## expand.grid
## Loading required package: IRanges
## Loading required package: XVector
##
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
##
## strsplit
library(BSgenome.Hsapiens.UCSC.hg19)
## Loading required package: BSgenome
## Loading required package: GenomeInfoDb
## Loading required package: GenomicRanges
## Loading required package: rtracklayer
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## Loading required package: GenomicFeatures
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
library(org.Hs.eg.db)
##
outputDir <- file.path(getwd(), "CRISPRseekDemo")
inputFilePath <- system.file('extdata', 'inputseq.fa', package = 'CRISPRseek')
args(offTargetAnalysis)
args(compare2Sequences)
?offTargetAnalysis
?compare2Sequences
?CRISPRseek
browseVignettes('CRISPRseek')
Please note that chromToSearch is set to chrX here only to keep the run fast. You usually do not need to set it; by default it is set to all, so every chromosome is searched.
offTargetAnalysis(inputFilePath,
    BSgenomeName = Hsapiens,
    chromToSearch = "chrX",
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    orgAnn = org.Hs.egSYMBOL,
    outputDir = outputDir,
    overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequence chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## 403 genes were dropped because they have exons located on both strands
## of the same reference sequence or on more than one reference sequence,
## so cannot be represented by a single genomic range.
## Use 'single.strand.genes.only=FALSE' to get all the genes in a
## GRangesList object, or use suppressMessages() to suppress this message.
## Done annotating
## Add paired information...
## Add RE information...
## write gRNAs to bed file...
## Scan for REsites in flanking region...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
## $on.target
## names forViewInUCSC
## [1,] "Hsap_GATA1_ex2_gR17f" "chrX:48649564-48649586"
## [2,] "Hsap_GATA1_ex2_gR20r" "chrX:48649577-48649599"
## [3,] "Hsap_GATA1_ex2_gR39f" "chrX:48649586-48649608"
## [4,] "Hsap_GATA1_ex2_gR40f" "chrX:48649587-48649609"
## [5,] "Hsap_GATA1_ex2_gR7r" "chrX:48649564-48649586"
## extendedSequence gRNAefficacy
## [1,] "TCCCCCAGTTTGTGGATCCTGCTCTGGTGT" "0.106547942216103"
## [2,] "TTCTGGTGTGGAGGACACCAGAGCAGGATC" "0.142091080678942"
## [3,] "TCTGGTGTCCTCCACACCAGAATCAGGGGT" "0.0696295234374132"
## [4,] "CTGGTGTCCTCCACACCAGAATCAGGGGTT" "0.261881755628254"
## [5,] "GACACCAGAGCAGGATCCACAAACTGGGGG" "0.101719829329578"
##
## $summary
## names forViewInUCSC extendedSequence
## 1 Hsap_GATA1_ex2_gR17f chrX:48649564-48649586 TCCCCCAGTTTGTGGATCCTGCTCTGGTGT
## 2 Hsap_GATA1_ex2_gR20r chrX:48649577-48649599 TTCTGGTGTGGAGGACACCAGAGCAGGATC
## 3 Hsap_GATA1_ex2_gR39f chrX:48649586-48649608 TCTGGTGTCCTCCACACCAGAATCAGGGGT
## 4 Hsap_GATA1_ex2_gR40f chrX:48649587-48649609 CTGGTGTCCTCCACACCAGAATCAGGGGTT
## 5 Hsap_GATA1_ex2_gR7r chrX:48649564-48649586 GACACCAGAGCAGGATCCACAAACTGGGGG
## gRNAefficacy gRNAsPlusPAM top5OfftargetTotalScore
## 1 0.106547942216103 CCAGTTTGTGGATCCTGCTCNGG 3.3
## 2 0.142091080678942 GGTGTGGAGGACACCAGAGCNGG 3.4
## 3 0.0696295234374132 GTGTCCTCCACACCAGAATCNGG 1.1
## 4 0.261881755628254 TGTCCTCCACACCAGAATCANGG 2.9
## 5 0.101719829329578 CCAGAGCAGGATCCACAAACNGG 0.2
## top10OfftargetTotalScore top1Hit.onTarget.MMdistance2PAM
## 1 3.3 NMM
## 2 3.4 NMM
## 3 1.1 NMM
## 4 2.9 NMM
## 5 0.2 NMM
## topOfftarget1MMdistance2PAM topOfftarget2MMdistance2PAM
## 1 20,17,13 20,19,3
## 2 18,16,15 17,14,1
## 3 16,15,1 15,11,3
## 4 13,2
## 5 16,8,3
## topOfftarget3MMdistance2PAM topOfftarget4MMdistance2PAM
## 1
## 2 20,14,8 13,10,7
## 3
## 4
## 5
## topOfftarget5MMdistance2PAM topOfftarget6MMdistance2PAM
## 1
## 2
## 3
## 4
## 5
## topOfftarget7MMdistance2PAM topOfftarget8MMdistance2PAM
## 1
## 2
## 3
## 4
## 5
## topOfftarget9MMdistance2PAM topOfftarget10MMdistance2PAM REname
## 1
## 2 Aco12261II
## 3 BslI HinfI TfiI
## 4 BslI HinfI TfiI
## 5 BslI PflMI
## uniqREin200 uniqREin100
## 1
## 2
## 3 HinfI TfiI HinfI TfiI
## 4 HinfI TfiI HinfI TfiI
## 5 PflMI PflMI
##
## $offtarget
## name gRNAPlusPAM OffTargetSequence inExon
## 1 Hsap_GATA1_ex2_gR17f CCAGTTTGTGGATCCTGCTCNGG ACAATTTCTGGATCCTGCTCCAG
## 2 Hsap_GATA1_ex2_gR17f CCAGTTTGTGGATCCTGCTCNGG CCAGTTTGTGGATCCTGCTCTGG TRUE
## 3 Hsap_GATA1_ex2_gR17f CCAGTTTGTGGATCCTGCTCNGG GTAGTTTGTGGATCCTGTTCTAG
## 4 Hsap_GATA1_ex2_gR20r GGTGTGGAGGACACCAGAGCNGG GGAGCTGAGGACACCAGAGCGGG
## 5 Hsap_GATA1_ex2_gR20r GGTGTGGAGGACACCAGAGCNGG GGTATGTAGGACACCAGAGACAG
## 6 Hsap_GATA1_ex2_gR20r GGTGTGGAGGACACCAGAGCNGG GGTGTGGAGGACACCAGAGCAGG TRUE
## 7 Hsap_GATA1_ex2_gR20r GGTGTGGAGGACACCAGAGCNGG GGTGTGGGGGTCAGCAGAGCCAG
## 8 Hsap_GATA1_ex2_gR20r GGTGTGGAGGACACCAGAGCNGG TGTGTGTAGGACTCCAGAGCAAG
## 9 Hsap_GATA1_ex2_gR39f GTGTCCTCCACACCAGAATCNGG GTGTCCTCCACACCAGAATCAGG TRUE
## 10 Hsap_GATA1_ex2_gR39f GTGTCCTCCACACCAGAATCNGG GTGTCTTCCCCACCAGATTCTAG
## 11 Hsap_GATA1_ex2_gR39f GTGTCCTCCACACCAGAATCNGG GTGTTTTCCACACCAGAATGCAG
## 12 Hsap_GATA1_ex2_gR40f TGTCCTCCACACCAGAATCANGG TGTCCTCAACACCAGAATGATAG
## 13 Hsap_GATA1_ex2_gR40f TGTCCTCCACACCAGAATCANGG TGTCCTCCACACCAGAATCAGGG TRUE
## 14 Hsap_GATA1_ex2_gR7r CCAGAGCAGGATCCACAAACNGG CCAGAGCAGGATCCACAAACTGG TRUE
## 15 Hsap_GATA1_ex2_gR7r CCAGAGCAGGATCCACAAACNGG CCAGTGCAGGATTCACATACTAG
## inIntron entrez_id gene score n.mismatch mismatch.distance2PAM
## 1 TRUE 618 BCYRN1 2.6 3 20,17,13
## 2 2623 GATA1 100 0
## 3 TRUE 1183 CLCN4 0.7 3 20,19,3
## 4 1.4 3 18,16,15
## 5 1 3 17,14,1
## 6 2623 GATA1 100 0
## 7 0.2 3 13,10,7
## 8 TRUE 57477 SHROOM4 0.8 3 20,14,8
## 9 2623 GATA1 100 0
## 10 TRUE 10149 ADGRG2 0.3 3 15,11,3
## 11 0.8 3 16,15,1
## 12 TRUE 139324 HDX 2.9 2 13,2
## 13 2623 GATA1 100 0
## 14 2623 GATA1 100 0
## 15 0.2 3 16,8,3
## alignment isCanonicalPAM forViewInUCSC strand chrom
## 1 A..A...C............ 0 chrX:70453482-70453504 - chrX
## 2 .................... 1 chrX:48649564-48649586 + chrX
## 3 GT...............T.. 0 chrX:10161855-10161877 + chrX
## 4 ..A.CT.............. 1 chrX:152513913-152513935 - chrX
## 5 ...A..T............A 0 chrX:14446436-14446458 - chrX
## 6 .................... 1 chrX:48649577-48649599 - chrX
## 7 .......G..T..G...... 0 chrX:50699578-50699600 + chrX
## 8 T.....T.....T....... 0 chrX:50383047-50383069 - chrX
## 9 .................... 1 chrX:48649586-48649608 + chrX
## 10 .....T...C.......T.. 0 chrX:19043357-19043379 + chrX
## 11 ....TT.............G 0 chrX:81194638-81194660 + chrX
## 12 .......A..........G. 0 chrX:83738197-83738219 - chrX
## 13 .................... 1 chrX:48649587-48649609 + chrX
## 14 .................... 1 chrX:48649564-48649586 - chrX
## 15 ....T.......T....T.. 0 chrX:66676065-66676087 - chrX
## chromStart chromEnd extendedSequence gRNAefficacy
## 1 70453482 70453504 <NA> NA
## 2 48649564 48649586 TCCCCCAGTTTGTGGATCCTGCTCTGGTGT 0.10654794
## 3 10161855 10161877 <NA> NA
## 4 152513913 152513935 <NA> NA
## 5 14446436 14446458 <NA> NA
## 6 48649577 48649599 TTCTGGTGTGGAGGACACCAGAGCAGGATC 0.14209108
## 7 50699578 50699600 <NA> NA
## 8 50383047 50383069 <NA> NA
## 9 48649586 48649608 TCTGGTGTCCTCCACACCAGAATCAGGGGT 0.06962952
## 10 19043357 19043379 <NA> NA
## 11 81194638 81194660 <NA> NA
## 12 83738197 83738219 <NA> NA
## 13 48649587 48649609 CTGGTGTCCTCCACACCAGAATCAGGGGTT 0.26188176
## 14 48649564 48649586 GACACCAGAGCAGGATCCACAAACTGGGGG 0.10171983
## 15 66676065 66676087 <NA> NA
## flankSequence
## 1 GCTGGAGCAGAAACAGAAAGGGGTTAGGGGAGACTGTGGGAAAATAAAGTGTGTTTGGGATCCCTAAGGTGAAGGGAGAGAAAGGGTGACAGTTTCAACAGCTTATCTTGACTTAGGCCAAAGAAGGTTAAGGGGTTCTGAATAACTGTCTCTCCACATCTTCCCTGCACTCCCCAGGTTGCCTCCCTTCCCAGGGATGGACAATTTCTGGATCCTGCTCCAGGTAGACATCCTGGGAAGGGCCCTCTCCCCACAAAACTGCCCTTCCCTGCCTGAGTGCAGGGCAGCCCCTTCCTCAACTTTCACCATTTCTCTCTCTTAACCTTAGGGATCCTGTACAAACATGTTGTTTTCCAGCACATTAGTACTTAGGGATATTTTCATATCATACTGAGCTCCCCACCCCCACCCCACCCAACATAT
## 2 ATGGGGAGGTGGGAAGGAGAAATATGGAGACTGAGGTGATGGAGTGGGAGGAGGGGGAAAGGAGGAAGAGGAGCAGGTGAAAGGATGTGGGGTGGAAGGATTTCTGTGTCTGAGGACCCCTTCTGTCCTCGCAGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAGACACTCCCCAGGTAACTCCATTGAGTGGCTGTCTTGGCATTGGCTGAGTGCTGTTGGGGTT
## 3 GTCAGCAAACTTTTTCTGCAAGAGGCCAGAGAGTAAATATTTTCAGGTTTGTGGCCCATGTGGTTTCTATCTCAGCTCTTCAGTTCTGCCACTGTAGCATGAAGGCAGCCACAGAGTGCATGTAATTGAATGGTGTAGCTGTGCTCCAATAAAACTTTATTTACAAAAACAGGCTGTGGGCCAGATTTGGCCCATGGGCTGTAGTTTGTGGATCCTGTTCTAGAGGAATGAGCTAAGGCCATTGCCTTAGCTGCATGGGAATCTCCTGGTAGATTGAGGGGATGGACTGGGCCAAGGGAAGGAAAGAGATTGGATGGTGACTAGAGGGAAAGTGGAAACAGGGAGGGTTTTTTTTTTTTAATGTTTAAATGCTGATGGCAAGGAGAGGAAGAGGTTGAAGATACAGGAGAAGGAAAGGGGAAT
## 4 GCCTTGACTGTACCATGGTCACACATTTCTTATCCAAAAGCAAAGGGATTGTTCATATAAAGAAATTATGAGTGTTGGTTGATCAGCTCAGTTTTGGAAATGTTAAAGTTGAGGTGCATATGAAGCACCTGAGGGCGGATGTCAAGTGGGGATAAGCAATATACCCTGGATATATGAGCCTCATGTTCAGGAGAGTGGTAGGAGCTGAGGACACCAGAGCGGGAATTAATATCACACAGATTGCGTTTAATCAAGGAGACTAGGAGAGATCACCTAGGGAATGACTGTAAACAAAAAAGGGAAGTCTAAAGACTGATTCCGCGTTCCTTCAATACTTTGAGCTACCGCAGAATTGGGGCATCAGTAATGAAGCCGAAGAATGAGTGAACAAATAAGTAAAAGAAAACCAGCTGAAACACAGTA
## 5 ACATTTTCCCCATGACATTCTTTTAATTGCAAATGCTCCTGAGAAGAGGAATAACACATGAAACGAATCCGCTCTGATCACAGATTTTTCTCCTTCTCAGATAAATCCAATTAACAGAAGATGTCTTAGAATTTAAGGGGGGGCAGAAAGAGAATCTCTGTGGGTAGAGGGTTGGGGAAAAAGCATGGGATGGTAGTTTTGGTATGTAGGACACCAGAGACAGTCCTCACCTAATTCAATGGCTTAAAATTATACCAAAGGAAAACAACTCATAACCCAAAAGGAATTAGGATGCACACACCAGGCTTACTCAGACATGTGGTGCTGCCTCCTATTTTACTTTATGGTAGATAAAGTTGAGCAGTTGTATAAATAGAAATGACTGGAGCCTTTACGTCTAATAGCAGAACTCATTTTTTATTT
## 6 GGATCTCCATGGCAACCCCAACAGCACTCAGCCAATGCCAAGACAGCCACTCAATGGAGTTACCTGGGGAGTGTCTGTAGGCCTCAGCGTCCCTGTAGTAGGCCAGTGCCGCAGCTGCAGCGGTGGCTGTGCTCGGGGCAGTGGAGGAAGCTGCTGCATCCAAGCCCTCAGGCCCAGAGGGGAAGAAAACCCCTGATTCTGGTGTGGAGGACACCAGAGCAGGATCCACAAACTGGGGGAGGGGCTCTGAGGTCCCCAGGGACCCCAGGCCAGGGAACTCCATGGAGCCTCTGGGGATTAACCTGCGAGGACAGAAGGGGTCCTCAGACACAGAAATCCTTCCACCCCACATCCTTTCACCTGCTCCTCTTCCTCCTTTCCCCCTCCTCCCACTCCATCACCTCAGTCTCCATATTTCTCCTT
## 7 AGACATAAGATGAAAATAAAGATTTTAAGAGAGAGACCATGAGTGGTGGGCCAATACCTTTATTGGTCCAGGGCATTATCCAAACAGGTTTCCCTCAGGGAGTTTTAACTGGTGGGTTTAAAGCAATCAGGCATGAATTCCAATAGGTAACGCTGTGTCTGAGAGGTGGTCACTGTGGCATATCTGCACAGTGCATGCAGGGTGTGGGGGTCAGCAGAGCCAGTCAAGAGGTTGTATCTAGCTATCTTATAGAGAAGTGATCACCAAGAGGAAGTTGTCTAAGACACATATCTGGATCAACCACATTATGAAACTGGGGGTGGGGAGGGTGTAGAACTGGAAACTGTGGCCAGGCGCGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGACAGATCACGAGGTCAGGAGT
## 8 TCAAGAACCCTAAATATGGAAAAAGGCTTGAAATAGGAAGGATACCCTGTTCACCACTATATCCTCAGTGTCTAAAAGCATAGGTCATAGTAGATACCCATGGTACATAGTACTACTACTGGTAGATGCTTGACAAATAGTTGTTGACCGGATAACTACCTGGTAGCAGCAGCTGGAGAGATTTCCCGGCAGTGTGGCTTTGTGTGTAGGACTCCAGAGCAAGCATAGCCCCCAAGGTCTCCTTGTGCCTTCAGCCCCCACTCTGGGCACTCTGAAATCAATAATAGAACAATGTGATTACAAAGCTAGGGGCTTCACCTGGCTTTCCTCCGCTTAAATCTTAGCCACTGTTGCTGACCTGTGGCTATACTTCCTGGCCTTTGACAAACAGCTTCCCCTGCCCTTCCTCCTATACCTCCTGGT
## 9 TATGGAGACTGAGGTGATGGAGTGGGAGGAGGGGGAAAGGAGGAAGAGGAGCAGGTGAAAGGATGTGGGGTGGAAGGATTTCTGTGTCTGAGGACCCCTTCTGTCCTCGCAGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAGACACTCCCCAGGTAACTCCATTGAGTGGCTGTCTTGGCATTGGCTGAGTGCTGTTGGGGTTGCCATGGAGATCCTTGGCTAGG
## 10 TGCTGGGATTATAGTCATGAGCCACCATGCCCCACCTACAGGTCACCTTCTTTCTGGGGGTTCCCCTATCCCTCCGCTGCAGTCCGGTTTAATTGCTCCTGCTGCTCTACATTCAGAGACTTAGAACATCCATCCTTTGGTACCCTATACAAGCCACCTTCAGTGCAGGGGTTGCACTGTATTGTTATTTGTTGCATCTTGTGTCTTCCCCACCAGATTCTAGAGCCCTGGTTCTCACAATTAGCCTCATGTTGGAAACATGCCAATGCTGGGTCTCAGCCCCTAAATACTGATTTGTTATGGAGCATGCCTGGGCACTGGGATTTTTGAAAGCTCCCAGGTGATTTTAATATGCAACCAGGGATGAGAAACATTGCTCTAGACTTGGTTTCCTATCTCAGAGAGTTGGGTTTATCCCTGTGG
## 11 AAGGAAGTAATTAAGATTAAATGAGGCTTGGAGAATGGGGCTCTAATTCGATAGGATTAATATCATTATAAGAAGAGGTACCAGAGTGTTGGCATGTGCATGCTCTCTGTCTTTCTTCGTGTGTGTGTGTGTGTGTATGTGTGTGTTTCTCCATGTGTATGCACTGAGGAAAGGACATGTGAGGATATAGAAAGAAGGCAGTGTTTTCCACACCAGAATGCAGCCCTTATCAGAAGTCAAATAAACCAGAACCTTTATCTTAGACTTCTAGCCTCTAAAACACTGAAAAATAAATTTTCGTTTAAACTGCCAGGTCTATAGTACCTTGTTACAGCAGCCTAAGGTGACTAATATAATTGTCAAGGTAGTGTTACTGGGAATTTTGTATTGGGATGGATTTTTTAACTGAGGAGATCAACACAG
## 12 GCAGGTGCACATAAGTCAAGAATTGAGGTTTGGGAACCTCCACCTAGATTTCAGAGGATGTATGAAAATGTCTGGATGTCCCTGCAGACATTTGCTGCATGGGTGGAGCCCTCATGGAGAACCTCTGCTAGGGCAATGTGGAAGAGACATGTGGGGTTGGAGCCCCAACACAGATAGTGGAGCTGTGAGAAGAGGGCCACTGTCCTCAACACCAGAATGATAGATCCACCGACAGCTTGCACCATGAGCCTGGGAAAGCCACATACACTTAACATCAGTCTGTGAAATCAGCTGAGGGGGGCGCTGTACCCTGCAAAGCCACAAAGTTGGAGCTGCCCAAGCCCTTTGATGCCCACTCCTTGCATCAGCATAACCTGAATGTGGGACATGGAGTCAAAGGAGATCACTTTGGAACTTTAAGTT
## 13 ATGGAGACTGAGGTGATGGAGTGGGAGGAGGGGGAAAGGAGGAAGAGGAGCAGGTGAAAGGATGTGGGGTGGAAGGATTTCTGTGTCTGAGGACCCCTTCTGTCCTCGCAGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAGACACTCCCCAGGTAACTCCATTGAGTGGCTGTCTTGGCATTGGCTGAGTGCTGTTGGGGTTGCCATGGAGATCCTTGGCTAGGT
## 14 AACCCCAACAGCACTCAGCCAATGCCAAGACAGCCACTCAATGGAGTTACCTGGGGAGTGTCTGTAGGCCTCAGCGTCCCTGTAGTAGGCCAGTGCCGCAGCTGCAGCGGTGGCTGTGCTCGGGGCAGTGGAGGAAGCTGCTGCATCCAAGCCCTCAGGCCCAGAGGGGAAGAAAACCCCTGATTCTGGTGTGGAGGACACCAGAGCAGGATCCACAAACTGGGGGAGGGGCTCTGAGGTCCCCAGGGACCCCAGGCCAGGGAACTCCATGGAGCCTCTGGGGATTAACCTGCGAGGACAGAAGGGGTCCTCAGACACAGAAATCCTTCCACCCCACATCCTTTCACCTGCTCCTCTTCCTCCTTTCCCCCTCCTCCCACTCCATCACCTCAGTCTCCATATTTCTCCTTCCCACCTCCCCAT
## 15 TTTGCACAGTTTGGTGGAAAAAAGAATGGTTTCCCAGCTGGGTAGCACACTCACTCACTGCCTACCTTGGCTAGAGGTTGGGGACTCCCCTGCCCTGTGTGGCTATCAGGTGGGCCACCGCAACACACTTCTCTTTTCTCTCCATGTACAACACCAGCCGCCTAGTCAATTCTGATAAGATAAACTAGATACCTTGGTTGCCAGTGCAGGATTCACATACTAGTTATTATTTATTTTTTATGGGAGCCTCTGATCCCCAGTGCTTTTGTCATCCATCTTGGCCCCTCCCCAATATTTTCTTCTAAGACTCTTATAGTTTTAAATCTAATGTTTAGGCTTTTGATCCATTTTAAGTTAATTTTGTATAAAGAAGGGGTCCAATTTCATACTTTTACATGTGGATATTCAGTTTTTCTAATACAA
##
## $gRNAs.bedFormat
## [,1] [,2] [,3] [,4] [,5]
## [1,] "chrX" "48649563" "48649586" "Hsap_GATA1_ex2_gR17f" "106.547942216103"
## [2,] "chrX" "48649576" "48649599" "Hsap_GATA1_ex2_gR20r" "142.091080678942"
## [3,] "chrX" "48649585" "48649608" "Hsap_GATA1_ex2_gR39f" "69.6295234374132"
## [4,] "chrX" "48649586" "48649609" "Hsap_GATA1_ex2_gR40f" "261.881755628254"
## [5,] "chrX" "48649563" "48649586" "Hsap_GATA1_ex2_gR7r" "101.719829329578"
## [,6] [,7] [,8]
## [1,] "+" "48649579" "48649581"
## [2,] "-" "48649581" "48649583"
## [3,] "+" "48649601" "48649603"
## [4,] "+" "48649602" "48649604"
## [5,] "-" "48649568" "48649570"
##
## $REcutDetails
## gRNAPlusPAM REcutgRNAName REname REpattern
## 2 GGTGTGGAGGACACCAGAGCAGG Hsap_GATA1_ex2_gR20r Aco12261II CCRGAG
## 3 GTGTCCTCCACACCAGAATCAGG Hsap_GATA1_ex2_gR39f BslI CCNNNNNNNGG
## 4 TGTCCTCCACACCAGAATCAGGG Hsap_GATA1_ex2_gR40f BslI CCNNNNNNNGG
## 5 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r BslI CCNNNNNNNGG
## 6 GTGTCCTCCACACCAGAATCAGG Hsap_GATA1_ex2_gR39f HinfI GANTC
## 7 TGTCCTCCACACCAGAATCAGGG Hsap_GATA1_ex2_gR40f HinfI GANTC
## 8 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r PflMI CCANNNNNTGG
## 9 GTGTCCTCCACACCAGAATCAGG Hsap_GATA1_ex2_gR39f TfiI GAWTC
## 10 TGTCCTCCACACCAGAATCAGGG Hsap_GATA1_ex2_gR40f TfiI GAWTC
## REcutStart REcutEnd
## 2 14 19
## 3 13 23
## 4 12 22
## 5 13 23
## 6 16 20
## 7 15 19
## 8 13 23
## 9 16 20
## 10 15 19
##
## $REs.isUnique100
## [1] "" "" "HinfI TfiI" "HinfI TfiI" "PflMI"
##
## $REs.isUnique50
## [1] "" "" "HinfI TfiI" "HinfI TfiI" "PflMI"
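The summary table above can be used to shortlist gRNAs. The following is a minimal sketch in base R, not part of CRISPRseek: the values are copied (rounded) from the `$summary` output above, and the ranking rule (lowest `top5OfftargetTotalScore` first, ties broken by higher `gRNAefficacy`) is a hypothetical choice for illustration. In a real session you would likely take these columns from the `summary` element of the returned list instead, converting them with `as.numeric()` if they are stored as character.

```r
# Values copied (rounded) from the $summary output of the chrX-only run above.
summary_df <- data.frame(
  names = c("Hsap_GATA1_ex2_gR17f", "Hsap_GATA1_ex2_gR20r",
            "Hsap_GATA1_ex2_gR39f", "Hsap_GATA1_ex2_gR40f",
            "Hsap_GATA1_ex2_gR7r"),
  gRNAefficacy = c(0.1065, 0.1421, 0.0696, 0.2619, 0.1017),
  top5OfftargetTotalScore = c(3.3, 3.4, 1.1, 2.9, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical ranking rule (an assumption, not a CRISPRseek feature):
# prefer a low summed off-target score, then a high predicted efficacy.
ord <- order(summary_df$top5OfftargetTotalScore, -summary_df$gRNAefficacy)
ranked <- summary_df[ord, ]
rownames(ranked) <- NULL
print(ranked)
```

How much weight to give specificity versus efficacy depends on the experiment, so treat the ordering rule as a starting point only. Note also that these scores come from a search restricted to chrX; a genome-wide search may change them.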
Paired nickases decrease off-target cleavage by requiring the independent binding of two separate gRNAs around a genomic region. Here is how to find gRNAs in a paired configuration.
offTargetAnalysis(inputFilePath,
    findPairedgRNAOnly = TRUE,
    min.gap = 0,
    max.gap = 20,
    BSgenomeName = Hsapiens,
    chromToSearch = "chrX",
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    orgAnn = org.Hs.egSYMBOL,
    max.mismatch = 0,
    outputDir = outputDir,
    overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequence chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## 403 genes were dropped because they have exons located on both strands
## of the same reference sequence or on more than one reference sequence,
## so cannot be represented by a single genomic range.
## Use 'single.strand.genes.only=FALSE' to get all the genes in a
## GRangesList object, or use suppressMessages() to suppress this message.
## Done annotating
## Add paired information...
## Add RE information...
## write gRNAs to bed file...
## Scan for REsites in flanking region...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
## $on.target
## names forViewInUCSC
## [1,] "Hsap_GATA1_ex2_gR39f" "chrX:48649586-48649608"
## [2,] "Hsap_GATA1_ex2_gR40f" "chrX:48649587-48649609"
## [3,] "Hsap_GATA1_ex2_gR7r" "chrX:48649564-48649586"
## extendedSequence gRNAefficacy
## [1,] "TCTGGTGTCCTCCACACCAGAATCAGGGGT" "0.0696295234374132"
## [2,] "CTGGTGTCCTCCACACCAGAATCAGGGGTT" "0.261881755628254"
## [3,] "GACACCAGAGCAGGATCCACAAACTGGGGG" "0.101719829329578"
##
## $summary
## names forViewInUCSC extendedSequence
## 1 Hsap_GATA1_ex2_gR39f chrX:48649586-48649608 TCTGGTGTCCTCCACACCAGAATCAGGGGT
## 2 Hsap_GATA1_ex2_gR40f chrX:48649587-48649609 CTGGTGTCCTCCACACCAGAATCAGGGGTT
## 3 Hsap_GATA1_ex2_gR7r chrX:48649564-48649586 GACACCAGAGCAGGATCCACAAACTGGGGG
## gRNAefficacy gRNAsPlusPAM top5OfftargetTotalScore
## 1 0.0696295234374132 GTGTCCTCCACACCAGAATCNGG NA
## 2 0.261881755628254 TGTCCTCCACACCAGAATCANGG NA
## 3 0.101719829329578 CCAGAGCAGGATCCACAAACNGG NA
## top10OfftargetTotalScore top1Hit.onTarget.MMdistance2PAM
## 1 NA NMM
## 2 NA NMM
## 3 NA NMM
## topOfftarget1MMdistance2PAM topOfftarget2MMdistance2PAM
## 1
## 2
## 3
## topOfftarget3MMdistance2PAM topOfftarget4MMdistance2PAM
## 1
## 2
## 3
## topOfftarget5MMdistance2PAM topOfftarget6MMdistance2PAM
## 1
## 2
## 3
## topOfftarget7MMdistance2PAM topOfftarget8MMdistance2PAM
## 1
## 2
## 3
## topOfftarget9MMdistance2PAM topOfftarget10MMdistance2PAM
## 1
## 2
## 3
## PairedgRNAName REname uniqREin200
## 1 Hsap_GATA1_ex2_gR7r BslI HinfI TfiI
## 2 Hsap_GATA1_ex2_gR7r BslI HinfI TfiI
## 3 Hsap_GATA1_ex2_gR39f Hsap_GATA1_ex2_gR40f BslI PflMI
## uniqREin100
## 1
## 2
## 3
##
## $offtarget
## name gRNAPlusPAM OffTargetSequence inExon
## 1 Hsap_GATA1_ex2_gR39f GTGTCCTCCACACCAGAATCNGG GTGTCCTCCACACCAGAATCAGG TRUE
## 2 Hsap_GATA1_ex2_gR40f TGTCCTCCACACCAGAATCANGG TGTCCTCCACACCAGAATCAGGG TRUE
## 3 Hsap_GATA1_ex2_gR7r CCAGAGCAGGATCCACAAACNGG CCAGAGCAGGATCCACAAACTGG TRUE
## inIntron entrez_id gene score n.mismatch mismatch.distance2PAM
## 1 2623 GATA1 100 0 <NA>
## 2 2623 GATA1 100 0 <NA>
## 3 2623 GATA1 100 0 <NA>
## alignment isCanonicalPAM forViewInUCSC strand chrom
## 1 .................... 1 chrX:48649586-48649608 + chrX
## 2 .................... 1 chrX:48649587-48649609 + chrX
## 3 .................... 1 chrX:48649564-48649586 - chrX
## chromStart chromEnd extendedSequence gRNAefficacy
## 1 48649586 48649608 TCTGGTGTCCTCCACACCAGAATCAGGGGT 0.06962952
## 2 48649587 48649609 CTGGTGTCCTCCACACCAGAATCAGGGGTT 0.26188176
## 3 48649564 48649586 GACACCAGAGCAGGATCCACAAACTGGGGG 0.10171983
## flankSequence
## 1 TATGGAGACTGAGGTGATGGAGTGGGAGGAGGGGGAAAGGAGGAAGAGGAGCAGGTGAAAGGATGTGGGGTGGAAGGATTTCTGTGTCTGAGGACCCCTTCTGTCCTCGCAGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAGACACTCCCCAGGTAACTCCATTGAGTGGCTGTCTTGGCATTGGCTGAGTGCTGTTGGGGTTGCCATGGAGATCCTTGGCTAGG
## 2 ATGGAGACTGAGGTGATGGAGTGGGAGGAGGGGGAAAGGAGGAAGAGGAGCAGGTGAAAGGATGTGGGGTGGAAGGATTTCTGTGTCTGAGGACCCCTTCTGTCCTCGCAGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAGACACTCCCCAGGTAACTCCATTGAGTGGCTGTCTTGGCATTGGCTGAGTGCTGTTGGGGTTGCCATGGAGATCCTTGGCTAGGT
## 3 AACCCCAACAGCACTCAGCCAATGCCAAGACAGCCACTCAATGGAGTTACCTGGGGAGTGTCTGTAGGCCTCAGCGTCCCTGTAGTAGGCCAGTGCCGCAGCTGCAGCGGTGGCTGTGCTCGGGGCAGTGGAGGAAGCTGCTGCATCCAAGCCCTCAGGCCCAGAGGGGAAGAAAACCCCTGATTCTGGTGTGGAGGACACCAGAGCAGGATCCACAAACTGGGGGAGGGGCTCTGAGGTCCCCAGGGACCCCAGGCCAGGGAACTCCATGGAGCCTCTGGGGATTAACCTGCGAGGACAGAAGGGGTCCTCAGACACAGAAATCCTTCCACCCCACATCCTTTCACCTGCTCCTCTTCCTCCTTTCCCCCTCCTCCCACTCCATCACCTCAGTCTCCATATTTCTCCTTCCCACCTCCCCAT
##
## $gRNAs.bedFormat
## [,1] [,2] [,3] [,4] [,5]
## [1,] "chrX" "48649585" "48649608" "Hsap_GATA1_ex2_gR39f" "69.6295234374132"
## [2,] "chrX" "48649586" "48649609" "Hsap_GATA1_ex2_gR40f" "261.881755628254"
## [3,] "chrX" "48649563" "48649586" "Hsap_GATA1_ex2_gR7r" "101.719829329578"
## [,6] [,7] [,8]
## [1,] "+" "48649601" "48649603"
## [2,] "+" "48649602" "48649604"
## [3,] "-" "48649568" "48649570"
##
## $REcutDetails
## ReversegRNAPlusPAM ReversegRNAName ForwardgRNAPlusPAM
## 1 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 2 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 3 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 4 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 5 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 6 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 7 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## 8 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## 9 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## 10 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## 11 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## 12 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## ForwardgRNAName gap ForwardREcutgRNAName ForwardREname ForwardREpattern
## 1 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f BslI CCNNNNNNNGG
## 2 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f BslI CCNNNNNNNGG
## 3 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f HinfI GANTC
## 4 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f HinfI GANTC
## 5 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f TfiI GAWTC
## 6 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f TfiI GAWTC
## 7 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f BslI CCNNNNNNNGG
## 8 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f BslI CCNNNNNNNGG
## 9 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f HinfI GANTC
## 10 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f HinfI GANTC
## 11 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f TfiI GAWTC
## 12 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f TfiI GAWTC
## ForwardREcutStart ForwardREcutEnd ReverseREcutgRNAName ReverseREname
## 1 13 23 Hsap_GATA1_ex2_gR7r BslI
## 2 13 23 Hsap_GATA1_ex2_gR7r PflMI
## 3 16 20 Hsap_GATA1_ex2_gR7r BslI
## 4 16 20 Hsap_GATA1_ex2_gR7r PflMI
## 5 16 20 Hsap_GATA1_ex2_gR7r BslI
## 6 16 20 Hsap_GATA1_ex2_gR7r PflMI
## 7 12 22 Hsap_GATA1_ex2_gR7r BslI
## 8 12 22 Hsap_GATA1_ex2_gR7r PflMI
## 9 15 19 Hsap_GATA1_ex2_gR7r BslI
## 10 15 19 Hsap_GATA1_ex2_gR7r PflMI
## 11 15 19 Hsap_GATA1_ex2_gR7r BslI
## 12 15 19 Hsap_GATA1_ex2_gR7r PflMI
## ReverseREpattern ReverseREcutStart ReverseREcutEnd
## 1 CCNNNNNNNGG 13 23
## 2 CCANNNNNTGG 13 23
## 3 CCNNNNNNNGG 13 23
## 4 CCANNNNNTGG 13 23
## 5 CCNNNNNNNGG 13 23
## 6 CCANNNNNTGG 13 23
## 7 CCNNNNNNNGG 13 23
## 8 CCANNNNNTGG 13 23
## 9 CCNNNNNNNGG 13 23
## 10 CCANNNNNTGG 13 23
## 11 CCNNNNNNNGG 13 23
## 12 CCANNNNNTGG 13 23
##
## $REs.isUnique100
## [1] "" "" ""
##
## $REs.isUnique50
## [1] "" "" ""
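To make the paired configuration more concrete, the sketch below re-derives the reported pairs in base R from the coordinates shown in the first run's output. It assumes that the gap is the forward gRNA's start minus the reverse gRNA's end; this definition is inferred from the reported `gap` values (0 and 1) and is not taken from the CRISPRseek source. The helper names (`fwd_g`, `rev_g`, `pairs`) are made up for illustration.

```r
# Coordinates and strands copied from the forViewInUCSC and strand columns above.
grnas <- data.frame(
  name = c("Hsap_GATA1_ex2_gR17f", "Hsap_GATA1_ex2_gR20r",
           "Hsap_GATA1_ex2_gR39f", "Hsap_GATA1_ex2_gR40f",
           "Hsap_GATA1_ex2_gR7r"),
  start = c(48649564, 48649577, 48649586, 48649587, 48649564),
  end   = c(48649586, 48649599, 48649608, 48649609, 48649586),
  strand = c("+", "-", "+", "+", "-"),
  stringsAsFactors = FALSE
)

min_gap <- 0   # mirrors min.gap in the call above
max_gap <- 20  # mirrors max.gap in the call above

fwd_g <- grnas[grnas$strand == "+", ]
rev_g <- grnas[grnas$strand == "-", ]

# Enumerate every reverse/forward combination.
idx <- expand.grid(r = seq_len(nrow(rev_g)), f = seq_len(nrow(fwd_g)))
pairs <- data.frame(
  reverse = rev_g$name[idx$r],
  forward = fwd_g$name[idx$f],
  # Assumed gap definition: forward start minus reverse end.
  gap = fwd_g$start[idx$f] - rev_g$end[idx$r],
  stringsAsFactors = FALSE
)

# Keep only pairs whose gap falls within [min_gap, max_gap].
paired <- pairs[pairs$gap >= min_gap & pairs$gap <= max_gap, ]
print(paired)
```

Under this assumption only Hsap_GATA1_ex2_gR7r pairs with Hsap_GATA1_ex2_gR39f and Hsap_GATA1_ex2_gR40f, which agrees with the PairedgRNAName column in the output.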
You can control the off-target search criteria with max.mismatch. For example, setting it to 2 means that sites with up to 2 mismatches are considered potential off-targets. By default it is set to 3.
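As an illustration of what a mismatch count means here, the base R sketch below compares the 20-nt protospacer of Hsap_GATA1_ex2_gR40f with one of its reported off-target sites (sequences copied from the output of the first run). The function `count_mismatches` is a hypothetical helper, not a CRISPRseek function; it simply reproduces the `n.mismatch`, `mismatch.distance2PAM`, and `alignment` values shown for that site.

```r
# Hypothetical helper: compare a gRNA protospacer with a candidate site
# of the same length (PAM excluded from both).
count_mismatches <- function(grna, site) {
  g <- strsplit(grna, "")[[1]]
  s <- strsplit(site, "")[[1]]
  stopifnot(length(g) == length(s))
  mm <- which(g != s)
  list(
    n.mismatch = length(mm),
    # Distance to the PAM: position 20 is PAM-proximal, so it has distance 1.
    distance2PAM = length(g) - mm + 1,
    # Dots for matches, the site's base at each mismatch.
    alignment = paste(ifelse(g == s, ".", s), collapse = "")
  )
}

grna <- "TGTCCTCCACACCAGAATCA"  # gR40f without the NGG PAM
site <- "TGTCCTCAACACCAGAATGA"  # HDX intronic site without its TAG PAM
res <- count_mismatches(grna, site)
print(res)

# A site is a candidate off-target only if it stays within the threshold.
max_mismatch <- 2
res$n.mismatch <= max_mismatch
```

This site has 2 mismatches, so it is excluded when max.mismatch is 0 and retained when max.mismatch is 2.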
offTargetAnalysis(inputFilePath,
    findPairedgRNAOnly = TRUE,
    min.gap = 0,
    max.gap = 20,
    BSgenomeName = Hsapiens,
    chromToSearch = "chrX",
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    orgAnn = org.Hs.egSYMBOL,
    max.mismatch = 2,
    outputDir = outputDir,
    overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequence chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## 403 genes were dropped because they have exons located on both strands
## of the same reference sequence or on more than one reference sequence,
## so cannot be represented by a single genomic range.
## Use 'single.strand.genes.only=FALSE' to get all the genes in a
## GRangesList object, or use suppressMessages() to suppress this message.
## Done annotating
## Add paired information...
## Add RE information...
## write gRNAs to bed file...
## Scan for REsites in flanking region...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
## $on.target
## names forViewInUCSC
## [1,] "Hsap_GATA1_ex2_gR39f" "chrX:48649586-48649608"
## [2,] "Hsap_GATA1_ex2_gR40f" "chrX:48649587-48649609"
## [3,] "Hsap_GATA1_ex2_gR7r" "chrX:48649564-48649586"
## extendedSequence gRNAefficacy
## [1,] "TCTGGTGTCCTCCACACCAGAATCAGGGGT" "0.0696295234374132"
## [2,] "CTGGTGTCCTCCACACCAGAATCAGGGGTT" "0.261881755628254"
## [3,] "GACACCAGAGCAGGATCCACAAACTGGGGG" "0.101719829329578"
##
## $summary
## names forViewInUCSC extendedSequence
## 1 Hsap_GATA1_ex2_gR39f chrX:48649586-48649608 TCTGGTGTCCTCCACACCAGAATCAGGGGT
## 2 Hsap_GATA1_ex2_gR40f chrX:48649587-48649609 CTGGTGTCCTCCACACCAGAATCAGGGGTT
## 3 Hsap_GATA1_ex2_gR7r chrX:48649564-48649586 GACACCAGAGCAGGATCCACAAACTGGGGG
## gRNAefficacy gRNAsPlusPAM top5OfftargetTotalScore
## 1 0.0696295234374132 GTGTCCTCCACACCAGAATCNGG NA
## 2 0.261881755628254 TGTCCTCCACACCAGAATCANGG 2.9
## 3 0.101719829329578 CCAGAGCAGGATCCACAAACNGG NA
## top10OfftargetTotalScore top1Hit.onTarget.MMdistance2PAM
## 1 NA NMM
## 2 2.9 NMM
## 3 NA NMM
## topOfftarget1MMdistance2PAM topOfftarget2MMdistance2PAM
## 1
## 2 13,2
## 3
## topOfftarget3MMdistance2PAM topOfftarget4MMdistance2PAM
## 1
## 2
## 3
## topOfftarget5MMdistance2PAM topOfftarget6MMdistance2PAM
## 1
## 2
## 3
## topOfftarget7MMdistance2PAM topOfftarget8MMdistance2PAM
## 1
## 2
## 3
## topOfftarget9MMdistance2PAM topOfftarget10MMdistance2PAM
## 1
## 2
## 3
## PairedgRNAName REname uniqREin200
## 1 Hsap_GATA1_ex2_gR7r BslI HinfI TfiI
## 2 Hsap_GATA1_ex2_gR7r BslI HinfI TfiI
## 3 Hsap_GATA1_ex2_gR39f Hsap_GATA1_ex2_gR40f BslI PflMI
## uniqREin100
## 1
## 2
## 3
##
## $offtarget
## name gRNAPlusPAM OffTargetSequence inExon
## 1 Hsap_GATA1_ex2_gR39f GTGTCCTCCACACCAGAATCNGG GTGTCCTCCACACCAGAATCAGG TRUE
## 2 Hsap_GATA1_ex2_gR40f TGTCCTCCACACCAGAATCANGG TGTCCTCAACACCAGAATGATAG
## 3 Hsap_GATA1_ex2_gR40f TGTCCTCCACACCAGAATCANGG TGTCCTCCACACCAGAATCAGGG TRUE
## 4 Hsap_GATA1_ex2_gR7r CCAGAGCAGGATCCACAAACNGG CCAGAGCAGGATCCACAAACTGG TRUE
## inIntron entrez_id gene score n.mismatch mismatch.distance2PAM
## 1 2623 GATA1 100 0
## 2 TRUE 139324 HDX 2.9 2 13,2
## 3 2623 GATA1 100 0
## 4 2623 GATA1 100 0
## alignment isCanonicalPAM forViewInUCSC strand chrom
## 1 .................... 1 chrX:48649586-48649608 + chrX
## 2 .......A..........G. 0 chrX:83738197-83738219 - chrX
## 3 .................... 1 chrX:48649587-48649609 + chrX
## 4 .................... 1 chrX:48649564-48649586 - chrX
## chromStart chromEnd extendedSequence gRNAefficacy
## 1 48649586 48649608 TCTGGTGTCCTCCACACCAGAATCAGGGGT 0.06962952
## 2 83738197 83738219 <NA> NA
## 3 48649587 48649609 CTGGTGTCCTCCACACCAGAATCAGGGGTT 0.26188176
## 4 48649564 48649586 GACACCAGAGCAGGATCCACAAACTGGGGG 0.10171983
## flankSequence
## 1 TATGGAGACTGAGGTGATGGAGTGGGAGGAGGGGGAAAGGAGGAAGAGGAGCAGGTGAAAGGATGTGGGGTGGAAGGATTTCTGTGTCTGAGGACCCCTTCTGTCCTCGCAGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAGACACTCCCCAGGTAACTCCATTGAGTGGCTGTCTTGGCATTGGCTGAGTGCTGTTGGGGTTGCCATGGAGATCCTTGGCTAGG
## 2 GCAGGTGCACATAAGTCAAGAATTGAGGTTTGGGAACCTCCACCTAGATTTCAGAGGATGTATGAAAATGTCTGGATGTCCCTGCAGACATTTGCTGCATGGGTGGAGCCCTCATGGAGAACCTCTGCTAGGGCAATGTGGAAGAGACATGTGGGGTTGGAGCCCCAACACAGATAGTGGAGCTGTGAGAAGAGGGCCACTGTCCTCAACACCAGAATGATAGATCCACCGACAGCTTGCACCATGAGCCTGGGAAAGCCACATACACTTAACATCAGTCTGTGAAATCAGCTGAGGGGGGCGCTGTACCCTGCAAAGCCACAAAGTTGGAGCTGCCCAAGCCCTTTGATGCCCACTCCTTGCATCAGCATAACCTGAATGTGGGACATGGAGTCAAAGGAGATCACTTTGGAACTTTAAGTT
## 3 ATGGAGACTGAGGTGATGGAGTGGGAGGAGGGGGAAAGGAGGAAGAGGAGCAGGTGAAAGGATGTGGGGTGGAAGGATTTCTGTGTCTGAGGACCCCTTCTGTCCTCGCAGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAGACACTCCCCAGGTAACTCCATTGAGTGGCTGTCTTGGCATTGGCTGAGTGCTGTTGGGGTTGCCATGGAGATCCTTGGCTAGGT
## 4 AACCCCAACAGCACTCAGCCAATGCCAAGACAGCCACTCAATGGAGTTACCTGGGGAGTGTCTGTAGGCCTCAGCGTCCCTGTAGTAGGCCAGTGCCGCAGCTGCAGCGGTGGCTGTGCTCGGGGCAGTGGAGGAAGCTGCTGCATCCAAGCCCTCAGGCCCAGAGGGGAAGAAAACCCCTGATTCTGGTGTGGAGGACACCAGAGCAGGATCCACAAACTGGGGGAGGGGCTCTGAGGTCCCCAGGGACCCCAGGCCAGGGAACTCCATGGAGCCTCTGGGGATTAACCTGCGAGGACAGAAGGGGTCCTCAGACACAGAAATCCTTCCACCCCACATCCTTTCACCTGCTCCTCTTCCTCCTTTCCCCCTCCTCCCACTCCATCACCTCAGTCTCCATATTTCTCCTTCCCACCTCCCCAT
##
## $gRNAs.bedFormat
## [,1] [,2] [,3] [,4] [,5]
## [1,] "chrX" "48649585" "48649608" "Hsap_GATA1_ex2_gR39f" "69.6295234374132"
## [2,] "chrX" "48649586" "48649609" "Hsap_GATA1_ex2_gR40f" "261.881755628254"
## [3,] "chrX" "48649563" "48649586" "Hsap_GATA1_ex2_gR7r" "101.719829329578"
## [,6] [,7] [,8]
## [1,] "+" "48649601" "48649603"
## [2,] "+" "48649602" "48649604"
## [3,] "-" "48649568" "48649570"
##
## $REcutDetails
## ReversegRNAPlusPAM ReversegRNAName ForwardgRNAPlusPAM
## 1 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 2 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 3 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 4 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 5 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 6 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 7 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## 8 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## 9 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## 10 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## 11 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## 12 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## ForwardgRNAName gap ForwardREcutgRNAName ForwardREname ForwardREpattern
## 1 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f BslI CCNNNNNNNGG
## 2 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f BslI CCNNNNNNNGG
## 3 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f HinfI GANTC
## 4 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f HinfI GANTC
## 5 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f TfiI GAWTC
## 6 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f TfiI GAWTC
## 7 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f BslI CCNNNNNNNGG
## 8 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f BslI CCNNNNNNNGG
## 9 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f HinfI GANTC
## 10 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f HinfI GANTC
## 11 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f TfiI GAWTC
## 12 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f TfiI GAWTC
## ForwardREcutStart ForwardREcutEnd ReverseREcutgRNAName ReverseREname
## 1 13 23 Hsap_GATA1_ex2_gR7r BslI
## 2 13 23 Hsap_GATA1_ex2_gR7r PflMI
## 3 16 20 Hsap_GATA1_ex2_gR7r BslI
## 4 16 20 Hsap_GATA1_ex2_gR7r PflMI
## 5 16 20 Hsap_GATA1_ex2_gR7r BslI
## 6 16 20 Hsap_GATA1_ex2_gR7r PflMI
## 7 12 22 Hsap_GATA1_ex2_gR7r BslI
## 8 12 22 Hsap_GATA1_ex2_gR7r PflMI
## 9 15 19 Hsap_GATA1_ex2_gR7r BslI
## 10 15 19 Hsap_GATA1_ex2_gR7r PflMI
## 11 15 19 Hsap_GATA1_ex2_gR7r BslI
## 12 15 19 Hsap_GATA1_ex2_gR7r PflMI
## ReverseREpattern ReverseREcutStart ReverseREcutEnd
## 1 CCNNNNNNNGG 13 23
## 2 CCANNNNNTGG 13 23
## 3 CCNNNNNNNGG 13 23
## 4 CCANNNNNTGG 13 23
## 5 CCNNNNNNNGG 13 23
## 6 CCANNNNNTGG 13 23
## 7 CCNNNNNNNGG 13 23
## 8 CCANNNNNTGG 13 23
## 9 CCNNNNNNNGG 13 23
## 10 CCANNNNNTGG 13 23
## 11 CCNNNNNNNGG 13 23
## 12 CCANNNNNTGG 13 23
##
## $REs.isUnique100
## [1] "" "" ""
##
## $REs.isUnique50
## [1] "" "" ""
Paired gRNAs in proper spacing and orientation give more specificity, and gRNAs that overlap restriction enzyme cut sites facilitate cleavage monitoring. Calling offTargetAnalysis with findPairedgRNAOnly = TRUE and findgRNAsWithREcutOnly = TRUE searches for, scores and annotates gRNAs that are in paired configuration and for which at least one gRNA of the pair overlaps a restriction enzyme cut site. To be considered a pair, the gap between the forward gRNA and the corresponding reverse gRNA must fall between min.gap and max.gap, inclusive, and the reverse gRNA must sit before the forward gRNA. The default (min.gap, max.gap) is (0, 20). For a gRNA to be considered as overlapping a restriction enzyme cut site, the enzyme cut pattern must overlap one of the gRNA positions specified in overlap.gRNA.positions (default: positions 17 and 18).
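The two filters described above can be sketched in base R. This is a minimal illustration, not CRISPRseek's implementation: the helper names `overlaps_cut_site()`, `pair_gap()` and `is_valid_pair()` are hypothetical, and the gap formula (forward gRNA start minus reverse gRNA end) is an assumption inferred from the coordinates and `gap` values printed in this vignette's output.

```r
# Hypothetical helper: does an RE pattern spanning cut_start..cut_end
# (positions within the gRNA plus PAM) cover any required gRNA position?
overlaps_cut_site <- function(cut_start, cut_end, positions = c(17, 18)) {
  any(positions >= cut_start & positions <= cut_end)
}

# Hypothetical helper: gap between a reverse gRNA and the forward gRNA
# that follows it (assumed here: forward start minus reverse end).
pair_gap <- function(reverse_end, forward_start) {
  forward_start - reverse_end
}

# A pair is kept when its gap lies in [min.gap, max.gap], inclusive.
is_valid_pair <- function(reverse_end, forward_start,
                          min.gap = 0, max.gap = 20) {
  gap <- pair_gap(reverse_end, forward_start)
  gap >= min.gap & gap <= max.gap
}

# Coordinates taken from the output in this vignette:
# gR7r ends at 48649586; gR39f starts at 48649586, gR40f at 48649587.
pair_gap(48649586, c(48649586, 48649587))
is_valid_pair(48649586, 48649587)

# BslI pattern reported at positions 13..23 of gR39f.
overlaps_cut_site(13, 23)
# A made-up pattern at positions 1..6 would not qualify.
overlaps_cut_site(1, 6)
```

With the coordinates above, the computed gaps agree with the `gap` column reported in `$REcutDetails` for the two forward gRNAs.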
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = TRUE, minREpatternSize = 6, overlap.gRNA.positions = c(17, 18), findPairedgRNAOnly = TRUE, min.gap = 0, max.gap = 20, BSgenomeName = Hsapiens, chromToSearch ="chrX", txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, max.mismatch = 0, outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequence chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## 403 genes were dropped because they have exons located on both strands
## of the same reference sequence or on more than one reference sequence,
## so cannot be represented by a single genomic range.
## Use 'single.strand.genes.only=FALSE' to get all the genes in a
## GRangesList object, or use suppressMessages() to suppress this message.
## Done annotating
## Add paired information...
## Add RE information...
## write gRNAs to bed file...
## Scan for REsites in flanking region...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
## $on.target
## names forViewInUCSC
## [1,] "Hsap_GATA1_ex2_gR39f" "chrX:48649586-48649608"
## [2,] "Hsap_GATA1_ex2_gR40f" "chrX:48649587-48649609"
## [3,] "Hsap_GATA1_ex2_gR7r" "chrX:48649564-48649586"
## extendedSequence gRNAefficacy
## [1,] "TCTGGTGTCCTCCACACCAGAATCAGGGGT" "0.0696295234374132"
## [2,] "CTGGTGTCCTCCACACCAGAATCAGGGGTT" "0.261881755628254"
## [3,] "GACACCAGAGCAGGATCCACAAACTGGGGG" "0.101719829329578"
##
## $summary
## names forViewInUCSC extendedSequence
## 1 Hsap_GATA1_ex2_gR39f chrX:48649586-48649608 TCTGGTGTCCTCCACACCAGAATCAGGGGT
## 2 Hsap_GATA1_ex2_gR40f chrX:48649587-48649609 CTGGTGTCCTCCACACCAGAATCAGGGGTT
## 3 Hsap_GATA1_ex2_gR7r chrX:48649564-48649586 GACACCAGAGCAGGATCCACAAACTGGGGG
## gRNAefficacy gRNAsPlusPAM top5OfftargetTotalScore
## 1 0.0696295234374132 GTGTCCTCCACACCAGAATCNGG NA
## 2 0.261881755628254 TGTCCTCCACACCAGAATCANGG NA
## 3 0.101719829329578 CCAGAGCAGGATCCACAAACNGG NA
## top10OfftargetTotalScore top1Hit.onTarget.MMdistance2PAM
## 1 NA NMM
## 2 NA NMM
## 3 NA NMM
## topOfftarget1MMdistance2PAM topOfftarget2MMdistance2PAM
## 1
## 2
## 3
## topOfftarget3MMdistance2PAM topOfftarget4MMdistance2PAM
## 1
## 2
## 3
## topOfftarget5MMdistance2PAM topOfftarget6MMdistance2PAM
## 1
## 2
## 3
## topOfftarget7MMdistance2PAM topOfftarget8MMdistance2PAM
## 1
## 2
## 3
## topOfftarget9MMdistance2PAM topOfftarget10MMdistance2PAM
## 1
## 2
## 3
## PairedgRNAName REname uniqREin200
## 1 Hsap_GATA1_ex2_gR7r BslI
## 2 Hsap_GATA1_ex2_gR7r BslI
## 3 Hsap_GATA1_ex2_gR39f Hsap_GATA1_ex2_gR40f BslI PflMI
## uniqREin100
## 1
## 2
## 3
##
## $offtarget
## name gRNAPlusPAM OffTargetSequence inExon
## 1 Hsap_GATA1_ex2_gR39f GTGTCCTCCACACCAGAATCNGG GTGTCCTCCACACCAGAATCAGG TRUE
## 2 Hsap_GATA1_ex2_gR40f TGTCCTCCACACCAGAATCANGG TGTCCTCCACACCAGAATCAGGG TRUE
## 3 Hsap_GATA1_ex2_gR7r CCAGAGCAGGATCCACAAACNGG CCAGAGCAGGATCCACAAACTGG TRUE
## inIntron entrez_id gene score n.mismatch mismatch.distance2PAM
## 1 2623 GATA1 100 0 <NA>
## 2 2623 GATA1 100 0 <NA>
## 3 2623 GATA1 100 0 <NA>
## alignment isCanonicalPAM forViewInUCSC strand chrom
## 1 .................... 1 chrX:48649586-48649608 + chrX
## 2 .................... 1 chrX:48649587-48649609 + chrX
## 3 .................... 1 chrX:48649564-48649586 - chrX
## chromStart chromEnd extendedSequence gRNAefficacy
## 1 48649586 48649608 TCTGGTGTCCTCCACACCAGAATCAGGGGT 0.06962952
## 2 48649587 48649609 CTGGTGTCCTCCACACCAGAATCAGGGGTT 0.26188176
## 3 48649564 48649586 GACACCAGAGCAGGATCCACAAACTGGGGG 0.10171983
## flankSequence
## 1 TATGGAGACTGAGGTGATGGAGTGGGAGGAGGGGGAAAGGAGGAAGAGGAGCAGGTGAAAGGATGTGGGGTGGAAGGATTTCTGTGTCTGAGGACCCCTTCTGTCCTCGCAGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAGACACTCCCCAGGTAACTCCATTGAGTGGCTGTCTTGGCATTGGCTGAGTGCTGTTGGGGTTGCCATGGAGATCCTTGGCTAGG
## 2 ATGGAGACTGAGGTGATGGAGTGGGAGGAGGGGGAAAGGAGGAAGAGGAGCAGGTGAAAGGATGTGGGGTGGAAGGATTTCTGTGTCTGAGGACCCCTTCTGTCCTCGCAGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAGACACTCCCCAGGTAACTCCATTGAGTGGCTGTCTTGGCATTGGCTGAGTGCTGTTGGGGTTGCCATGGAGATCCTTGGCTAGGT
## 3 AACCCCAACAGCACTCAGCCAATGCCAAGACAGCCACTCAATGGAGTTACCTGGGGAGTGTCTGTAGGCCTCAGCGTCCCTGTAGTAGGCCAGTGCCGCAGCTGCAGCGGTGGCTGTGCTCGGGGCAGTGGAGGAAGCTGCTGCATCCAAGCCCTCAGGCCCAGAGGGGAAGAAAACCCCTGATTCTGGTGTGGAGGACACCAGAGCAGGATCCACAAACTGGGGGAGGGGCTCTGAGGTCCCCAGGGACCCCAGGCCAGGGAACTCCATGGAGCCTCTGGGGATTAACCTGCGAGGACAGAAGGGGTCCTCAGACACAGAAATCCTTCCACCCCACATCCTTTCACCTGCTCCTCTTCCTCCTTTCCCCCTCCTCCCACTCCATCACCTCAGTCTCCATATTTCTCCTTCCCACCTCCCCAT
##
## $gRNAs.bedFormat
## [,1] [,2] [,3] [,4] [,5]
## [1,] "chrX" "48649585" "48649608" "Hsap_GATA1_ex2_gR39f" "69.6295234374132"
## [2,] "chrX" "48649586" "48649609" "Hsap_GATA1_ex2_gR40f" "261.881755628254"
## [3,] "chrX" "48649563" "48649586" "Hsap_GATA1_ex2_gR7r" "101.719829329578"
## [,6] [,7] [,8]
## [1,] "+" "48649601" "48649603"
## [2,] "+" "48649602" "48649604"
## [3,] "-" "48649568" "48649570"
##
## $REcutDetails
## ReversegRNAPlusPAM ReversegRNAName ForwardgRNAPlusPAM
## 1 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 2 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r GTGTCCTCCACACCAGAATCAGG
## 3 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## 4 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r TGTCCTCCACACCAGAATCAGGG
## ForwardgRNAName gap ForwardREcutgRNAName ForwardREname ForwardREpattern
## 1 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f BslI CCNNNNNNNGG
## 2 Hsap_GATA1_ex2_gR39f 0 Hsap_GATA1_ex2_gR39f BslI CCNNNNNNNGG
## 3 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f BslI CCNNNNNNNGG
## 4 Hsap_GATA1_ex2_gR40f 1 Hsap_GATA1_ex2_gR40f BslI CCNNNNNNNGG
## ForwardREcutStart ForwardREcutEnd ReverseREcutgRNAName ReverseREname
## 1 13 23 Hsap_GATA1_ex2_gR7r BslI
## 2 13 23 Hsap_GATA1_ex2_gR7r PflMI
## 3 12 22 Hsap_GATA1_ex2_gR7r BslI
## 4 12 22 Hsap_GATA1_ex2_gR7r PflMI
## ReverseREpattern ReverseREcutStart ReverseREcutEnd
## 1 CCNNNNNNNGG 13 23
## 2 CCANNNNNTGG 13 23
## 3 CCNNNNNNNGG 13 23
## 4 CCANNNNNTGG 13 23
##
## $REs.isUnique100
## [1] "" "" ""
##
## $REs.isUnique50
## [1] "" "" ""
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = TRUE, minREpatternSize = 6, overlap.gRNA.positions = c(17, 18), findPairedgRNAOnly = FALSE, BSgenomeName = Hsapiens, chromToSearch ="chrX", max.mismatch = 0, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequence chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## 403 genes were dropped because they have exons located on both strands
## of the same reference sequence or on more than one reference sequence,
## so cannot be represented by a single genomic range.
## Use 'single.strand.genes.only=FALSE' to get all the genes in a
## GRangesList object, or use suppressMessages() to suppress this message.
## Done annotating
## Add paired information...
## Add RE information...
## write gRNAs to bed file...
## Scan for REsites in flanking region...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
## $on.target
## names forViewInUCSC
## [1,] "Hsap_GATA1_ex2_gR20r" "chrX:48649577-48649599"
## [2,] "Hsap_GATA1_ex2_gR39f" "chrX:48649586-48649608"
## [3,] "Hsap_GATA1_ex2_gR40f" "chrX:48649587-48649609"
## [4,] "Hsap_GATA1_ex2_gR7r" "chrX:48649564-48649586"
## extendedSequence gRNAefficacy
## [1,] "TTCTGGTGTGGAGGACACCAGAGCAGGATC" "0.142091080678942"
## [2,] "TCTGGTGTCCTCCACACCAGAATCAGGGGT" "0.0696295234374132"
## [3,] "CTGGTGTCCTCCACACCAGAATCAGGGGTT" "0.261881755628254"
## [4,] "GACACCAGAGCAGGATCCACAAACTGGGGG" "0.101719829329578"
##
## $summary
## names forViewInUCSC extendedSequence
## 1 Hsap_GATA1_ex2_gR20r chrX:48649577-48649599 TTCTGGTGTGGAGGACACCAGAGCAGGATC
## 2 Hsap_GATA1_ex2_gR39f chrX:48649586-48649608 TCTGGTGTCCTCCACACCAGAATCAGGGGT
## 3 Hsap_GATA1_ex2_gR40f chrX:48649587-48649609 CTGGTGTCCTCCACACCAGAATCAGGGGTT
## 4 Hsap_GATA1_ex2_gR7r chrX:48649564-48649586 GACACCAGAGCAGGATCCACAAACTGGGGG
## gRNAefficacy gRNAsPlusPAM top5OfftargetTotalScore
## 1 0.142091080678942 GGTGTGGAGGACACCAGAGCNGG NA
## 2 0.0696295234374132 GTGTCCTCCACACCAGAATCNGG NA
## 3 0.261881755628254 TGTCCTCCACACCAGAATCANGG NA
## 4 0.101719829329578 CCAGAGCAGGATCCACAAACNGG NA
## top10OfftargetTotalScore top1Hit.onTarget.MMdistance2PAM
## 1 NA NMM
## 2 NA NMM
## 3 NA NMM
## 4 NA NMM
## topOfftarget1MMdistance2PAM topOfftarget2MMdistance2PAM
## 1
## 2
## 3
## 4
## topOfftarget3MMdistance2PAM topOfftarget4MMdistance2PAM
## 1
## 2
## 3
## 4
## topOfftarget5MMdistance2PAM topOfftarget6MMdistance2PAM
## 1
## 2
## 3
## 4
## topOfftarget7MMdistance2PAM topOfftarget8MMdistance2PAM
## 1
## 2
## 3
## 4
## topOfftarget9MMdistance2PAM topOfftarget10MMdistance2PAM REname
## 1 Aco12261II
## 2 BslI
## 3 BslI
## 4 BslI PflMI
## uniqREin200 uniqREin100
## 1
## 2
## 3
## 4 PflMI PflMI
##
## $offtarget
## name gRNAPlusPAM OffTargetSequence inExon
## 1 Hsap_GATA1_ex2_gR20r GGTGTGGAGGACACCAGAGCNGG GGTGTGGAGGACACCAGAGCAGG TRUE
## 2 Hsap_GATA1_ex2_gR39f GTGTCCTCCACACCAGAATCNGG GTGTCCTCCACACCAGAATCAGG TRUE
## 3 Hsap_GATA1_ex2_gR40f TGTCCTCCACACCAGAATCANGG TGTCCTCCACACCAGAATCAGGG TRUE
## 4 Hsap_GATA1_ex2_gR7r CCAGAGCAGGATCCACAAACNGG CCAGAGCAGGATCCACAAACTGG TRUE
## inIntron entrez_id gene score n.mismatch mismatch.distance2PAM
## 1 2623 GATA1 100 0 <NA>
## 2 2623 GATA1 100 0 <NA>
## 3 2623 GATA1 100 0 <NA>
## 4 2623 GATA1 100 0 <NA>
## alignment isCanonicalPAM forViewInUCSC strand chrom
## 1 .................... 1 chrX:48649577-48649599 - chrX
## 2 .................... 1 chrX:48649586-48649608 + chrX
## 3 .................... 1 chrX:48649587-48649609 + chrX
## 4 .................... 1 chrX:48649564-48649586 - chrX
## chromStart chromEnd extendedSequence gRNAefficacy
## 1 48649577 48649599 TTCTGGTGTGGAGGACACCAGAGCAGGATC 0.14209108
## 2 48649586 48649608 TCTGGTGTCCTCCACACCAGAATCAGGGGT 0.06962952
## 3 48649587 48649609 CTGGTGTCCTCCACACCAGAATCAGGGGTT 0.26188176
## 4 48649564 48649586 GACACCAGAGCAGGATCCACAAACTGGGGG 0.10171983
## flankSequence
## 1 GGATCTCCATGGCAACCCCAACAGCACTCAGCCAATGCCAAGACAGCCACTCAATGGAGTTACCTGGGGAGTGTCTGTAGGCCTCAGCGTCCCTGTAGTAGGCCAGTGCCGCAGCTGCAGCGGTGGCTGTGCTCGGGGCAGTGGAGGAAGCTGCTGCATCCAAGCCCTCAGGCCCAGAGGGGAAGAAAACCCCTGATTCTGGTGTGGAGGACACCAGAGCAGGATCCACAAACTGGGGGAGGGGCTCTGAGGTCCCCAGGGACCCCAGGCCAGGGAACTCCATGGAGCCTCTGGGGATTAACCTGCGAGGACAGAAGGGGTCCTCAGACACAGAAATCCTTCCACCCCACATCCTTTCACCTGCTCCTCTTCCTCCTTTCCCCCTCCTCCCACTCCATCACCTCAGTCTCCATATTTCTCCTT
## 2 TATGGAGACTGAGGTGATGGAGTGGGAGGAGGGGGAAAGGAGGAAGAGGAGCAGGTGAAAGGATGTGGGGTGGAAGGATTTCTGTGTCTGAGGACCCCTTCTGTCCTCGCAGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAGACACTCCCCAGGTAACTCCATTGAGTGGCTGTCTTGGCATTGGCTGAGTGCTGTTGGGGTTGCCATGGAGATCCTTGGCTAGG
## 3 ATGGAGACTGAGGTGATGGAGTGGGAGGAGGGGGAAAGGAGGAAGAGGAGCAGGTGAAAGGATGTGGGGTGGAAGGATTTCTGTGTCTGAGGACCCCTTCTGTCCTCGCAGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAGACACTCCCCAGGTAACTCCATTGAGTGGCTGTCTTGGCATTGGCTGAGTGCTGTTGGGGTTGCCATGGAGATCCTTGGCTAGGT
## 4 AACCCCAACAGCACTCAGCCAATGCCAAGACAGCCACTCAATGGAGTTACCTGGGGAGTGTCTGTAGGCCTCAGCGTCCCTGTAGTAGGCCAGTGCCGCAGCTGCAGCGGTGGCTGTGCTCGGGGCAGTGGAGGAAGCTGCTGCATCCAAGCCCTCAGGCCCAGAGGGGAAGAAAACCCCTGATTCTGGTGTGGAGGACACCAGAGCAGGATCCACAAACTGGGGGAGGGGCTCTGAGGTCCCCAGGGACCCCAGGCCAGGGAACTCCATGGAGCCTCTGGGGATTAACCTGCGAGGACAGAAGGGGTCCTCAGACACAGAAATCCTTCCACCCCACATCCTTTCACCTGCTCCTCTTCCTCCTTTCCCCCTCCTCCCACTCCATCACCTCAGTCTCCATATTTCTCCTTCCCACCTCCCCAT
##
## $gRNAs.bedFormat
## [,1] [,2] [,3] [,4] [,5]
## [1,] "chrX" "48649576" "48649599" "Hsap_GATA1_ex2_gR20r" "142.091080678942"
## [2,] "chrX" "48649585" "48649608" "Hsap_GATA1_ex2_gR39f" "69.6295234374132"
## [3,] "chrX" "48649586" "48649609" "Hsap_GATA1_ex2_gR40f" "261.881755628254"
## [4,] "chrX" "48649563" "48649586" "Hsap_GATA1_ex2_gR7r" "101.719829329578"
## [,6] [,7] [,8]
## [1,] "-" "48649581" "48649583"
## [2,] "+" "48649601" "48649603"
## [3,] "+" "48649602" "48649604"
## [4,] "-" "48649568" "48649570"
##
## $REcutDetails
## gRNAPlusPAM REcutgRNAName REname REpattern
## 2 GGTGTGGAGGACACCAGAGCAGG Hsap_GATA1_ex2_gR20r Aco12261II CCRGAG
## 3 GTGTCCTCCACACCAGAATCAGG Hsap_GATA1_ex2_gR39f BslI CCNNNNNNNGG
## 4 TGTCCTCCACACCAGAATCAGGG Hsap_GATA1_ex2_gR40f BslI CCNNNNNNNGG
## 5 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r BslI CCNNNNNNNGG
## 6 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r PflMI CCANNNNNTGG
## REcutStart REcutEnd
## 2 14 19
## 3 13 23
## 4 12 22
## 5 13 23
## 6 13 23
##
## $REs.isUnique100
## [1] "" "" "" "PflMI"
##
## $REs.isUnique50
## [1] "" "" "" "PflMI"
Calling the function offTargetAnalysis with findgRNAs = FALSE will skip the gRNA search step and go directly to off-target search, scoring and annotation for the input gRNAs. The input gRNAs will be annotated with restriction enzyme cut sites for users to review later. However, paired information will not be available.
gRNAFilePath <- system.file('extdata', 'testHsap_GATA1_ex2_gRNA1.fa', package = 'CRISPRseek')
offTargetAnalysis(inputFilePath = gRNAFilePath, findPairedgRNAOnly = FALSE, findgRNAs = FALSE, BSgenomeName = Hsapiens, chromToSearch = 'chrX', txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, max.mismatch = 2, outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## >>> Finding all hits in sequence chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## 403 genes were dropped because they have exons located on both strands
## of the same reference sequence or on more than one reference sequence,
## so cannot be represented by a single genomic range.
## Use 'single.strand.genes.only=FALSE' to get all the genes in a
## GRangesList object, or use suppressMessages() to suppress this message.
## Done annotating
## Add RE information...
## write gRNAs to bed file...
## Scan for REsites in flanking region...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
## $on.target
## names forViewInUCSC
## [1,] "Hsap_GATA1_ex2_gRNA1" "chrX:48649564-48649586"
## extendedSequence gRNAefficacy
## [1,] "TCCCCCAGTTTGTGGATCCTGCTCTGGTGT" "0.106547942216103"
##
## $summary
## names forViewInUCSC extendedSequence
## 1 Hsap_GATA1_ex2_gRNA1 chrX:48649564-48649586 TCCCCCAGTTTGTGGATCCTGCTCTGGTGT
## gRNAefficacy gRNAsPlusPAM top5OfftargetTotalScore
## 1 0.106547942216103 CCAGTTTGTGGATCCTGCTCNGG NA
## top10OfftargetTotalScore top1Hit.onTarget.MMdistance2PAM
## 1 NA NMM
## topOfftarget1MMdistance2PAM topOfftarget2MMdistance2PAM
## 1
## topOfftarget3MMdistance2PAM topOfftarget4MMdistance2PAM
## 1
## topOfftarget5MMdistance2PAM topOfftarget6MMdistance2PAM
## 1
## topOfftarget7MMdistance2PAM topOfftarget8MMdistance2PAM
## 1
## topOfftarget9MMdistance2PAM topOfftarget10MMdistance2PAM REname uniqREin200
## 1
## uniqREin100
## 1
##
## $offtarget
## name gRNAPlusPAM OffTargetSequence
## chrX Hsap_GATA1_ex2_gRNA1 CCAGTTTGTGGATCCTGCTCNGG CCAGTTTGTGGATCCTGCTCTGG
## inExon inIntron entrez_id gene score n.mismatch mismatch.distance2PAM
## chrX TRUE 2623 GATA1 100 0 <NA>
## alignment isCanonicalPAM forViewInUCSC strand chrom
## chrX .................... 1 chrX:48649564-48649586 + chrX
## chromStart chromEnd extendedSequence gRNAefficacy
## chrX 48649564 48649586 TCCCCCAGTTTGTGGATCCTGCTCTGGTGT 0.1065479
## flankSequence
## chrX ATGGGGAGGTGGGAAGGAGAAATATGGAGACTGAGGTGATGGAGTGGGAGGAGGGGGAAAGGAGGAAGAGGAGCAGGTGAAAGGATGTGGGGTGGAAGGATTTCTGTGTCTGAGGACCCCTTCTGTCCTCGCAGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAGACACTCCCCAGGTAACTCCATTGAGTGGCTGTCTTGGCATTGGCTGAGTGCTGTTGGGGTT
##
## $gRNAs.bedFormat
## [,1] [,2] [,3] [,4] [,5]
## [1,] "chrX" "48649563" "48649586" "Hsap_GATA1_ex2_gRNA1" "106.547942216103"
## [,6] [,7] [,8]
## [1,] "+" "48649579" "48649581"
##
## $REcutDetails
## [1] gRNAPlusPAM REcutgRNAName REname REpattern REcutStart
## [6] REcutEnd
## <0 rows> (or 0-length row.names)
##
## $REs.isUnique100
## [1] ""
##
## $REs.isUnique50
## [1] ""
Calling offTargetAnalysis with chromToSearch = "" performs a quick gRNA search without off-target analysis. The parameters findgRNAsWithREcutOnly and findPairedgRNAOnly control whether the search is restricted to gRNAs that overlap restriction enzyme cut sites and to gRNAs in paired configuration, respectively.
offTargetAnalysis(inputFilePath, chromToSearch = "", outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
## DNAStringSet object of length 5:
## width seq names
## [1] 23 CCAGTTTGTGGATCCTGCTCTGG Hsap_GATA1_ex2_gR17f
## [2] 23 GTGTCCTCCACACCAGAATCAGG Hsap_GATA1_ex2_gR39f
## [3] 23 TGTCCTCCACACCAGAATCAGGG Hsap_GATA1_ex2_gR40f
## [4] 23 GGTGTGGAGGACACCAGAGCAGG Hsap_GATA1_ex2_gR20r
## [5] 23 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r
The example below searches for gRNAs that target at least one of two alleles. Two files are provided containing sequences that differ by a single nucleotide polymorphism (SNP). The results are saved in the file scoresFor2InputSequences.xls in the outputDir directory.
Huntington's disease is caused by mutations in the HTT gene. Expansion of CAG repeats in one copy of HTT can result in adult-onset neurodegeneration. Because HTT is an essential gene, nucleases that inactivate both alleles cannot be used. Therefore, to identify nuclease target sites that are allele-specific, we search for sites that overlap a SNP, rs362331, which is located in a coding exon of HTT. Two sequences that differ only at the polymorphic site are used as inputs for compare2Sequences.
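Because the two inputs differ only at the SNP, the polymorphic position can be located by comparing them base by base. The following is a minimal base-R sketch that uses short made-up sequences rather than the actual rs362331 files; `find_differences()` is a hypothetical helper and not part of CRISPRseek.

```r
# Hypothetical helper: report the positions at which two equal-length
# sequences differ, together with the base found in each sequence.
find_differences <- function(seq1, seq2) {
  stopifnot(nchar(seq1) == nchar(seq2))
  bases1 <- strsplit(seq1, "")[[1]]
  bases2 <- strsplit(seq2, "")[[1]]
  pos <- which(bases1 != bases2)
  data.frame(position = pos,
             allele1 = bases1[pos],
             allele2 = bases2[pos],
             stringsAsFactors = FALSE)
}

# Made-up toy alleles that differ at a single position.
alleleC <- "GGATCCACCGTTACAGG"
alleleT <- "GGATCCACTGTTACAGG"
find_differences(alleleC, alleleT)
```

A gRNA whose target site covers the reported position is a candidate for allele-specific targeting.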
inputFile1Path <- system.file("extdata", "rs362331C.fa", package = "CRISPRseek")
inputFile2Path <- system.file("extdata", "rs362331T.fa", package = "CRISPRseek")
seqs <- compare2Sequences(inputFile1Path, inputFile2Path, outputDir = outputDir, overwrite = TRUE)
## search for gRNAs for input file1...
## Validating input ...
## Searching for gRNAs ...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/rs362331C.fa-Jul-28-2020/
## search for gRNAs for input file2...
## Validating input ...
## Searching for gRNAs ...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/rs362331T.fa-Jul-28-2020/
## [1] "Scoring ..."
## >>> Finding all hits in sequence rs362331T ...
## >>> DONE searching
## finish off-target search in sequence 2
## >>> Finding all hits in sequence rs362331C ...
## >>> DONE searching
## finish off-target search in sequence 1
## finish feature vector building
## finish score calculation
## [1] "Done!"
Calling offTargetAnalysis with max.mismatch = 0 performs a quick gRNA search with gRNA efficacy prediction but without off-target analysis.
inputFilePath <- system.file('extdata', 'inputseq.fa', package = 'CRISPRseek')
results <- offTargetAnalysis(inputFilePath, annotateExon = FALSE, chromToSearch = "chrX", max.mismatch = 0, BSgenomeName = Hsapiens, outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequence chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done annotating
## Add paired information...
## Add RE information...
## write gRNAs to bed file...
## Scan for REsites in flanking region...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
Alternatively, you can set useEfficacyFromInputSeq = TRUE and chromToSearch = "" without specifying BSgenomeName, provided the input sequence is long enough to include the bases flanking each gRNA that are used for efficacy prediction.
inputFilePath <- system.file('extdata', 'inputseq.fa', package = 'CRISPRseek')
offTargetAnalysis(inputFilePath, annotateExon = FALSE, chromToSearch = "", useEfficacyFromInputSeq = TRUE, max.mismatch = 0, outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
## DNAStringSet object of length 5:
## width seq names
## [1] 23 CCAGTTTGTGGATCCTGCTCTGG Hsap_GATA1_ex2_gR17f
## [2] 23 GTGTCCTCCACACCAGAATCAGG Hsap_GATA1_ex2_gR39f
## [3] 23 TGTCCTCCACACCAGAATCAGGG Hsap_GATA1_ex2_gR40f
## [4] 23 GGTGTGGAGGACACCAGAGCAGG Hsap_GATA1_ex2_gR20r
## [5] 23 CCAGAGCAGGATCCACAAACTGG Hsap_GATA1_ex2_gR7r
Calling offTargetAnalysis with annotatePaired = FALSE and enable.multicore = TRUE, with n.cores.max set to the number of cores to use, improves performance. We also suggest splitting very long sequences into smaller chunks and performing off-target analysis on each subsequence separately (thanks to Alex Williams for sharing this use case at https://support.bioconductor.org/p/72994/). In addition, please remember to use repeat-masked sequence as input.
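One way to split a long sequence is sketched below in base R. `split_into_chunks()` is a hypothetical helper, not a CRISPRseek function. The overlap of 22 bases is an assumption chosen so that any 23-nt site (20-nt gRNA plus 3-nt PAM) lies entirely within at least one chunk; a larger overlap may be needed if flanking bases are also required, for example for efficacy prediction.

```r
# Hypothetical helper: cut a sequence into fixed-size chunks in which
# consecutive chunks share `overlap` bases.
split_into_chunks <- function(sequence, chunk_size = 1000, overlap = 22) {
  stopifnot(chunk_size > overlap)
  n <- nchar(sequence)
  # Each start advances by chunk_size - overlap; stop once the remaining
  # tail is already covered by the previous chunk.
  starts <- seq(1, max(1, n - overlap), by = chunk_size - overlap)
  ends <- pmin(starts + chunk_size - 1, n)
  chunks <- substring(sequence, starts, ends)
  names(chunks) <- paste0("chunk_", starts, "_", ends)
  chunks
}

# Toy example: a random 50-nt sequence, 20-nt chunks, 5-nt overlap.
set.seed(1)
long_seq <- paste(sample(c("A", "C", "G", "T"), 50, replace = TRUE),
                  collapse = "")
chunks <- split_into_chunks(long_seq, chunk_size = 20, overlap = 5)
names(chunks)
```

Each chunk could then be analyzed on its own, keeping in mind that positions reported for a chunk are relative to that chunk's start, which is recorded in its name.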
results <- offTargetAnalysis(inputFilePath, annotatePaired = FALSE, chromToSearch = "chrX", enable.multicore = TRUE, n.cores.max = 6, annotateExon = FALSE, max.mismatch = 0, BSgenomeName = Hsapiens, outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequence chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done annotating
## Add RE information...
## write gRNAs to bed file...
## Scan for REsites in flanking region...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
Calling offTargetAnalysis with scoring.method set to CFDscore outputs the CFD score, computed with the algorithm of Doench et al., 2016, which models the effects of both mismatch position and mismatch type on cutting frequency. By default, scoring.method is set to Hsu-Zhang, which models only the effect of mismatch position.
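The difference between the two kinds of model can be illustrated with a toy sketch. All penalty values, mismatch labels and function names below are made up for illustration; they are not the published Hsu-Zhang weights or CFD matrix values, and the code is not how CRISPRseek computes either score.

```r
# Made-up penalties (NOT published values). A position-only model has
# one penalty per mismatch position.
position_penalty <- c(`5` = 0.1, `18` = 0.8)

# A position-and-type model has one penalty per (position, mismatch type)
# combination, so the same position can be penalized differently.
type_penalty <- c(`18|rG:dT` = 0.3, `18|rC:dC` = 0.9)

position_only_score <- function(mismatch_pos) {
  prod(1 - position_penalty[as.character(mismatch_pos)])
}

position_and_type_score <- function(mismatch_pos, mismatch_type) {
  keys <- paste(mismatch_pos, mismatch_type, sep = "|")
  prod(1 - type_penalty[keys])
}

# A mismatch at position 18 gets a single score when only position counts ...
position_only_score(18)
# ... but different scores once the mismatch type is taken into account.
position_and_type_score(18, "rG:dT")
position_and_type_score(18, "rC:dC")
```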
results <- offTargetAnalysis(inputFilePath, annotatePaired = FALSE, scoring.method = "CFDscore", chromToSearch = "chrX", annotateExon = FALSE, max.mismatch = 2, BSgenomeName = Hsapiens, outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequence chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done annotating
## Add RE information...
## write gRNAs to bed file...
## Scan for REsites in flanking region...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
results <- offTargetAnalysis(inputFilePath, annotatePaired = FALSE, BSgenomeName = Hsapiens, chromToSearch = "chrX", txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, max.mismatch = 4, outputDir = outputDir, overwrite = TRUE, PAM.location = "5prime", PAM = "TGT", PAM.pattern = "^T[A|G]N", allowed.mismatch.PAM = 2, subPAM.position = c(1,2))
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequence chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## 403 genes were dropped because they have exons located on both strands
## of the same reference sequence or on more than one reference sequence,
## so cannot be represented by a single genomic range.
## Use 'single.strand.genes.only=FALSE' to get all the genes in a
## GRangesList object, or use suppressMessages() to suppress this message.
## Done annotating
## Add RE information...
## write gRNAs to bed file...
## Scan for REsites in flanking region...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
results <- offTargetAnalysis(inputFilePath, annotatePaired = FALSE, annotateExon = FALSE,findPairedgRNAOnly = FALSE, chromToSearch = "chrX", max.mismatch = 0, BSgenomeName = Hsapiens, rule.set = "CRISPRscan", baseBeforegRNA = 6, baseAfterPAM = 6, featureWeightMatrixFile = system.file("extdata", "Morenos-Mateo.csv", package = "CRISPRseek"), outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequence chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done annotating
## Add RE information...
## write gRNAs to bed file...
## Scan for REsites in flanking region...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE, REpatternFile = REpatternFile, findPairedgRNAOnly = FALSE, annotatePaired = FALSE, BSgenomeName = Hsapiens, chromToSearch = "chrX", txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, max.mismatch = 4, outputDir = outputDir, overwrite = TRUE, PAM.location = "5prime", PAM = "TGT", PAM.pattern = "^T[A|G]N", allowed.mismatch.PAM = 2, subPAM.position = c(1,2), baseEditing = TRUE, editingWindow = 10:20, targetBase = "A")
## Validating input ...
## Searching for gRNAs ...
## No gRNAs found in the input sequence Hsap_GATA1_ex2no gRNAs found!
inputFilePath <- DNAStringSet(paste(
  "CCAGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGATCGAAAA",
  "CTCATCAGTCGATGCGAGTCATCTAAATTCCGATCAATTTCACACTTTAAACG", sep = ""))
names(inputFilePath) <- "testPE"
results3 <- offTargetAnalysis(inputFilePath, gRNAoutputName = "testPEgRNAs", BSgenomeName = Hsapiens, chromToSearch = "chrX", txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, max.mismatch = 1, outputDir = outputDir, overwrite = TRUE, PAM.size = 3L, gRNA.size = 20L, overlap.gRNA.positions = c(17L, 18L), PBS.length = 15, corrected.seq = "T", RT.template.pattern = "D$", RT.template.length = 8:30, targeted.seq.length.change = 0, bp.after.target.end = 15, target.start = 20, target.end = 20, paired.orientation = "PAMin", min.gap = 20, max.gap = 90, primeEditing = TRUE, findPairedgRNAOnly = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequence chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## 403 genes were dropped because they have exons located on both strands
## of the same reference sequence or on more than one reference sequence,
## so cannot be represented by a single genomic range.
## Use 'single.strand.genes.only=FALSE' to get all the genes in a
## GRangesList object, or use suppressMessages() to suppress this message.
## Done annotating
## Add paired information...
## Add RE information...
## write gRNAs to bed file...
## Scan for REsites in flanking region...
## Done. Please check output files in directory /__w/CRISPRseekGUIDEseqBioc2020Workshop/CRISPRseekGUIDEseqBioc2020Workshop/vignettes/CRISPRseekDemo/
To preferentially target one allele, select gRNAs that have the lowest score for the other allele. Selected gRNAs can then be examined for potential off-target cleavage as described in Scenario 5.
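The selection step can be sketched in base R. The table below is invented purely for illustration, and the column names `scoreForSeq1` and `scoreForSeq2` are assumptions about how the per-allele scores are labeled; adjust them to match the actual output file.

```r
# Toy table of candidate gRNAs. All values are invented for illustration,
# and the column names are assumptions, not guaranteed package output.
candidates <- data.frame(
  name         = c("gRNA_a", "gRNA_b", "gRNA_c", "gRNA_d"),
  scoreForSeq1 = c(100, 100, 100, 100),    # score against the allele to target
  scoreForSeq2 = c(45.0, 0.8, 12.5, 100),  # score against the other allele
  stringsAsFactors = FALSE
)

# Rank by the score for the allele that should be spared, lowest first.
ranked <- candidates[order(candidates$scoreForSeq2), ]

# Keep the two most allele-specific candidates for follow-up off-target analysis.
selected <- head(ranked$name, 2)
print(selected)
```

In this toy table, `gRNA_d` scores equally for both alleles, so it ranks last and would be a poor choice for allele-specific targeting.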
Identify gRNAs that target the following two input sequences equally well with minimized off-target cleavage
MfSerpAEx2
GACGATGGCATCCTCCGTTCCCTGGGGCCTCCTGCTGCTGGCGGGGCTGTGCTGCCTGGCCCCCCGCTCCCTGGCCTCGAGTCCCCTGGGAGCCGCTGTCCAGGACACAGGTGCACCCCACCACGACCATGAGCACCATGAGGAGCCAGCCTGCCACAAGATTGCCCCGAACCTGGCCGACTTCGCCTTCAGCATGTACCGCCAGGTGGCGCATGGGTCCAACACCACCAACATCTTCTTCTCCCCCGTGAGCATCGCGACCGCCTTTGCGTTGCTTTCTCTGGGGGCCAAGGGTGACACTCACTCCGAGATCATGAAGGGCCTTAGGTTCAACCTCACTGAGAGAGCCGAGGGTGAGGTCCACCAAGGCTTCCAGCAACTTCTCCGCACCCTCAACCACCCAGACAACCAGCTGCAGCTGACCACTGGCAATGGTCTCTTCATCGCTGAGGGCATGAAGCTACTGGATAAGTTTTTGGAGGATGTCAAGAACCTGTACCACTCAGAAGCCTTCTCCACCAATTTCGGGGACACCGAAGCAGCCAAGAAACAGATCAACGATTATGTTGAGAAGGGAACCCAAGGGAAAATTGTGGATTTGGTCAAAGACCTTGACAAAGACACAGCTTTCGCTCTGGTGAATTACATTTTCTTTAAAG
HsSerpAEx2
GACAATGCCGTCTTCTGTCTCGTGGGGCATCCTCCTGCTGGCAGGCCTGTGCTGCCTGGTCCCTGTCTCCCTGGCTGAGGATCCCCAGGGAGATGCTGCCCAGAAGACAGATACATCCCACCATGATCAGGATCACCCAACCTTCAACAAGATCACCCCCAACCTGGCTGAGTTCGCCTTCAGCCTATACCGCCAGCTGGCACACCAGTCCAACAGCACCAATATCTTCTTCTCCCCAGTGAGCATCGCTACAGCCTTTGCAATGCTCTCCCTGGGGACCAAGGCTGACACTCACGATGAAATCCTGGAGGGCCTGAATTTCAACCTCACGGAGATTCCGGAGGCTCAGATCCATGAAGGCTTCCAGGAACTCCTCCGTACCCTCAACCAGCCAGACAGCCAGCTCCAGCTGACCACCGGCAATGGCCTGTTCCTCAGCGAGGGCCTGAAGCTAGTGGATAAGTTTTTGGAGGATGTTAAAAAGTTGTACCACTCAGAAGCCTTCACTGTCAACTTCGGGGACACCGAAGAGGCCAAGAAACAGATCAACGATTACGTGGAGAAGGGTACTCAAGGGAAAATTGTGGATTTGGTCAAGGAGCTTGACAGAGACACAGTTTTTGCTCTGGTGAATTACATCTTCTTTAAAG
Constrain the gRNA sequence by setting gRNA.pattern to require or exclude specific features within the target site.
3a. Synthesis of gRNAs in vivo from host U6 promoters is more efficient if the first base is guanine. To maximize efficiency, what should gRNA.pattern be set to?
3b. Synthesis of gRNAs in vitro using T7 promoters is most efficient when the first two bases are GG. To maximize efficiency, what should gRNA.pattern be set to?
In the examples we went through, we deliberately restricted the off-target search to chromosome X. For a genome-wide search, what needs to be changed, and how?
Find gRNAs in a paired configuration, with the two gRNAs between 5 and 15 bases apart, without performing off-target analysis.
Different CRISPR-Cas systems use different PAM sequences. Which parameter needs to be reset for PAM = 'NNNNGGGT'?
Different CRISPR-Cas systems also use different gRNA lengths. Which parameter needs to be reset?
Which parameter needs to be reset, and to what value, if we are interested in finding gRNAs with restriction enzyme patterns of size 8 or above?
A new penalty matrix has recently been derived. Which parameter needs to be reset accordingly?
It has been shown that although the PAM sequence NGG is preferred, the variant NAG is also recognized, with lower efficiency. The researcher is interested in an off-target search that includes both the NGG and NAG variants. Which parameter(s) need to be set to carry out such a search?
Which parameter should be reset if you would like to skip off-target analysis but still want the summary output file, and how should it be reset?
How can off-target analysis be performed for genomes for which no BSgenome package is available?
Hint: use compare2Sequences and reset searchDirection and findgRNAs
How can off-target analysis be performed for Cpf1?
Hint: similar to scenario 11
You will need to alter parameters PAM.location, PAM, PAM.pattern and PAM.size
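Several of the exercises above come down to writing a pattern for gRNA.pattern or PAM.pattern. The sketch below shows one way to try candidate patterns in base R before passing them to the package. It assumes these parameters accept regular expressions of this form, which should be checked against the package documentation. The sequences are hypothetical, and N is written out as `[ACGT]` because plain regular expressions have no IUPAC codes.

```r
# Hypothetical 20-nt candidate gRNAs and 3-nt PAMs, for illustration only.
gRNAs <- c("GGACCTGATCGATCGATTAC",
           "GACCTGATCGATCGATTACG",
           "TGGACCTGATCGATCGATTA")
pams <- c("AGG", "TAG", "CGA", "GTG")

# Candidate patterns constraining the first base(s) of the gRNA.
starts_with_G  <- grepl("^G",  gRNAs)   # first base is G
starts_with_GG <- grepl("^GG", gRNAs)   # first two bases are GG

# Candidate pattern accepting NGG or NAG, with N expanded to [ACGT].
ngg_or_nag <- grepl("^[ACGT][AG]G$", pams)

print(data.frame(gRNA = gRNAs, starts_with_G, starts_with_GG))
print(data.frame(PAM = pams, ngg_or_nag))
```

Checking a pattern against a handful of known sequences in this way helps catch anchoring mistakes, such as a missing `^` or `$`, before a long genome search is started.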