3. Quick start
This tutorial describes the basic usage of sgRNAble for the design of sgRNAs targeting a specific gene in a user-defined host genome.
This page also provides an overview of the sgRNAble arguments and input requirements, as well as a description of the output files.
sgRNAble requires the following two input files:
- A `TARGET_SEQUENCE` file in `.fasta` format, containing the sequence (or sequences) for which sgRNAs are being designed. This file can contain one or multiple sequences in FASTA format. sgRNAble selects 10 optimal sgRNAs for each sequence (the number of sgRNAs per sequence can be varied using the `-a AZIMUTH_CUTOFF` argument).
- One or multiple `GENOME_SEQUENCE` files in `.fasta` format, containing the sequence (or sequences) of the host genome. These files should include all chromosomal and extra-chromosomal sequences and must contain the `TARGET_SEQUENCE` in order to predict off-target binding.
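Because the genome files must contain the target sequence, it can be useful to confirm this before starting a run. The following is a minimal sketch and is not part of sgRNAble: the helper names `read_fasta`, `reverse_complement`, and `target_in_genome` are hypothetical, and the check assumes an exact match on either strand for sequences written with A, C, G, T, and N only.

```python
from typing import Dict, Iterable, List

# Translation table for complementing DNA bases (assumes A/C/G/T/N only).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record name: uppercase sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else f"record_{len(chunks) + 1}"
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {key: "".join(value) for key, value in chunks.items()}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def target_in_genome(target: str, genome: Dict[str, str]) -> bool:
    """Check whether the target occurs exactly in any genome record, on either strand."""
    forward = target.upper()
    reverse = reverse_complement(forward)
    return any(forward in seq or reverse in seq for seq in genome.values())


# Toy data for illustration only; real inputs would be read from the .fasta files.
genome = read_fasta([">chr1 toy chromosome", "AAACCCGGG", "TTTACGT"])
target = read_fasta([">gene", "GGGTTT"])
print(target_in_genome(target["gene"], genome))
```

For real files, an open file handle can be passed directly, for example `read_fasta(open("./data/lacz.fasta"))`.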
The following is the basic structure of an sgRNAble command, with optional arguments shown in square brackets (`[<optional-argument>]`):
sgrnable [-h] -t TARGET_SEQUENCE -g GENOME_SEQUENCE
[GENOME_SEQUENCE ...] [-a AZIMUTH_CUTOFF]
[-c COPY_NUMBER [COPY_NUMBER ...]] [-p PURPOSE]
[-o OUTPUT_DIR] [-th NUM_THREADS] [-m MAX_MEMORY] [-v]
The following required arguments identify the user-defined input files for sgRNAble:

Required arguments | Description |
---|---|
-t TARGET_SEQUENCE, --target_sequence TARGET_SEQUENCE | TARGET_SEQUENCE in .fasta format. This file can include one or multiple sequences in FASTA format. |
-g GENOME_SEQUENCE [GENOME_SEQUENCE ...], --genome_sequence GENOME_SEQUENCE [GENOME_SEQUENCE ...] | GENOME_SEQUENCE as one or multiple .fasta files. Multiple files can represent multiple chromosomes or extra-chromosomal elements. The TARGET_SEQUENCE must be included in these GENOME_SEQUENCE files. |
The following optional arguments allow users to adjust various parameters associated with sgRNA design:

Optional arguments | Description |
---|---|
-h, --help | Shows the help message in the command line and exits. |
-a AZIMUTH_CUTOFF, --azimuth_cutoff AZIMUTH_CUTOFF | Integer stating the number of sgRNAs to be passed from Azimuth screening for subsequent investigation of off-target effects. sgRNAs are passed in descending order of Azimuth on-target efficacy score. The off-target binding potential of all sgRNAs passed is evaluated using the biophysical model of Cas9 binding mechanics, and the sgRNAs are ranked based on the output. |
-c COPY_NUMBER [COPY_NUMBER ...], --copy_number COPY_NUMBER [COPY_NUMBER ...] | Integer (or integers) stating the copy number of each GENOME_SEQUENCE file in the host genome. This is used in the prediction of off-target binding, where the copy number of an on- or off-target binding site affects the likelihood of Cas9 being bound to the site in question. |
-p PURPOSE, --purpose PURPOSE | Allows the user to filter the sgRNA sequences investigated based on the intended application. d (default) considers sgRNAs targeting both strands of the target sequence. i (interference) considers only sgRNAs targeting the negative strand, for use in CRISPR interference. |
-o OUTPUT_DIR, --output_dir OUTPUT_DIR | Allows the user to define the name of, and path to, the output directory. |
-th NUM_THREADS, --num_threads NUM_THREADS | Integer stating the number of threads to be used when running sgRNAble. |
-m MAX_MEMORY, --max_memory MAX_MEMORY | Integer stating the maximum memory to be used, in GB. Defaults to using all available memory in the system. |
-v, --verbose | Enables verbose console logging. |
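When sgRNAble is driven from a script, the arguments above can be assembled programmatically. The sketch below is illustrative only: `build_sgrnable_command` is a hypothetical helper, not part of sgRNAble, and it uses only the flags listed in the tables above. It assumes the `sgrnable` executable is on the `PATH`.

```python
from typing import List, Optional, Sequence


def build_sgrnable_command(
    target: str,
    genomes: Sequence[str],
    azimuth_cutoff: Optional[int] = None,
    copy_numbers: Optional[Sequence[int]] = None,
    purpose: Optional[str] = None,
    output_dir: Optional[str] = None,
    num_threads: Optional[int] = None,
    max_memory: Optional[int] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble an argument list for the sgrnable command line (hypothetical helper)."""
    if not genomes:
        raise ValueError("at least one GENOME_SEQUENCE file is required")
    if purpose is not None and purpose not in ("d", "i"):
        raise ValueError("purpose must be 'd' (default) or 'i' (interference)")

    # Required arguments first, then only the optional ones that were set.
    cmd = ["sgrnable", "-t", target, "-g", *genomes]
    if azimuth_cutoff is not None:
        cmd += ["-a", str(azimuth_cutoff)]
    if copy_numbers is not None:
        cmd += ["-c", *(str(n) for n in copy_numbers)]
    if purpose is not None:
        cmd += ["-p", purpose]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if num_threads is not None:
        cmd += ["-th", str(num_threads)]
    if max_memory is not None:
        cmd += ["-m", str(max_memory)]
    if verbose:
        cmd.append("-v")
    return cmd


cmd = build_sgrnable_command(
    "./data/lacz.fasta",
    ["./data/ecoli_genome.fasta"],
    purpose="i",
    output_dir="./lacz_output",
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the run.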
sgRNAble outputs a directory, named with the `-o OUTPUT_DIR` argument, that contains three output files:
- `output.csv` - contains selected sgRNAs ranked based on Entropy Score in the following format:

Gene/ORF Name | Guide Sequence | Location in Gene | Strand | Entropy Score | Number of Exact Matches | Rank in Target Gene |
---|---|---|---|---|---|---|
GFP | GCTAGCTACTAGAGTCACAC | 29 | Positive | 9.221493193 | 1 | 1 |
GFP | CAAACTCAAGAAGGACCATG | 732 | Negative | 9.867165044 | 1 | 2 |
GFP | TGCTGGGATTACACATGGCA | 739 | Positive | 10.21981794 | 1 | 3 |

- `run.log` - contains a log of the sgRNAble command line output
- `Run_Genome` - intermediate file containing the concatenated host genome used in the evaluation of off-target effects, in FASTA format
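The ranked results can be post-processed with a few lines of Python. This is a sketch under stated assumptions: it assumes `output.csv` is comma-separated and begins with a header row whose column names match the table above, which should be verified against a real output file. The function name `top_guides` is hypothetical, and the sample rows are copied from the table above.

```python
import csv
import io
from typing import Dict, List, TextIO

# Sample content mirroring the documented columns (assumed header layout).
SAMPLE = """Gene/ORF Name,Guide Sequence,Location in Gene,Strand,Entropy Score,Number of Exact Matches,Rank in Target Gene
GFP,GCTAGCTACTAGAGTCACAC,29,Positive,9.221493193,1,1
GFP,CAAACTCAAGAAGGACCATG,732,Negative,9.867165044,1,2
GFP,TGCTGGGATTACACATGGCA,739,Positive,10.21981794,1,3
"""


def top_guides(handle: TextIO, n: int = 2) -> Dict[str, List[str]]:
    """Return the n best-ranked guide sequences for each gene/ORF."""
    rows = list(csv.DictReader(handle))
    # Order rows by the rank column so the best guides come first.
    rows.sort(key=lambda row: int(row["Rank in Target Gene"]))
    best: Dict[str, List[str]] = {}
    for row in rows:
        guides = best.setdefault(row["Gene/ORF Name"], [])
        if len(guides) < n:
            guides.append(row["Guide Sequence"])
    return best


print(top_guides(io.StringIO(SAMPLE), n=2))
```

To read a real result, replace the `io.StringIO(SAMPLE)` handle with `open("./lacz_output/output.csv", newline="")` or the path to another output directory.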
The following examples assume that sgRNAble has been installed following the instructions in the Installation page of the sgRNAble wiki, and that the user is in the appropriate environment.
Navigate to a working directory in which sgRNAble can be run. Place the input files in a sub-directory named `/data` (this directory structure is not required; relative or absolute paths can be used to select input or direct output anywhere on the machine).
The following example describes the usage of sgRNAble to design sgRNAs targeting a chromosomal LacZ (
This example requires the following files, which can be downloaded from the `sgRNAble/tests/data/` directory of the sgRNAble GitHub repository:
- `lacz.fasta` - target sequence
- `ecoli_genome.fasta` - host genome
The following command will identify and rank 10 sgRNAs targeting the chromosomal copy of LacZ in an E. coli MG1655 host and provide output in a directory named `lacz_output`. sgRNAble will use 4 threads and a maximum of 2 GB of RAM.
sgrnable -t ./data/lacz.fasta \
-g ./data/ecoli_genome.fasta \
-o ./lacz_output \
-th 4 \
-m 2
Optional arguments can be used to specify additional sgRNA design criteria.
The following example describes the usage of sgRNAble to design sgRNAs targeting GFP encoded on a pSB1C3 vector backbone, in an E. coli MG1655 host.
This example requires the following files, which can be downloaded from the `sgRNAble/tests/data/` directory of the sgRNAble GitHub repository:
- `gfp.fasta` - target sequence
- `psb1c3-gfp.fasta` - target encoding vector
- `ecoli_genome.fasta` - host genome
The following command will identify and rank 10 sgRNAs targeting the GFP encoded on a pSB1C3 vector, in an E. coli MG1655 host, and provide output in a directory named `gfp_output`. sgRNAble will use 4 threads and a maximum of 2 GB of RAM.
sgrnable -t ./data/gfp.fasta \
-g ./data/ecoli_genome.fasta ./data/psb1c3-gfp.fasta \
-o ./gfp_output \
-th 4 \
-m 2
Optional arguments can be used to specify additional sgRNA design criteria. For example, the argument `-c 1 100` would indicate that a single copy of the first `GENOME_SEQUENCE` listed (`./data/ecoli_genome.fasta`) is present for every 100 copies of the second `GENOME_SEQUENCE` listed (`./data/psb1c3-gfp.fasta`).
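Since the copy numbers are matched to the genome files by position, keeping each file and its copy number together in one mapping avoids ordering mistakes. The sketch below is illustrative: `genome_and_copy_args` is a hypothetical helper, not part of sgRNAble, and the requirement that each copy number be a positive integer is an assumption.

```python
from typing import Dict, List


def genome_and_copy_args(copies_by_file: Dict[str, int]) -> List[str]:
    """Build matching -g and -c argument lists from a {file: copy number} mapping."""
    for path, copies in copies_by_file.items():
        # Assumption: copy numbers are positive integers.
        if not isinstance(copies, int) or copies < 1:
            raise ValueError(f"copy number for {path} must be a positive integer")
    # Dictionaries preserve insertion order, so files and counts stay aligned.
    files = list(copies_by_file)
    counts = [str(copies_by_file[path]) for path in files]
    return ["-g", *files, "-c", *counts]


args = genome_and_copy_args(
    {
        "./data/ecoli_genome.fasta": 1,   # chromosome: single copy
        "./data/psb1c3-gfp.fasta": 100,   # plasmid: 100 copies per chromosome
    }
)
print(" ".join(args))
```

These arguments can be appended to `sgrnable -t ./data/gfp.fasta` together with any other options.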