3. Quick start

Noonanav edited this page Nov 15, 2021 · 5 revisions

Tutorial on standard sgRNA design

Overview

This tutorial describes the basic usage of sgRNAble for the design of sgRNAs targeting a specific gene in a user-defined host genome.

This page also provides an overview of the sgRNAble arguments and input requirements, as well as a description of the output files.

Input

sgRNAble requires the following two input files:

  1. A TARGET_SEQUENCE file in .fasta format, containing the sequence (or sequences) for which sgRNAs are being designed. This file can contain one or multiple sequences in FASTA format; sgRNAble selects 10 optimal sgRNAs for each of them (the number of sgRNAs per sequence can be varied using the -a AZIMUTH_CUTOFF argument).

  2. One or multiple GENOME_SEQUENCE files in .fasta format, containing the sequence (or sequences) of the host genome. These files should include all chromosomal and extra-chromosomal sequences and must contain the TARGET_SEQUENCE in order to predict off-target binding.
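Because the genome files must contain the target, it can be worth checking this before starting a run. The following is a minimal sketch of such a check, not part of sgRNAble itself: the helper names (parse_fasta, reverse_complement, missing_targets) are hypothetical, and it assumes plain A/C/G/T sequences and a simple exact-match search on both strands.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_fasta(text):
    """Parse FASTA text into a {record_name: sequence} dict (upper-cased)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def missing_targets(target_text, genome_texts):
    """Return names of target records absent from both strands of every genome record."""
    genome_seqs = [s for text in genome_texts for s in parse_fasta(text).values()]
    missing = []
    for name, seq in parse_fasta(target_text).items():
        found = any(seq in g or reverse_complement(seq) in g for g in genome_seqs)
        if not found:
            missing.append(name)
    return missing
```

The function takes the file contents as strings, so the files would be read first (for example with open(path).read()). An empty list means every target was found. This sketch treats each record as linear, so a target spanning the origin of a circular chromosome or plasmid would be reported as missing.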

Basic command

The following is the basic structure of an sgRNAble command, with optional arguments depicted in brackets ([<optional-argument>]):

sgrnable [-h] -t TARGET_SEQUENCE -g GENOME_SEQUENCE
         [GENOME_SEQUENCE ...] [-a AZIMUTH_CUTOFF]
         [-c COPY_NUMBER [COPY_NUMBER ...]] [-p PURPOSE]
         [-o OUTPUT_DIR] [-th NUM_THREADS] [-m MAX_MEMORY] [-v]

Argument descriptions

The following required arguments identify the user-defined input files for sgRNAble:

Required arguments Description
-t TARGET_SEQUENCE, --target_sequence TARGET_SEQUENCE TARGET_SEQUENCE in .fasta format. This file can include one or multiple sequences in FASTA format.
-g GENOME_SEQUENCE [GENOME_SEQUENCE ...], --genome_sequence GENOME_SEQUENCE [GENOME_SEQUENCE ...] GENOME_SEQUENCE as one or multiple .fasta files. Multiple files can represent multiple chromosomes or extra-chromosomal elements. The TARGET_SEQUENCE must be included in these GENOME_SEQUENCE files.

The following optional arguments allow users to adjust various parameters associated with sgRNA design:

Optional arguments Description
-h, --help Shows help message in the command line and exits
-a AZIMUTH_CUTOFF
--azimuth_cutoff AZIMUTH_CUTOFF
Integer stating number of sgRNAs to be passed from Azimuth screening for subsequent investigation of off-target effects. sgRNAs are passed in descending order based on Azimuth on-target efficacy score. Off-target binding potential of all sgRNAs passed will be evaluated using the biophysical model of Cas9 binding mechanics and ranked based on output.
-c COPY_NUMBER [COPY_NUMBER ...]
--copy_number COPY_NUMBER [COPY_NUMBER ...]
Integer (or integers) stating copy number of GENOME_SEQUENCE files in the host genome. This is used in the prediction of off-target binding, where the copy number of an on- or off-target binding site will impact the likelihood of Cas9 being bound to the site in question.
-p PURPOSE
--purpose PURPOSE
Allows the user to filter the sgRNA sequences investigated based on the intended application. d (default) considers sgRNAs targeting both strands of the target sequence. i (interference) considers only sgRNAs targeting the negative strand, for use in CRISPR interference.
-o OUTPUT_DIR
--output_dir OUTPUT_DIR
Allows user to define name of, and path to, output directory.
-th NUM_THREADS
--num_threads NUM_THREADS
Integer stating number of threads to be used when running sgRNAble.
-m MAX_MEMORY
--max_memory MAX_MEMORY
Integer stating maximum memory to be used by system in GB. Defaults to using all available memory in the system.
-v
--verbose
Enables verbose console logging.
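When sgRNAble is called from a script, the arguments above can be assembled programmatically. The following is a minimal sketch under stated assumptions: build_sgrnable_command is a hypothetical helper (not part of sgRNAble), it uses only the short flags listed in the tables above, and it accepts only the two -p values described there (d and i).

```python
import shlex


def build_sgrnable_command(target, genomes, output_dir=None, azimuth_cutoff=None,
                           purpose=None, num_threads=None, max_memory=None,
                           verbose=False):
    """Assemble an sgrnable argument list from the documented options."""
    if not genomes:
        raise ValueError("at least one GENOME_SEQUENCE file is required")
    if purpose is not None and purpose not in ("d", "i"):
        raise ValueError("purpose must be 'd' (default) or 'i' (interference)")

    # Required arguments: one target file, one or more genome files.
    cmd = ["sgrnable", "-t", str(target), "-g", *map(str, genomes)]

    # Optional arguments are added only when a value was supplied.
    optional = [("-a", azimuth_cutoff), ("-p", purpose), ("-o", output_dir),
                ("-th", num_threads), ("-m", max_memory)]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    if verbose:
        cmd.append("-v")
    return cmd


cmd = build_sgrnable_command("./data/lacz.fasta", ["./data/ecoli_genome.fasta"],
                             output_dir="./lacz_output", num_threads=4, max_memory=2)
print(shlex.join(cmd))  # shell-quoted form of the command, for logging
```

The returned list can be passed to subprocess.run(cmd, check=True) in an environment where sgRNAble is installed. Building a list rather than a single string avoids shell-quoting problems with paths that contain spaces.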

Output

sgRNAble outputs a directory, named with the -o OUTPUT_DIR argument, that contains three output files:

  1. output.csv - contains selected sgRNAs ranked based on Entropy Score in the following format:
| Gene/ORF Name | Guide Sequence | Location in Gene | Strand | Entropy Score | Number of Exact Matches | Rank in Target Gene |
| --- | --- | --- | --- | --- | --- | --- |
| GFP | GCTAGCTACTAGAGTCACAC | 29 | Positive | 9.221493193 | 1 | 1 |
| GFP | CAAACTCAAGAAGGACCATG | 732 | Negative | 9.867165044 | 1 | 2 |
| GFP | TGCTGGGATTACACATGGCA | 739 | Positive | 10.21981794 | 1 | 3 |

  2. run.log - contains a log of the sgRNAble command line output

  3. Run_Genome - intermediate file containing the concatenated host genome used in the evaluation of off-target effects, in FASTA format
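For downstream scripting, output.csv can be read with Python's standard csv module. The sketch below is illustrative only: top_guides is a hypothetical helper, and it assumes the file is comma-delimited with a header row whose column names match the table shown for output.csv. The example rows are the ones from that table, written out under the same assumption.

```python
import csv
import io


def top_guides(csv_file, n=3):
    """Return the n best-ranked rows from an output.csv-style file object."""
    rows = list(csv.DictReader(csv_file))
    # Rank 1 is the best guide for its target, so sort ascending by rank.
    rows.sort(key=lambda r: int(r["Rank in Target Gene"]))
    return [
        (r["Gene/ORF Name"], r["Guide Sequence"], r["Strand"], float(r["Entropy Score"]))
        for r in rows[:n]
    ]


# Rows from the output.csv table, in an assumed comma-separated layout.
example = io.StringIO(
    "Gene/ORF Name,Guide Sequence,Location in Gene,Strand,Entropy Score,"
    "Number of Exact Matches,Rank in Target Gene\n"
    "GFP,CAAACTCAAGAAGGACCATG,732,Negative,9.867165044,1,2\n"
    "GFP,GCTAGCTACTAGAGTCACAC,29,Positive,9.221493193,1,1\n"
    "GFP,TGCTGGGATTACACATGGCA,739,Positive,10.21981794,1,3\n"
)

for gene, guide, strand, score in top_guides(example, n=2):
    print(gene, guide, strand, score)
```

With a real run, the same function could be applied to a file opened with open("lacz_output/output.csv", newline=""). If the target file holds several sequences, ranks repeat per gene, so grouping by "Gene/ORF Name" first may be more useful than a single global sort.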

Examples

The following examples assume that sgRNAble has been installed following the instructions on the Installation page of the sgRNAble wiki, and that the user is in the appropriate environment.

Setup

Navigate to a working directory in which sgRNAble can be run. Place the input files in a sub-directory named data (this directory structure is not required; relative or absolute paths can be used to select input or direct output anywhere on the machine).

Example #1 - Targeting chromosomal genes:

The following example describes the usage of sgRNAble to design sgRNAs targeting a chromosomal LacZ (β-galactosidase) in an E. coli MG1655 host.

Input Files

This example requires the following files, which can be downloaded from the sgRNAble/tests/data/ directory of the sgRNAble GitHub repository:

  • lacz.fasta - target sequence
  • ecoli_genome.fasta - host genome

sgRNAble Command

The following command will identify and rank 10 sgRNAs targeting the chromosomal copy of LacZ in an E. coli MG1655 host and provide output in a directory named lacz_output. sgRNAble will use 4 threads and a maximum of 2 GB of RAM.

sgrnable -t ./data/lacz.fasta \
         -g ./data/ecoli_genome.fasta \
         -o ./lacz_output \
         -th 4 \
         -m 2

Optional arguments can be used to specify additional sgRNA design criteria.

Example #2 - Targeting exogenous genes:

The following example describes the usage of sgRNAble to design sgRNAs targeting GFP encoded on a pSB1C3 vector backbone, in an E. coli MG1655 host.

Input Files

This example requires the following files, which can be downloaded from the sgRNAble/tests/data/ directory of the sgRNAble GitHub repository:

  • gfp.fasta - target sequence
  • psb1c3-gfp.fasta - target encoding vector
  • ecoli_genome.fasta - host genome

sgRNAble Command

The following command will identify and rank 10 sgRNAs targeting the GFP encoded on a pSB1C3 vector, in an E. coli MG1655 host, and provide output in a directory named gfp_output. sgRNAble will use 4 threads and a maximum of 2 GB of RAM.

sgrnable -t ./data/gfp.fasta \
         -g ./data/ecoli_genome.fasta ./data/psb1c3-gfp.fasta \
         -o ./gfp_output \
         -th 4 \
         -m 2

Optional arguments can be used to specify additional sgRNA design criteria. For example, the argument -c 1 100 would indicate that a single copy of the first GENOME_SEQUENCE listed (./data/ecoli_genome.fasta) is present for every 100 copies of the second GENOME_SEQUENCE listed (./data/psb1c3-gfp.fasta).
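Since the copy numbers are matched to the genome files by position, a small helper can keep the two lists in step. This is a sketch only: copy_number_args is a hypothetical name, and it assumes exactly one positive integer copy number per GENOME_SEQUENCE file, in the same order as the files (whether sgRNAble itself accepts fewer values is not covered here).

```python
def copy_number_args(genomes, copy_numbers):
    """Return the -g/-c arguments, pairing each genome file with its copy number."""
    if len(genomes) != len(copy_numbers):
        raise ValueError(
            f"expected {len(genomes)} copy numbers, got {len(copy_numbers)}"
        )
    for c in copy_numbers:
        if not isinstance(c, int) or c < 1:
            raise ValueError(f"copy numbers must be positive integers, got {c!r}")
    # Order matters: the i-th copy number applies to the i-th genome file.
    return ["-g", *genomes, "-c", *map(str, copy_numbers)]


genomes = ["./data/ecoli_genome.fasta", "./data/psb1c3-gfp.fasta"]
copies = [1, 100]  # one chromosome copy per 100 plasmid copies, as in the text above

for path, copy in zip(genomes, copies):
    print(f"{path}: copy number {copy}")
print(" ".join(copy_number_args(genomes, copies)))
```

The returned arguments would replace the -g line of the Example #2 command, with -t, -o, -th and -m left unchanged.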