2. Rapid Identification of New Instances of High Interest Segments
The other two main programs in ConSequences are generateReferenceMSA.py and querySegmentInRawReads.py, which enable quick prediction of whether a sample has a segment of interest directly from short-read sequencing data.
If a conserved segment of interest is identified through delineateSegmentsOnReference.py based analysis, the result files generated by that program can be provided as input to generateReferenceMSA.py to construct a reference-based multiple sequence alignment (MSA) for the segment.
Afterwards, querySegmentInRawReads.py can be used to predict whether the defining/core components of the MSA are present in the raw reads of a sample (provided as FASTQ files) using a sliding k-mer analysis of one or multiple segment MSAs. Because slight variations can exist between instances of a signature in the MSA, a sample needs to possess only one of the possible 31-mers.
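The sliding k-mer idea described above can be illustrated with a minimal Python sketch. This is not the code used by querySegmentInRawReads.py: the function names, the gap handling, and the choice to report a fraction of supported windows are all assumptions made for illustration. The sketch only shows the stated rule that, at each window of the MSA, a sample needs just one of the k-mer variants seen across the aligned instances.

```python
def window_kmers(msa_rows, start, k):
    """Collect the distinct gap-free k-mers the MSA rows show at one window.

    Hypothetical helper: gaps ('-') are dropped and the next k bases are kept.
    """
    kmers = set()
    for row in msa_rows:
        ungapped = row[start:].replace("-", "")[:k]
        if len(ungapped) == k:
            kmers.add(ungapped.upper())
    return kmers


def fraction_windows_supported(msa_rows, sample_kmers, k=31):
    """Fraction of MSA windows where the sample has at least one variant k-mer.

    Reporting a fraction is an illustrative choice, not the tool's output format.
    """
    length = len(msa_rows[0])
    hits = 0
    total = 0
    for start in range(length - k + 1):
        variants = window_kmers(msa_rows, start, k)
        if not variants:
            continue  # no row has k ungapped bases from this column on
        total += 1
        # One matching variant is enough for the window to count as present.
        if variants & sample_kmers:
            hits += 1
    return hits / total if total else 0.0


# Toy example with k=3 so the windows are easy to follow by eye.
toy_msa = ["ACGTAC", "ACGAAC"]
toy_sample = {"ACG", "CGA", "GTA", "AAC"}
print(fraction_windows_supported(toy_msa, toy_sample, k=3))
```

In the toy example the two rows differ at one column, so most windows have two possible k-mers, and the sample matches one variant in every window.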
usage: generateReferenceMSA.py [-h] -r REF_FASTA -s START_COORD -e END_COORD
                               -m MAPPING_SCAFFS -w SLIDING_WINDOW_RESULTS -o
                               MSA_OUTPUT [-l LOG_FILE]

Program: generateReferenceMSA.py
Author: Rauf Salamzade
The Broad Institute of MIT and Harvard
Earl Lab / Bacterial Genomics Group

This program will generate a . If facing difficulties, please raise
issues on the github page: https://github.com/broadinstitute/consequences

optional arguments:
  -h, --help            show this help message and exit
  -r REF_FASTA, --ref_fasta REF_FASTA
                        FASTA for reference scaffold upon which
                        segment lies.
  -s START_COORD, --start_coord START_COORD
                        Starting coordinate of segment.
  -e END_COORD, --end_coord END_COORD
                        Ending coordinate of segment.
  -m MAPPING_SCAFFS, --mapping_scaffs MAPPING_SCAFFS
                        List of scaffolds with segment. One per line.
  -w SLIDING_WINDOW_RESULTS, --sliding_window_results SLIDING_WINDOW_RESULTS
                        Sliding window results file which contains
                        variant information.
  -o MSA_OUTPUT, --msa_output MSA_OUTPUT
                        Multiple-sequence-alignment to be used for rapid
                        identification of signature sequences.
  -l LOG_FILE, --log_file LOG_FILE
                        Path to logging output file
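As a usage illustration, the following Python sketch assembles a generateReferenceMSA.py command line from the flags listed in the help text above. All file names and coordinates are hypothetical placeholders, and running the script through the `python` interpreter is an assumption about how it is installed.

```python
import shlex


def build_msa_command(ref_fasta, start, end, scaffold_list,
                      window_results, msa_output, log_file=None):
    """Build an argument list using only flags shown in the help text."""
    cmd = [
        "python", "generateReferenceMSA.py",  # assumed way of launching the script
        "-r", ref_fasta,
        "-s", str(start),
        "-e", str(end),
        "-m", scaffold_list,
        "-w", window_results,
        "-o", msa_output,
    ]
    if log_file is not None:
        cmd += ["-l", log_file]
    return cmd


# Hypothetical inputs; replace with real paths and segment coordinates.
example = build_msa_command(
    ref_fasta="reference_scaffold.fasta",
    start=1000,
    end=5000,
    scaffold_list="scaffolds_with_segment.txt",
    window_results="sliding_window_results.txt",
    msa_output="segment_msa.fasta",
)
print(shlex.join(example))
```

The resulting list can be passed to `subprocess.run(example, check=True)` once the placeholder paths point at real files.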
usage: querySegmentInRawReads.py [-h] -m REF_MSAS [REF_MSAS ...] -r REFERENCES
                                 [REFERENCES ...] -i ILLUMINA_READS
                                 [ILLUMINA_READS ...] -o OUTPUT_PREFIX
                                 [-d MIN_DEPTH] [-k KMER_LENGTH] [-c CORES]

Program: generateReferenceMSA.py
Author: Rauf Salamzade
The Broad Institute of MIT and Harvard
Earl Lab / Bacterial Genomics Group

This program will generate a . If facing difficulties, please
raise issues on the github page: https://github.com/broadinstitute/consequences

optional arguments:
  -h, --help            show this help message and exit
  -m REF_MSAS [REF_MSAS ...], --ref_msas REF_MSAS [REF_MSAS ...]
                        Multi-FASTA reference-based multiple sequence alignment(s)
                        for segment(s) of interest.
  -r REFERENCES [REFERENCES ...], --references REFERENCES [REFERENCES ...]
                        Reference sample. Should be provided in same respective
                        order as --ref_msas.
  -i ILLUMINA_READS [ILLUMINA_READS ...], --illumina_reads ILLUMINA_READS [ILLUMINA_READS ...]
                        Illumina or any high-accuracy sequencing data in FASTQ format.
  -o OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
                        Multiple-sequence-alignment to be used for rapid
                        identification of signature sequences.
  -d MIN_DEPTH, --min_depth MIN_DEPTH
                        Minimum number of times k-mer has to occur in sample
                        read's to avoid inclusion of sequencing errors.
  -k KMER_LENGTH, --kmer_length KMER_LENGTH
                        Size of k-mer to use for searching. Default is 31.
  -c CORES, --cores CORES
                        Number of cores to provide JellyFish. Default is 1.
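To make the role of the minimum depth option concrete, here is a small pure-Python sketch of k-mer counting with a depth filter. It is a stand-in for illustration only: the program itself relies on JellyFish for counting, and the function names, the use of canonical (strand-independent) k-mers, and the skipping of k-mers with ambiguous bases are assumptions rather than documented behavior.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers_passing_depth(reads, min_depth, k=31):
    """Count k-mers across read sequences and keep those seen >= min_depth times.

    K-mers that occur only rarely are more likely to come from sequencing
    errors, which is the motivation for the minimum depth threshold.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other codes
                counts[canonical(kmer)] += 1
    return {kmer for kmer, n in counts.items() if n >= min_depth}


# Toy reads with k=3 so the counts can be checked by hand.
toy_reads = ["ACGTT", "ACGTA", "TTTGA"]
print(kmers_passing_depth(toy_reads, min_depth=2, k=3))
```

Because this sketch stores canonical k-mers, any reference k-mer compared against the returned set would need to be passed through `canonical` as well.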