An analysis pipeline designed to obtain raw data and quality control files from the basespace cli, leverage features of the staphb-wf cecret workflow, generate QC reports, upload samples to GISAID via the gisaid cli, and prepare samples for upload to NCBI.
Deployment of the pipeline requires access to the AWS instance where the required features are stored, including staphb-wf cecret, basespace cli, and gisaid cli.
This pipeline performs the following steps:
- INITIALIZE
  - Creates the output project directory if it doesn't exist.
  - Copies configuration files needed for pipeline execution.
- CECRET
  - Downloads analysis files for processing directly from BASESPACE
  - Creates sample batches dependent on project size and input from config_pipeline.yaml
  - Processes batches individually. For each batch, the pipeline:
    - Downloads raw data (FASTA) and quality control files from BASESPACE
    - Submits the batch to the staphb-wf cecret workflow
    - Transforms results into batch-level analysis and quality control reports
  - Merges batch outputs into final analysis and quality control reports
  - Removes intermediate files and working directories
- GISAID
  - Performs QC on N content; samples that fail the N threshold are added to the failed list
  - Performs QC on metadata; samples missing metadata files are added to the failed list
  - Transforms metadata for passing samples into the GISAID required template
  - Transforms FASTA files for passing samples into the GISAID required FASTA format
  - Uploads metadata and FASTA files to GISAID
  - Merges GISAID IDs into the final output report
  - Moves FASTA files to the appropriate final directories (e.g., gisaid_complete)
- NCBI
  - Prepares the NCBI Attributes batch file
  - Prepares the NCBI Metadata batch file
  - Downloads FASTQ files from BASESPACE
  - Adds returned NCBI IDs to the final output and tracks QC information
- STATS
  - Outputs stats from QC, GISAID, and NCBI uploads to the command line
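
Two of the steps above, batch creation and the N-threshold check, can be illustrated with a minimal Python sketch. This is not the pipeline's own code. The names `make_batches`, `percent_n`, `split_by_n_threshold`, `batch_limit`, and `max_percent_n` are hypothetical, and the sketch assumes that the batch size is a single integer read from config_pipeline.yaml and that the N threshold is expressed as a maximum percentage of N bases per sequence. The actual pipeline may define either differently.

```python
from typing import Dict, List, Tuple


def make_batches(sample_ids: List[str], batch_limit: int) -> List[List[str]]:
    """Split sample IDs into consecutive batches of at most batch_limit samples.

    batch_limit is a hypothetical setting standing in for whatever value
    config_pipeline.yaml provides.
    """
    if batch_limit < 1:
        raise ValueError("batch_limit must be at least 1")
    return [
        sample_ids[start:start + batch_limit]
        for start in range(0, len(sample_ids), batch_limit)
    ]


def percent_n(sequence: str) -> float:
    """Return the percentage of N bases in a sequence (0.0 if it is empty)."""
    if not sequence:
        return 0.0
    return 100.0 * sequence.upper().count("N") / len(sequence)


def split_by_n_threshold(
    sequences: Dict[str, str], max_percent_n: float
) -> Tuple[List[str], List[str]]:
    """Sort sample IDs into (passed, failed) lists.

    Assumption: a sample fails when its percentage of N bases is strictly
    greater than max_percent_n.
    """
    passed: List[str] = []
    failed: List[str] = []
    for sample_id, sequence in sequences.items():
        if percent_n(sequence) > max_percent_n:
            failed.append(sample_id)
        else:
            passed.append(sample_id)
    return passed, failed
```

With these definitions, `make_batches(ids, 2)` groups five sample IDs into batches of two, two, and one, and `split_by_n_threshold` returns the passing IDs first and the failed list second.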
Review the UserGuide documentation for more help!
Usage: -p [REQUIRED] pipeline runmode
-p options: init, sarscov2, gisaid, ncbi, stat, update
Usage: -n [REQUIRED] project_id
-n project id
Usage: -s [OPTIONAL] subworkflow options
-s sarscov2: DOWNLOAD, BATCH, CECRET, REPORT, ALL; gisaid: PREP, UPLOAD, QC, ALL
Usage: -r [OPTIONAL] resume options
-r Y,N option to resume `-p sarscov2` workflow in progress
Usage: -t [OPTIONAL] testing options
-t Y,N option to run test in `-p sarscov2` workflow
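
To make the flag combinations above concrete, here is a minimal Python `argparse` sketch that mirrors the documented options. It is an illustration only, not the pipeline's actual argument handling. The destination names (`runmode`, `project_id`, `subworkflow`, `resume`, `testing`), the `N` defaults for `-r` and `-t`, and the rule that `-s` is rejected for runmodes without listed subworkflows are all assumptions.

```python
import argparse
from typing import List, Optional

# Subworkflow options listed in the usage text for each runmode.
SUBWORKFLOWS = {
    "sarscov2": ["DOWNLOAD", "BATCH", "CECRET", "REPORT", "ALL"],
    "gisaid": ["PREP", "UPLOAD", "QC", "ALL"],
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags."""
    parser = argparse.ArgumentParser(description="Sketch of the documented flags")
    parser.add_argument(
        "-p", dest="runmode", required=True,
        choices=["init", "sarscov2", "gisaid", "ncbi", "stat", "update"],
        help="pipeline runmode",
    )
    parser.add_argument("-n", dest="project_id", required=True, help="project id")
    parser.add_argument("-s", dest="subworkflow", default=None,
                        help="subworkflow option for sarscov2 or gisaid")
    # Defaults of "N" are assumptions; the usage text does not state them.
    parser.add_argument("-r", dest="resume", choices=["Y", "N"], default="N",
                        help="resume a sarscov2 workflow in progress")
    parser.add_argument("-t", dest="testing", choices=["Y", "N"], default="N",
                        help="run a test in the sarscov2 workflow")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse flags and check that -s matches the selected runmode."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.subworkflow is not None:
        allowed = SUBWORKFLOWS.get(args.runmode, [])
        if args.subworkflow not in allowed:
            # parser.error prints a message and exits with status 2.
            parser.error(f"-s {args.subworkflow} is not valid for -p {args.runmode}")
    return args
```

For example, `parse_args(["-p", "sarscov2", "-n", "my_project", "-s", "BATCH"])` is accepted (the project id here is a placeholder), while pairing `-p gisaid` with `-s CECRET` is rejected.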
This pipeline was created by Samantha Sevilla Chill to support work at the Ohio Department of Health Public Laboratory.