Releases: theiagen/public_health_viral_genomics
v1.5.1
Patches to address issues reported in Mercury Batch
This patch fixes the workflow failures associated with Mercury_Batch string evaluations for the vadr_num_alerts input, e.g. Failed to evaluate input 'vadr_num_alerts' (reason 1 of 1): For input string: "VADR skipped due to poor assembly; assembly length (unambiguous) = 30".
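The error above is what happens when a free-text status message is coerced to an integer. The following is a minimal sketch of a tolerant parser, not the repository's actual fix; the helper name `parse_vadr_num_alerts` and the convention of returning `None` for non-numeric values are assumptions made for illustration.

```python
from typing import Optional


def parse_vadr_num_alerts(raw: str) -> Optional[int]:
    """Return the alert count as an int, or None when the field is a message.

    Hypothetical helper: the name and the None convention are illustrative
    assumptions, not part of the PHVG code base.
    """
    try:
        # int() tolerates surrounding whitespace but raises ValueError on text.
        return int(raw)
    except ValueError:
        return None


# Message text copied from the error quoted in the release note.
skipped = "VADR skipped due to poor assembly; assembly length (unambiguous) = 30"

print(parse_vadr_num_alerts("2"))
print(parse_vadr_num_alerts(skipped))
```

A caller can then treat `None` as "VADR not run" instead of failing the whole batch.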
Other modifications:
- Titan genomics characterization CLI executable changed from titan-gc-cli to titan-gc (this change has also been reflected in the CI checks)
- Versioning sync with titan-gc and WDL versioning task
v1.5.0
Minor Release that Adds the Titan_FASTA Workflow and Titan-GC CLI Functionality
- Titan_FASTA accepts pre-assembled FASTA inputs and performs assembly assessment (number N/ATCG/degenerate/total & percent reference coverage) as well as both SC2 lineage and clade assignment.
- Titan-GC provides access to all Titan workflows for genomic characterization through a single CLI executable (titan-gc-cli); this resource also enables continuous integration testing via GitHub Actions
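The assembly assessment metrics listed for Titan_FASTA can be illustrated with a short sketch. This is not the workflow's implementation: the function name, the returned keys, and the definition of percent reference coverage as ATCG bases divided by a caller-supplied reference length are all assumptions.

```python
from collections import Counter

# IUPAC degenerate nucleotide codes other than N.
DEGENERATE_CODES = "RYSWKMBDHV"


def assess_assembly(sequence: str, reference_length: int) -> dict:
    """Count base classes in a consensus sequence (header line excluded).

    Hypothetical helper; coverage is assumed to be ATCG count / reference length.
    """
    # Drop newlines and other whitespace, then normalise case.
    seq = "".join(sequence.split()).upper()
    counts = Counter(seq)
    number_atcg = sum(counts[base] for base in "ATCG")
    return {
        "number_N": counts["N"],
        "number_ATCG": number_atcg,
        "number_Degenerate": sum(counts[base] for base in DEGENERATE_CODES),
        "number_Total": len(seq),
        "percent_reference_coverage": round(100 * number_atcg / reference_length, 2),
    }


# Toy input: 6 unambiguous bases, 2 N, 2 degenerate codes, reference length 20.
metrics = assess_assembly("ATCGNNRYAT", reference_length=20)
print(metrics)
```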
Other Workflow Modifications
- Primer BED files are now required for the Titan_ONT and Titan_ClearLabs workflows, allowing users to analyze samples that have been sequenced with alternative primer schemes (e.g. v4 or the ONT Midnight protocol)
- Mercury Batch has been modified to accept samples with or without the GISAID/GenBank submittable files
- Pangolin_Update now captures the changes in lineage assignments over time through the pangolin_updates and pango_lineage_log outputs
- Memory-increase-and-retry Terra feature has been enabled for every task within the PHVG repository
Task and Docker Image Updates
- nextclade_one_sample split into two tasks: text wrangling from the NextClade report now performed in the nextclade_output_parser_one_sample task as new NextStrain docker image lacks Python
  - NextClade task will populate fields as NA if no report output is available; this also avoids full-workflow failure in the cases of 0bp assemblies
- kraken2 task modified to remove --classified-out flag to increase compute efficiency
- vadr task now skips samples with poor assemblies to avoid full-workflow failure
- mafft_cpu now exposed as a user input parameter in Titan_Augur_Run workflow
- Images for various software harmonized to, when appropriate, maintain the same software versions run across different tasks
- iVar tsv converted to VCF output file
- Human scrubber image pulled from NCBI's public GCP environment
- Default NextClade image set to nextstrain/nextclade:1.2.3
- Default VADR image set to staphb/vadr:1.3; default params updated as per recommendations by NCBI
- Default Pangolin image set to staphb/pangolin:3.1.11-pangolearn-2021-08-09
v1.4.4
Patches to address the most commonly reported bugs in the Titan workflows for genomic characterization
- bedtools.cov task removed as it was not generating any informative results for public health laboratories and causing samples with large input read files to fail
- logic added to skip VADR for poor quality consensus genome assemblies (<10,000bp assembly length unambiguous) to avoid an exit status 1 that caused workflow runs to fail
- Harmonized output variables across all four workflows (Titan_ClearLabs, Titan_ONT, Titan_Illumina_PE, Titan_Illumina_SE)
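The VADR skip rule in the list above (fewer than 10,000 unambiguous bases) can be sketched as follows. This is an illustration only: the function and constant names are hypothetical, and the message format is borrowed from the error text quoted in the v1.5.1 notes.

```python
from typing import Optional

# Threshold stated in the release note; the constant name is an assumption.
MIN_UNAMBIGUOUS_LENGTH = 10_000


def vadr_skip_message(sequence: str, min_length: int = MIN_UNAMBIGUOUS_LENGTH) -> Optional[str]:
    """Return a skip message for poor assemblies, or None when VADR should run."""
    seq = sequence.upper()
    # Only A, T, C and G count toward the unambiguous assembly length.
    unambiguous = sum(seq.count(base) for base in "ATCG")
    if unambiguous < min_length:
        return (
            "VADR skipped due to poor assembly; "
            f"assembly length (unambiguous) = {unambiguous}"
        )
    return None


poor = "ATCG" * 5 + "N" * 100   # 20 unambiguous bases
good = "ACGT" * 2500            # exactly 10,000 unambiguous bases
print(vadr_skip_message(poor))
print(vadr_skip_message(good))
```

Returning a message rather than raising keeps the rest of the workflow running for that sample, which is the stated goal of the change.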
Pangolin v3 integration
- Pangolin task re-written to allow for full functionality of the Pangolin v3 software
- Inference engine set to UShER by default, but can be modified by the user with the inference_engine input variable (accepts either usher or pangolearn)
Mercury workflow outputs modified to ensure direct upload capabilities to GISAID
- Small formatting edits were made to ensure that the GISAID_upload_meta.csv file generated by the Mercury workflows can be uploaded directly to GISAID without the need of re-formatting as was previously required.
Updated Docker Images
- NextClade set to v0.14.4 and now includes WHO variant designations
- VADR set to v1.2.1 to reflect most stable release (no changes to underlying functionality, as per VADR developers)
Analysis date and PHVG version captured for every workflow
- Will be output to Terra table as {workflow}_analysis_date & {workflow}_version
v1.4.3
Adopt latest version of the NCBI SRA-Human-Scrubber tool that addresses previous issue regarding the inadvertent impact on the quality of SC2 genome assemblies generated with the Titan workflows for genomic characterization in v1.4.1
Updates to the ncbi_scrub_pe and ncbi_scrub_se tasks:
- NCBI Scrubber docker tag updated to latest release: ncbi/sra-human-scrubber:1.0.2021-05-05
- -n flag invoked with the scrubber.sh command to ensure that human reads are replaced with IUPAC N rather than fully removed from read files
- Additional outputs to capture the number of human spots removed
Modified workflows:
- ncbi_scrub_pe and ncbi_scrub_se modified to use updated task and output number of human spots removed
- Titan_Illumina_PE, Titan_ONT & Titan_ClearLabs modified to incorporate host-removal as first step; extra kraken run added after dehosting to check impact on the percentage of human & SC2 reads
  - New outputs for Titan_ClearLabs & Titan_ONT: dehosted_reads, fastqc_clean, kraken_human_dehosted, kraken_sc2_dehosted, kraken_report_dehosted
  - New outputs for Titan_Illumina_PE: dehosted_read1, dehosted_read2, kraken_human_dehosted, kraken_sc2_dehosted, kraken_report_dehosted
  - Note: the updated NCBI scrubber tasks were not incorporated into the Titan_Illumina_SE workflow since the smaller size read data are not compatible with host-removal with this tool
Other repo changes:
- Default pangolin tag for Titan workflows for genomic characterization set to staphb/pangolin:2.4.2-pangolearn-2021-05-19
- Updated default minlength for the trimmomatic task to 75
  - No workflows impacted as explicit minlengths are provided in the workflows that utilize this task
v1.4.2
Patch to properly parse modified pangolin outputs in versions >=2.4.1
Updates to the pangolin2 task and the workflows that use it (Titan_Illumina_PE, Titan_Illumina_SE, Titan_ONT, Titan_ClearLabs, & Pangolin_Update):
- Parse outputs from report file by column headers to avoid a need to make changes if additional fields are added in future releases (including pangolin and pangoLEARN versions)
- Output pangolin conflicts & notes at task and workflow level
- Remove pangolin_aLRT output as it is no longer available in pangolin versions >=2.4
- Add min_length and max_ambig inputs (defaults set to 10000 and 0.5, respectively)
- Set pangolin_docker_image to most recent stable release available: staphb/pangolin:2.4.2-pangolearn-2021-04-28
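Parsing the report by column header rather than by position, as described in the list above, can be sketched with Python's standard `csv` module. This is an illustration under assumptions, not the task's code: the function name, the selected fields, and the sample report contents (including the `new_field` column) are hypothetical.

```python
import csv
import io

# Fields to extract; chosen for illustration only.
WANTED_FIELDS = ("lineage", "conflict", "note")


def parse_report(report_text: str, wanted=WANTED_FIELDS) -> dict:
    """Read a single-sample CSV report and pick fields by header name."""
    reader = csv.DictReader(io.StringIO(report_text))
    row = next(reader)  # one data row per single-sample report
    # Missing columns fall back to an empty string instead of raising.
    return {name: row.get(name, "") for name in wanted}


# Two hypothetical layouts: the second adds a column in the middle.
old_layout = "taxon,lineage,conflict,note\nsample1,B.1.1.7,0.0,\n"
new_layout = "taxon,lineage,conflict,new_field,note\nsample1,B.1.1.7,0.0,x,\n"

print(parse_report(old_layout))
print(parse_report(new_layout))
```

Because lookups are keyed on header names, a column added or reordered in a later release does not shift the values that are read.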
Other repo changes:
- Cleaned local-dev workflows
- Ensure a static docker tag for nextclade
v1.4.1
Patch update:
- NCBI scrubber has been removed from the Titan workflows for genomic characterization (Titan_ClearLabs, Titan_ONT, Titan_Illumina_PE) to avoid the inadvertent removal of SC2 data from raw read files
- NCBI_Scrub_PE and NCBI_Scrub_SE workflows have been made available for those who wish to use the NCBI scrubber outside of the Titan workflows
v1.4.0
Minor Updates to Titan genomic characterization workflows (ONT, ClearLabs, Illumina PE/SE):
- Titan Illumina SE workflow release
- Pangolin tag updated to staphb/pangolin:2.3.8-pangolearn-2021-04-14
- VADR version updated to staphb/vadr:1.2 (required some code modification in addition to Docker tag update)
  - VADR Update added as single-module workflow
- NCBI Scrubber added to de-host SC2 read data (not included in Titan Illumina SE workflow as read data not compatible with this tool)
- Add FastQC output to Titan ClearLabs and ONT workflows
*** Important note regarding this release ***
Users have reported, and we have since confirmed, that the inclusion of the NCBI Scrubber tool (meant for removing human read data) in the Titan genomic characterization workflows was inadvertently impacting the quality of the downstream consensus SC2 assembly.
This module will be removed from the Titan workflows in a future release, but will be made available as a single-task workflow for those submitting to SRA.
v1.3.2
Bug Fix:
- Remove report generator that was causing Titan Augur Run failures
v1.3.1
Bug Fix:
- Fix issue to allow numerical sample IDs for Titan Augur workflows (i.e. update nextstrain docker)
v1.3.0
Minor update:
- Swapped SeqyClean out for Trimmomatic & BBDuk for read cleaning and adapter/PhiX removal
- Include entire sample ID for GenBank metadata file