25 May 16:49

d75e99b

v2.3.2 Latest

PHVG v2.3.2 Patch Release

This patch release updates the Mercury workflows and adds a new output variable, `ivar_variant_proportion_intermediate`.

Mercury patches

This release adds the "covv_consortium" column to the output GISAID metadata file in the Mercury workflows. This new optional column has been added to the metadata formatters, which can be found here: Mercury_PE/_SE_Prep at gs://theiagen-public-files/terra/mercury-files/Terra_Metadata_Formatter_2023_05_22.xlsx, and Mercury_Prep_N_Batch at gs://theiagen-public-files/terra/mercury-files/Mercury_Prep_N_Batch_SC2_Metadata_Formatter_2023_05_22.xlsx.

Also, empty date values will now fail more informatively in Mercury_Prep_N_Batch.

New output variable

The variant_call task has been modified to now calculate the proportion of variants at intermediate allele frequencies (60-90%). This value is reported in the output column ivar_variant_proportion_intermediate for workflows that use iVar to perform variant calling (TheiaCoV_Illumina_PE and TheiaCoV_Illumina_SE).
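
As a rough illustration of the calculation described above, the sketch below reads an iVar-style variants TSV and reports the fraction of rows whose `ALT_FREQ` falls between 60% and 90%. The function name, the inclusive bounds, and the choice of denominator (all variant rows in the file) are assumptions made for this example; the release notes do not spell out how the variant_call task handles these details.

```python
import csv
import io


def intermediate_variant_proportion(tsv_text, low=0.6, high=0.9):
    """Return the fraction of variant rows whose ALT_FREQ lies in [low, high].

    Assumption: bounds are inclusive and every row in the file counts
    toward the denominator.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    freqs = [float(row["ALT_FREQ"]) for row in reader]
    if not freqs:
        return 0.0
    intermediate = sum(1 for f in freqs if low <= f <= high)
    return intermediate / len(freqs)


# Reduced example table: a real iVar TSV has more columns than shown here.
example = "\n".join([
    "REGION\tPOS\tREF\tALT\tALT_FREQ",
    "MN908947.3\t241\tC\tT\t0.99",
    "MN908947.3\t3037\tC\tT\t0.75",
    "MN908947.3\t14408\tC\tT\t0.62",
    "MN908947.3\t23403\tA\tG\t0.98",
])

print(intermediate_variant_proportion(example))  # 0.5
```

Two of the four example variants sit in the intermediate range, so the sketch reports 0.5.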

What's Changed

Update README.md by @kevinlibuit in #218
Add "consortium" to Mercury_Prep by @sage-wright in #222
Add intermediate frequency mutations screen TheiaCoV by @michellescribner in #220
Mercury Fix when collection_date is missing by @sage-wright in #219
update checksums by @sage-wright in #223

Full Changelog: v2.3.1...v2.3.2

Contributors

kevinlibuit, sage-wright, and michellescribner

10 Mar 16:42

kapsakcj

v2.3.1

715123f

v2.3.1

PHVG v2.3.1 release notes

This patch release adds capability for detection of mutations known to be associated with Tamiflu resistance, includes bug fixes for Influenza Type B subtyping, and updates default input parameters (pangolin docker image, nextclade_dataset_tag, nextclade docker image).

New Features

New column tamiflu_resistance_aa_subs containing nextclade-detected substitutions that have been described in the literature to confer resistance to tamiflu (Influenza-specific)
New optional boolean input parameters for Mercury_Prep_N_Batch:
- using_clearlabs_data, using_reads_dehosted, usa_territory
New optional input parameter for Freyja_Plot workflow: mincov
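
The tamiflu_resistance_aa_subs column listed above can be pictured as a filter over the amino acid substitutions that Nextclade reports. The sketch below is only an illustration: the reference set contains a single well-known oseltamivir resistance substitution (NA:H275Y) as a placeholder, and the set name, function name, and comma-separated input format are assumptions, not the workflow's actual implementation or list.

```python
# Hypothetical reference set for illustration only. NA:H275Y is a widely
# reported oseltamivir (Tamiflu) resistance substitution; the list the
# workflow actually uses is not given in these release notes.
KNOWN_RESISTANCE_SUBS = {"NA:H275Y"}


def tamiflu_resistance_aa_subs(nextclade_aa_subs):
    """Return the comma-separated substitutions that appear in the reference set."""
    subs = [s.strip() for s in nextclade_aa_subs.split(",") if s.strip()]
    hits = [s for s in subs if s in KNOWN_RESISTANCE_SUBS]
    return ",".join(hits)


print(tamiflu_resistance_aa_subs("NA:H275Y,NA:N248D,NA:V241I"))  # NA:H275Y
```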

Default Docker Images and Input Parameter Updates

Default pangolin docker image: staphb/pangolin:4.2-pdata-1.18.1.1
Default nextclade docker image: nextstrain/nextclade:2.11.0
Default nextclade_dataset_tag for SARS-CoV-2: 2023-02-25T12:00:00Z
Default freyja docker image: staphb/freyja:1.3.11

Other Changes

Bug fix: Type B Influenza subtypes no longer duplicated from ABRicate output
Updates to GitHub Actions workflows for automated testing

Documentation can be found here: https://theiagen.notion.site/Theiagen-Public-Health-Resources-a4bd134b0c5c4fe39870e21029a30566

What's Changed

Expose minimum coverage option in Freyja_Plot by @sage-wright in #211
Enable alternative read and assembly files by @sage-wright in #210
Fix Bug RE Type-B subtyping (TheiaCoV_Illumina PE flu track) by @kevinlibuit in #216
update nextclade TSV parsing for SC2: clade_legacy. Also update Flu & nextclade by @kapsakcj and @cimendes in #213
update default pangolin docker to staphb/pangolin:4.2-pdata-1.18.1.1 and nextclade_dataset_tag for SC2 by @kapsakcj in #217

Full Changelog: v2.3.0...v2.3.1

Follow Theiagen on Twitter & LinkedIn!

Contributors

kapsakcj, cimendes, and 2 other contributors

30 Dec 20:06

sage-wright

v2.3.0

e492dec

v2.3.0

PHVG v2.3.0 Release Notes

This minor release introduces organism track updates for the TheiaCoV workflow series as well as a new workflow for preparing and submitting metadata to public repositories (Mercury_Prep_N_Batch).

Updates to the TheiaCoV Workflow Series

Organism track updates:

“MPXV” for monkeypox analysis: VADR annotation assessment enabled (was previously not supported)
"WNV" for West Nile Virus analysis: VADR annotation assessment enabled (was previously not supported)
"flu" for influenza analysis: will initiate genome assembly with IRMA and characterization with ABRicate against InsaFlu database and NextClade; available in TheiaCoV_Illumina_PE only
"HIV" for Human Immunodeficiency Virus analysis: will initiate consensus assembly by alignment (BWA + iVar or minimap2 + Medaka for Illumina and ONT read data, respectively) and characterization with Quasitools HyDRA for antiretroviral drug resistance detection

Note: The default value for the organism variable is “sars-cov-2”

QC and read processing modules updates:

Option to utilize fastp rather than trimmomatic for read processing
Reads processed by BBDuk are now output in order, which helps ensure that downstream alignments are consistent

Mercury Prep-N-Batch Workflow

The Mercury_Prep_N_Batch workflow combines the previously separate Mercury_PE/SE_Prep and Mercury_Batch workflows into one.
This workflow functions as follows:

Step 1: Performs supermassive metadata wrangling (task sm_metadata_wrangling in task_mercury_file_wrangling.wdl)

downloads the entire origin Terra table where the data, analysis results, metadata, etc. are stored.
extracts the samples that the user intends to upload
creates some standard variables that are used multiple times (such as year, isolate, etc.)
determines which organism is being run (currently only supports sars-cov-2 and mpox) and sets the required and optional variables for each file that is being created (e.g., BioSample vs SRA vs GISAID vs GenBank/BankIt)
removes any entries that do not meet predetermined quality thresholds (vadr_num_alerts and number_N)
removes any entries that do not have all required fields present, and writes the samples that were removed to a table that also lists what fields were missing
renames columns as appropriate
reformats columns as appropriate
compiles all required and optional information in TSV files
renames files with the submission_id and edits fasta headers as appropriate
uploads read files to the Theiagen SRA GCP Google bucket
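
The quality filtering and required-field checks in Step 1 can be sketched roughly as follows. This is a simplified stand-in, not the code in sm_metadata_wrangling: the function name, the threshold values, the `sample_id` key, and the example field names are all hypothetical.

```python
def split_by_quality_and_required(samples, required_fields,
                                  max_vadr_alerts, max_number_n):
    """Partition samples into kept rows and an exclusion table with reasons."""
    kept, excluded = [], []
    for sample in samples:
        reasons = []
        # Quality thresholds (hypothetical values are supplied by the caller)
        if int(sample.get("vadr_num_alerts", 0)) > max_vadr_alerts:
            reasons.append("vadr_num_alerts above threshold")
        if int(sample.get("number_N", 0)) > max_number_n:
            reasons.append("number_N above threshold")
        # Required metadata: absent or blank values count as missing
        missing = [
            field for field in required_fields
            if sample.get(field) is None or str(sample.get(field)).strip() == ""
        ]
        if missing:
            reasons.append("missing: " + ", ".join(missing))
        if reasons:
            excluded.append({"sample_id": sample.get("sample_id"),
                             "reasons": reasons})
        else:
            kept.append(sample)
    return kept, excluded


samples = [
    {"sample_id": "s1", "vadr_num_alerts": 0, "number_N": 10,
     "collection_date": "2023-01-05", "country": "USA"},
    {"sample_id": "s2", "vadr_num_alerts": 3, "number_N": 10,
     "collection_date": "2023-01-06", "country": "USA"},
    {"sample_id": "s3", "vadr_num_alerts": 0, "number_N": 10,
     "collection_date": "", "country": "USA"},
]

kept, excluded = split_by_quality_and_required(
    samples, ["collection_date", "country"],
    max_vadr_alerts=0, max_number_n=5000,
)
for row in excluded:
    print(row["sample_id"], row["reasons"])
```

Keeping the reasons alongside each excluded sample mirrors the behavior described above, where removed samples are written to a table that lists what was missing.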

Step 2: If sars-cov-2, trim GenBank fasta files of terminal Ns (task trim_genbank_fastas in task_mercury_file_wrangling.wdl)

uses VADR to trim terminal ambiguous nucleotides
returns the edited fasta file

Step 3: If mpox, put metadata into sqn format (task table2asn in task_mercury_file_wrangling.wdl)

soft links the .sbt, .fsa, and .src files to have common name
converts the data into a sqn file with table2asn so it can be emailed to NCBI

New Documentation

Detailed documentation has been created for all workflows in the PHVG v2.3.0 repository.

What's Changed

citation.cff update by @kapsakcj in #172
New VADR output: .zip of output fasta files by @kapsakcj in #171
VADR updates for MPXV; update default nextclade_dataset_tag and docker by @kapsakcj in #175
Add optional arguments input to trimmomatic task and add fastp task by @michellescribner in #182
Rp3 add support for adapter files in bbduk, update ci test by @kapsakcj in #186
adds support for running VADR on WNV samples by @kapsakcj in #190
Azure compatibility by @sage-wright in #193
Adding flu organism track by @kevinlibuit in #194
Fja hiv merge dev by @frankambrosio3 in #198
the Mercury_Prep_N_Batch workflow by @sage-wright in #196
Fix conditional logic in Mercury Prep N Batch by @sage-wright in #199
Fix lowercase things by @sage-wright in #201
Ensure bbduk outputs are ordered by @kevinlibuit in #202
Update version and SC2 references by @kevinlibuit in #203
quality exclusion write out by @sage-wright in #204
Smw excluded mercury dev by @sage-wright in #205

New Contributors

@michellescribner made their first contribution in #182

Full Changelog: v2.2.0...v2.3.0

Follow Theiagen on Twitter!

Contributors

kapsakcj, kevinlibuit, and 3 other contributors

08 Aug 20:22

sage-wright

v2.2.0

ec5f1a9

v2.2.0

This release introduces TheiaCoV amenability to non-SARS-CoV-2 (e.g., MPXV) genomic characterization.

NOTE: Use of TheiaCoV for MPXV will require modified input variables; e.g., primer_bed and reference_genome. Please view our public Notion page for information on recommended input variables for MPXV genomic characterization.

Use of TheiaCoV for SARS-CoV-2 will not require any change to input variables; i.e., SARS-CoV-2 characterization is the default behavior of the TheiaCoV workflows. Please view our public Notion page to find the latest recommended workspace data elements for SARS-CoV-2 genomic characterization.

TheiaCoV amenability to non-SARS-CoV-2 genomic characterization

An organism variable has been implemented to indicate which organism you want to analyze. This is intended to allow future expansion of the workflow to viruses that are not currently supported.
- The default value is “sars-cov-2”
- Change to “MPXV” for monkeypox analysis
A new Boolean variable, trim_primers, indicates whether primers should be trimmed. Disabling it is most applicable when analyzing data generated without primers; e.g., a metagenomic approach. Because of this change, the primer_bed variable is now optional and no longer appears in the same location on the workflow input page. You must provide a primer_bed file in order to trim primers. When you switch to this new version, the primer file is carried over to the correct place, so no change is required for SARS-CoV-2 users.
- The default value is true; primer trimming will occur unless indicated otherwise.
SC2-specific calculations have been moved to a new task so these calculations are performed only on SC2 samples, and output variables such as s_gene_percent_coverage are now prefaced by sc2_, for example sc2_s_gene_percent_coverage, in order to indicate this variable is specific for SC2.
VADR is only performed on SC2 samples.
- VADR can be run on MPXV samples, but this release does not support doing so. Future releases will enable this feature.
Kraken2 has a new input variable target_org that enables the user to specify a target organism to pull from the Kraken2 report; e.g., if this value is set to "Monkeypox virus", the kraken_target_org percentage will populate with the percentage of MPXV identified in the sample.
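
To illustrate what pulling a target organism from the Kraken2 report involves, here is a minimal sketch that scans a standard six-column Kraken2 report (percentage, clade reads, direct reads, rank code, taxonomy ID, indented name) for an exact name match. The function name and the example report rows are made up for this example, and the workflow's own parsing may differ.

```python
def kraken_target_org_percent(report_text, target_org):
    """Return the clade percentage for target_org, or None if it is absent."""
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        # The name column is indented with spaces to show taxonomic depth
        if fields[5].strip() == target_org:
            return float(fields[0])
    return None


# Invented report rows for illustration
report = "\n".join([
    " 12.50\t1250\t1250\tU\t0\tunclassified",
    " 87.50\t8750\t10\tR\t1\troot",
    " 80.00\t8000\t8000\tS\t10244\t        Monkeypox virus",
])

print(kraken_target_org_percent(report, "Monkeypox virus"))  # 80.0
```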

New features

Updated documentation is now available on our readthedocs page
Pangolin:
- A new pango_lineage_expanded output variable has been created that is enabled by default through the expanded_lineage Boolean input variable. This output lists the pangolin lineage without any aliases (e.g., BA.5 → B.1.1.529.5)
- --skip-scorpio and --skip-designation-cache are now Boolean inputs that default to false.
Freyja:
- Two new workflows have been added: Freyja_Update, a workflow to create updated Freyja reference materials, and Freyja_Dash, a workflow to create an interactive HTML visualization of aggregated Freyja demixed output
- The docker image has been updated to v1.3.10 for all Freyja tasks.
- New boolean inputs have been created to enable bootstrapping (bootstrap; default=false) and use of confirmed lineages only (confirmed_only; default=false)
- A new integer input indicating the number of bootstraps is only used when bootstrap is true (number_bootstraps)
- NOTE: Use of a dashboard configuration file is recommended for the Freyja_Dash workflow to create lineage groups and avoid “too many lineages” error messages. An example configuration file can be found here.
Nextclade:
- The Nextclade task has been modified to be compatible with versions ≥v2.0.0.
- The default dataset tag has been updated to 2022-07-26T12:00:00Z
- The default docker image has been updated to nextstrain/nextclade:2.4.0
- NOTE: In order to incorporate Nextclade v2.0.0, modifications were made that render our SARS-CoV-2 genomics characterization workflows (e.g., TheiaCoV_Illumina_PE) incompatible with older versions of Nextclade.
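
The alias expansion behind the pango_lineage_expanded output described under Pangolin above (e.g., BA.5 → B.1.1.529.5) amounts to replacing the alias prefix with the lineage it stands for. The sketch below shows the idea with a one-entry alias table; the table, the function name, and the handling of unknown prefixes are assumptions for illustration, and this is not how pangolin itself implements the feature.

```python
# Tiny illustrative alias table. The complete table is maintained alongside
# the Pango lineage designations and is much larger.
ALIASES = {"BA": "B.1.1.529"}


def expand_lineage(lineage, aliases=ALIASES):
    """Replace a known alias prefix with its full lineage path.

    Lineages whose prefix is not in the table are returned unchanged.
    """
    prefix, _, rest = lineage.partition(".")
    full = aliases.get(prefix)
    if full is None:
        return lineage
    return f"{full}.{rest}" if rest else full


print(expand_lineage("BA.5"))     # B.1.1.529.5
print(expand_lineage("B.1.1.7"))  # B.1.1.7
```

Recombinant lineages are outside the scope of this sketch.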

What's Changed

Update TheiaCoV workflows to utilize nextclade v2 by @kevinlibuit in #156
PHVG Read The Docs update by @emmadoughty and @michellescribner in #154
Add expanded-lineage output to pangolin4 task and associated workflows by @kevinlibuit in #157
Reorganize PHVG for MPXV by @sage-wright in #159
Freyja Updates by @kevinlibuit in #160
Update versions by @sage-wright in #161
Adds expanded to update by @sage-wright in #162
add miniwdl check workflow by @rpetit3 in #158
Capture Freyja Versions by @kevinlibuit in #164
Capture reads from alignment by @kevinlibuit in #165
fix genome length calc by @sage-wright in #166

New Contributors

@emmadoughty and @michellescribner made their first contributions in #154

Full Changelog: v2.1.2...v2.2.0

Contributors

rpetit3, kevinlibuit, and 3 other contributors

03 May 18:51

sage-wright

v2.1.2

acdbc23

v2.1.2

This patch release addresses an issue identified with the TheiaCoV_Augur_Prep workflow

An overambitious attempt at syntax standardization introduced a bug where WDL variables were not being written to TheiaCoV_Augur_Prep output metadata files; the syntax is now standardized and the bug is now squashed. 🐛👢

Other modifications

Updated default pangolin_docker_image (staphb/pangolin:4.0.6-pdata-1.8)
Updated default nextclade_dataset_tag (2022-04-28T12:00:00Z)

What's Changed

Fix PHVG v2.1.1 bug and update default images and tags by @sage-wright in #141

Full Changelog: v2.1.1...v2.1.2

Contributors

sage-wright

26 Apr 19:20

sage-wright

v2.1.1

2e366b6

v2.1.1

This patch release addresses issues identified with the TheiaCoV_Augur_Run workflows

CSV elements in metadata_merged now properly converted into CSV format
Multiple TheiaCoV_Augur_Run tasks modified to allow for graceful memory telemetry failure, described by @dpark01 here

Other Modifications:

Addition of the pangolin_arguments variable allows for additional user-defined arguments; e.g., --skip-scorpio

What's Changed

Smw fix mem dev by @sage-wright in #137
Enables --skip-scorpio functionality by @sage-wright in #139
Updated version by @sage-wright in #140

Full Changelog: v2.1.0...v2.1.1

Contributors

dpark01 and sage-wright

08 Apr 20:45

kapsakcj

v2.1.0

0926c09

v2.1.0

This minor release modifies the pangolin task to ensure compatibility with Pangolin ≥v4.0.4

NOTE: In order to incorporate Pangolin ≥v4.0.4, modifications were made that render our SARS-CoV-2 genomics characterization workflows (e.g. TheiaCoV_Illumina_PE) incompatible with older versions of Pangolin.

Default docker image for pangolin4 task set to: quay.io/staphb/pangolin:4.0.4-pdata-1.2.133

Other Modifications:

New Features

An s_gene_percent_coverage calculation was added to all TheiaCoV workflows for SARS-CoV-2 genomic characterization that incorporate an alignment step (TheiaCoV_ClearLabs, TheiaCoV_Illumina_PE, TheiaCoV_Illumina_SE, and TheiaCoV_ONT).
- An additional TSV file is made that includes the percent coverage of all genes in SC2 genomes, assuming Wuhan-1 reference genome positions. It can be found under this column: percent_gene_coverage
A min_depth input variable was created for TheiaCoV_Illumina_PE and TheiaCoV_Illumina_SE workflows to specify the minimum depth of coverage required to call a base in the final assembly output and a variant in the VCF output.
- The default value for min_depth is 100.
- This parameter replaces the separate min_depth parameters of the consensus and variant_call tasks, which have been consolidated.
The NextClade dataset tag used is now an output value generated in our SARS-CoV-2 genomics characterization workflows (e.g. TheiaCoV_Illumina_PE) under column: nextclade_ds_tag.
The TheiaCoV_Augur_Run merged_metadata output file is now in CSV format to be compatible with both Auspice and MicrobeTrace.
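
A gene-level percent coverage value such as s_gene_percent_coverage, listed above, can be thought of as the share of reference positions within the gene that have sufficient read depth. The sketch below is one plausible way to compute it and is an assumption rather than the workflow's actual method; the function name, the depth dictionary, and the `min_depth` default used here are illustrative. The S gene coordinates are those of the Wuhan-1 reference (NC_045512.2).

```python
S_GENE = (21563, 25384)  # 1-based inclusive S gene coordinates on Wuhan-1


def percent_gene_coverage(depths, start, end, min_depth=1):
    """Percent of positions in [start, end] with depth >= min_depth.

    depths maps a 1-based reference position to its read depth;
    positions missing from the mapping are treated as depth 0.
    """
    length = end - start + 1
    covered = sum(
        1 for pos in range(start, end + 1) if depths.get(pos, 0) >= min_depth
    )
    return round(100 * covered / length, 2)


# Toy example: a 10-base "gene" at positions 5-14 with positions 13-14 uncovered
toy_depths = {pos: 120 for pos in range(5, 13)}
print(percent_gene_coverage(toy_depths, 5, 14, min_depth=100))  # 80.0
```

For a real sample, the same function would be called with `*S_GENE` and a per-position depth table derived from the alignment.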

Default Docker Image Updates

Default Nextclade docker image updated to: nextstrain/nextclade:1.11.0
Default nextclade_dataset_tag updated to: 2022-03-31T12:00:00Z
Default Freyja docker image updated to: quay.io/staphb/freyja:1.3.2

Bug Fixes

Several Mercury output files were labeled as CSV files when they were actually TSV files. This is fixed (#112).

Pull Requests and Resolved Issues

added sed line (#115) by @sage-wright in #118
update docs by @kevinlibuit in #120
Percent gene coverage calculations by @sage-wright in #126
converted derived_cols.tsv to a csv file by @sage-wright in #125
Various patches by @kevinlibuit in #127
pangolin v4 & updating nextclade defaults across 6 workflows by @kapsakcj in #128
Update to Freyja v1.3.4 by @kevinlibuit in #130
Cjk pangolin v4 dev by @kapsakcj in #131
update default pangolin docker image to 4.0.4 by @kapsakcj in #132
Update task_versioning.wdl by @kevinlibuit in #134

Full Changelog: v2.0.0...v2.1.0

Contributors

kapsakcj, kevinlibuit, and sage-wright

16 Feb 23:47

kevinlibuit

v2.0.0

a6df039

v2.0.0

This major release renames workflows to utilize the TheiaCoV tag (previously Titan) and adds five new workflows for public health viral genomics.

Workflow names changed and modifications made:

Titan_Augur_Prep → TheiaCoV_Augur_Prep
Titan_Augur_Run → TheiaCoV_Augur_Run
- Allow subsampling via user-defined builds.yml file
- Update default nextstrain docker images (nextstrain/base:build-20210127T135203Z → nextstrain/base:build-20210218T081251)
Titan_ClearLabs
- Update default consensus task docker container image (quay.io/staphb/artic-ncov2019:1.3.0 → quay.io/staphb/artic-ncov2019:1.3.0-medaka-1.4.3)
  - Note: quay.io/staphb/artic-ncov2019:1.3.0 & quay.io/staphb/artic-ncov2019-epi2me are both compatible alternative docker images
- Use of fastq-scan rather than fastqc to calculate number of reads and pairs
- Allow for use of a user-defined reference genome for consensus genome assembly
  - reference_genome consensus task input variable
Titan_Illumina_PE → TheiaCoV_Illumina_PE
- Default minimum coverage changed from 20x to 100x (ivar consensus and ivar variants tasks)
- Use of fastq-scan rather than fastqc to calculate number of reads and pairs
- Allow for use of a user-defined reference genome for consensus genome assembly
  - reference_genome workflow input variable
Titan_Illumina_SE → TheiaCoV_Illumina_SE
- Default minimum coverage changed from 20x to 100x (ivar consensus and ivar variants tasks)
- Use of fastq-scan rather than fastqc to calculate number of reads and pairs
- Allow for use of a user-defined reference genome for consensus genome assembly
  - reference_genome workflow input variable
Titan_ONT → TheiaCoV_ONT
- Update default consensus task docker container image (quay.io/staphb/artic-ncov2019:1.3.0-medaka-1.4.3 → quay.io/staphb/artic-ncov2019-epi2me)
  - Note: quay.io/staphb/artic-ncov2019:1.3.0 & quay.io/staphb/artic-ncov2019:1.3.0-medaka-1.4.3 are both compatible alternative docker images
- Use of fastq-scan rather than fastqc to calculate number of reads and pairs
- Allow for use of a user-defined reference genome for consensus genome assembly
  - reference_genome consensus task input variable
Titan_FASTA → TheiaCoV_FASTA
Titan-GC → TheiaCoV-GC

Workflows Added:

TheiaCoV_Validate
- Workflow that allows for the rapid comparison of critical output values generated by differing versions of TheiaCoV workflows for SARS-CoV-2 genomic characterization for bioinformatics validation purposes
TheiaCoV_DistanceTree
- Workflow that allows for Augur distance trees to be generated without refinement
Workflows for SARS-CoV-2 Wastewater Data Analysis
- Freyja_FASTQ
  - Workflow that allows running of the Freyja software with raw paired-end fastq files
    - This workflow will generate the required alignment that is used as input to the freyja variants command, the output of which is then analyzed with freyja demix
- Freyja_Plot
  - Workflow to visualize Freyja outputs using the freyja plot command
- TheiaCoV_WWVC
  - Workflow for wastewater variant calling that incorporates a modified version of the CDPHE's WasteWaterVariantCalling WDL Workflow

Other modifications:

Default docker images updated for Pangolin (staphb/pangolin:3.1.11-pangolearn-2021-08-24 → quay.io/staphb/3.1.20-pangolearn-2022-02-02), VADR (staphb/vadr:1.3 → quay.io/staphb/1.4.1-models-1.3-2), and Nextclade (nextstrain/nextclade:1.3.0 → nextstrain/nextclade:1.10.3), along with the Nextclade dataset tag (2021-06-25T00:00:00Z → 2022-02-07T12:00:00Z), in all TheiaCoV workflows for SARS-CoV-2 genomic characterization (TheiaCoV_ClearLabs, TheiaCoV_FASTA, TheiaCoV_Illumina_PE, TheiaCoV_Illumina_SE, and TheiaCoV_ONT)
- NOTE: In order to incorporate Nextclade ≥v1.10.0, modifications were made to the nextclade_one_sample task that render it incompatible with older versions of Nextclade.
Inclusion of S-gene coverage calculation in all TheiaCoV workflows for SARS-CoV-2 genomic characterization that incorporate an alignment step (TheiaCoV_ClearLabs, TheiaCoV_Illumina_PE, TheiaCoV_Illumina_SE, and TheiaCoV_ONT)
Mercury_Batch requiring Array[String] (i.e. gcp_uri) for sra_reads input (was Array[File]); this change avoids the need for localization into VM before transferring to transfer bucket for SRA read submission drastically decreasing runtime
- This modification means that a zipped file of reads for web portal submission is no longer produced if a gcp_bucket is not specified; instead, users are encouraged to utilize the zip_column_content workflow from the Theiagen Terra_Utilities repository to generate these files.
Implementation of a repository style guide

15 Sep 03:08

kevinlibuit

v1.5.3

87ca249

v1.5.3

Patch to address vulnerability in Mercury Prep workflows to the inadvertent removal of internal Ns when preparing assemblies for GenBank submission
This patch replaces the sed one liner that removed leading N's from assembly files in preparation for GenBank submission with the NCBI fasta-trim-terminal-ambigs.pl script as the sed solution was found to be vulnerable to inadvertent removal of non-terminal Ns in multi-line assembly files.
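
The failure mode described above is easy to reproduce in miniature. The sketch below contrasts a per-line approach, which strips leading Ns from every sequence line and therefore damages internal Ns that happen to start a wrapped line, with an approach that joins the sequence before trimming only its two ends. Both functions are illustrative stand-ins written for this example: they are neither the original sed command nor the NCBI fasta-trim-terminal-ambigs.pl script, and they handle a single-record FASTA only.

```python
def trim_per_line(fasta_text):
    """Buggy per-line approach: strips leading Ns from every sequence line."""
    out = []
    for line in fasta_text.splitlines():
        out.append(line if line.startswith(">") else line.lstrip("N"))
    return "\n".join(out)


def trim_terminal_ns(fasta_text):
    """Join the sequence lines first, then strip Ns only at the two ends."""
    lines = fasta_text.splitlines()
    header = lines[0]
    sequence = "".join(lines[1:]).strip("N")
    return f"{header}\n{sequence}"


# The second sequence line begins with internal Ns (not terminal ones)
fasta = ">sample\nNNACGT\nNNTTGA\nCCNN"

print(trim_per_line(fasta))     # the internal NN on line 2 is lost
print(trim_terminal_ns(fasta))  # internal NN preserved, both ends trimmed
```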

Other modifications made

NextClade default image updated to v1.3.0; nextclade_one_sample task modified to accommodate changes in sourcing reference files
GISAID metadata passage_history field auto-populated as original in the Mercury Prep workflows; other required fields (patient_age, patient_gender, and patient_status) populated as unknown if no input value is provided

04 Sep 00:59

kevinlibuit

v1.5.2

b56ddc4

v1.5.2

Minor release to update the Mercury Workflows
The Mercury workflows (Mercury_PE_Prep, Mercury_SE_Prep, and Mercury_Batch) have been updated to enable the inclusion of all required and suggested metadata as per the PHA4GE SARS-CoV-2 Contextual Data Specifications.

In addition to the submittable files for GISAID and GenBank, the Mercury workflows now prepare files for both BioSample registration and SRA submission. A protocol to utilize these new workflows for SC2 data submission has been made publicly available on Protocols.io.

Other modifications made

Pangolin task modified to capture all software and reference versions; outputs have changed accordingly:
-- pangolin_version: deprecated
-- pangolin_usher_version: deprecated
-- pangolin_versions: all pangolin software and reference data versions
-- pangolin_assignment_version: version captured from the final pangolin report, i.e. version of inference approach utilized to make the final pango lineage assignment
Titan workflows for genomic characterization modified to remove the pangolin_docker_image input parameter
-- The pangolin_docker_image is now an optional input parameter for the pangolin3 task titled docker
-- The default value for the pangolin3.docker input parameter has been set to staphb/pangolin:3.1.11-pangolearn-2021-08-24
nextclade_one_sample task modified to allow processing of 0bp assembly files (PR by @HNH0303 #64)
titan_augur_run workflow modified to address bug regarding processing of unmasked inputs (PR by @dpark01 #62)

Contributors

dpark01 and DOH-HNH0303

Releases: theiagen/public_health_viral_genomics
