Merged in xengsort-sort-fix (pull request #198)
Xengsort sort fix
MikeWLloyd committed Jul 18, 2024
2 parents a3f717f + ac67e68 commit 8386555
Showing 11 changed files with 26 additions and 17 deletions.
4 changes: 4 additions & 0 deletions ReleaseNotes.md
@@ -1,5 +1,9 @@
# RELEASE NOTES

## Release 0.6.6

In this release, we add a FASTQ sorting function to the Xengsort module. Due to asynchronous multi-threading in the classification step, Xengsort produces FASTQ output with non-deterministic sort order. BWA produces subtly different mapping results when reads in otherwise identical FASTQ inputs are shuffled ([see note from BWA developer here](https://github.com/lh3/bwa/issues/192#issuecomment-380612006)). The slight mapping differences are not enough to impact overall results, but do prevent fully reproducible results when Xengsort is used and reads are not sorted. The addition of the sorting function allows for fully reproducible results, with no additional user action required.

## Release 0.6.5

In this minor release, we fix a `subscript out of bounds` bug in `bin/wes/sequenza_seg_na_window.R`.
2 changes: 0 additions & 2 deletions bin/help/somatic_wes.nf
@@ -38,8 +38,6 @@ Parameter | Default | Description
--bait_picard | '/projects/omics_share/human/GRCh38/supporting_files/capture_kit_files/agilent/v7/S31285117_MergedProbes_no_gene_names.picard.interval_list' | A GATK interval file covering WES target intervals. Used in calculating coverage metrics. This file can be the same as the interval file, NOTE: This file MUST reflect the capture array used to generate your data.
--mismatch_penalty | -B 8 | The BWA penalty for a mismatch.
--call_val | 50 | The minimum phred-scaled confidence threshold at which variants should be called.
--ploidy_val | '-ploidy 2' | Sample ploidy
--gnomad_ref | '/projects/compsci/omics_share/human/GRCh38/genome/annotation/snps_indels/af-only-gnomad.hg38.vcf.gz' | GnomAD germline reference from GATK resource pack.
--pon_ref | '/projects/compsci/omics_share/human/GRCh38/genome/annotation/snps_indels/1000g_pon.hg38.vcf.gz' | 1000 genome germline panel of normals from GATK resource pack.
2 changes: 0 additions & 2 deletions bin/help/somatic_wes_pta.nf
@@ -40,8 +40,6 @@ Parameter | Default | Description
--bait_picard | '/projects/omics_share/human/GRCh38/supporting_files/capture_kit_files/agilent/v7/S31285117_MergedProbes_no_gene_names.picard.interval_list' | A GATK interval file covering WES target intervals. Used in calculating coverage metrics. This file can be the same as the interval file, NOTE: This file MUST reflect the capture array used to generate your data.
--mismatch_penalty | -B 8 | The BWA penalty for a mismatch.
--call_val | 50 | The minimum phred-scaled confidence threshold at which variants should be called.
--ploidy_val | '-ploidy 2' | Sample ploidy
--gnomad_ref | '/projects/compsci/omics_share/human/GRCh38/genome/annotation/snps_indels/af-only-gnomad.hg38.vcf.gz' | GnomAD germline reference from GATK resource pack.
--pon_ref | '/projects/compsci/omics_share/human/GRCh38/genome/annotation/snps_indels/1000g_pon.hg38.vcf.gz' | 1000 genome germline panel of normals from GATK resource pack.
1 change: 0 additions & 1 deletion bin/log/somatic_wes.nf
@@ -50,7 +50,6 @@ ______________________________________________________
--bait_picard ${params.bait_picard}
--snpEff_config ${params.snpEff_config}
--mismatch_penalty ${params.mismatch_penalty}
--call_val ${params.call_val}
--gen_ver ${params.gen_ver}
--gold_std_indels ${params.gold_std_indels}
--phase1_1000G ${params.phase1_1000G}
1 change: 0 additions & 1 deletion bin/log/somatic_wes_pta.nf
@@ -47,7 +47,6 @@ ______________________________________________________
--bait_picard ${params.bait_picard}
--snpEff_config ${params.snpEff_config}
--mismatch_penalty ${params.mismatch_penalty}
--call_val ${params.call_val}
--gen_ver ${params.gen_ver}
--gold_std_indels ${params.gold_std_indels}
--phase1_1000G ${params.phase1_1000G}
3 changes: 1 addition & 2 deletions config/somatic_wes.config
@@ -50,9 +50,8 @@ params {
target_picard = params.reference_cache+'/human/GRCh38/supporting_files/capture_kit_files/agilent/v7/S31285117_MergedProbes_no_gene_names.picard.interval_list'
bait_picard = params.reference_cache+'/human/GRCh38/supporting_files/capture_kit_files/agilent/v7/S31285117_MergedProbes_no_gene_names.picard.interval_list'

// Variant calling parameters
// BWA Param
mismatch_penalty = "-B 8"
call_val = "50.0"

gnomad_ref=params.reference_cache+'/human/GRCh38/genome/annotation/snps_indels/af-only-gnomad.hg38.vcf.gz'
pon_ref=params.reference_cache+'/human/GRCh38/genome/annotation/snps_indels/1000g_pon.hg38.vcf.gz'
1 change: 0 additions & 1 deletion config/somatic_wes_pta.config
@@ -52,7 +52,6 @@ params {

// Variant calling parameters
mismatch_penalty = "-B 8"
call_val = "50.0"

gnomad_ref=params.reference_cache+'/human/GRCh38/genome/annotation/snps_indels/af-only-gnomad.hg38.vcf.gz'
pon_ref=params.reference_cache+'/human/GRCh38/genome/annotation/snps_indels/1000g_pon.hg38.vcf.gz'
6 changes: 4 additions & 2 deletions config/wes.config
@@ -41,8 +41,9 @@ params {
target_picard = params.reference_cache+'/mouse/GRCm38/supporting_files/capture_kit_files/agilent/v2/S32371113_mouse_exon_V2.bare.picard.primary_assembly.interval_list'
bait_picard = params.reference_cache+'/mouse/GRCm38/supporting_files/capture_kit_files/agilent/v2/S32371113_mouse_exon_V2.bare.picard.primary_assembly.interval_list'

// Variant calling parameters
// BWA parameter
mismatch_penalty = "-B 8"
// Variant calling parameters
call_val = "50.0"
ploidy_val = "-ploidy 2"

@@ -75,8 +76,9 @@ if (params.gen_org=='human'){
params.target_picard = params.reference_cache+'/human/GRCh38/supporting_files/capture_kit_files/agilent/v7/S31285117_MergedProbes_no_gene_names.picard.interval_list'
params.bait_picard = params.reference_cache+'/human/GRCh38/supporting_files/capture_kit_files/agilent/v7/S31285117_MergedProbes_no_gene_names.picard.interval_list'

// Variant calling parameters
// BWA parameter
params.mismatch_penalty = "-B 8"
// Variant calling parameters
params.call_val = "50.0"
params.ploidy_val = "-ploidy 2"

6 changes: 4 additions & 2 deletions config/wgs.config
@@ -49,8 +49,9 @@ params {
dbSNP_index = params.reference_cache+'/mouse/GRCm38/genome/annotation/snps_indels/GCA_000001635.6_current_ids.vcf.gz.tbi'
snpEff_config = params.reference_cache+'/mouse/GRCm38/genome/indices/snpEff_5_1/snpEff.config'

// Variant calling parameters
// BWA parameter
mismatch_penalty = "-B 8"
// Variant calling parameters
ploidy_val = "-ploidy 2"
call_val = "50.0"

@@ -67,8 +68,9 @@ if (params.gen_org=='human'){
params.chrom_contigs = params.reference_cache+'/human/GRCh38/genome/sequence/gatk/Homo_sapiens_assembly38.primaryChr.contig_list'
params.primary_chrom_bed = params.reference_cache+'/human/GRCh38/genome/annotation/intervals/Homo_sapiens_assembly38.primary_chrom.bed'

// Variant calling parameters
// BWA parameter
params.mismatch_penalty = "-B 8"
// Variant calling parameters
params.ploidy_val = "-ploidy 2"
params.call_val = "50.0"

13 changes: 11 additions & 2 deletions modules/xengsort/xengsort_classify.nf
@@ -21,8 +21,8 @@ process XENGSORT_CLASSIFY {
tuple val(sampleID), path(trimmed)

output:
tuple val(sampleID), path("fastq-graft.*.fq"), emit: xengsort_human_fastq
tuple val(sampleID), path("fastq-host.*.fq"), emit: xengsort_mouse_fastq
tuple val(sampleID), path("*fastq-graft_sorted.*.fq"), emit: xengsort_human_fastq
tuple val(sampleID), path("*fastq-host_sorted.*.fq"), emit: xengsort_mouse_fastq
tuple val(sampleID), path("*.txt"), emit: xengsort_log

script:
@@ -42,6 +42,9 @@ process XENGSORT_CLASSIFY {
--chunksize 32.0 \
--compression none &> ${sampleID}_xengsort_log.txt
cat fastq-host.1.fq | paste - - - - | sort -k1,1 -t " " | tr "\\t" "\\n" > ${sampleID}_fastq-host_sorted.1.fq
cat fastq-graft.1.fq | paste - - - - | sort -k1,1 -t " " | tr "\\t" "\\n" > ${sampleID}_fastq-graft_sorted.1.fq
"""

else if (params.read_type == "PE")
@@ -59,6 +62,12 @@ process XENGSORT_CLASSIFY {
--chunksize 32.0 \
--compression none &> ${sampleID}_xengsort_log.txt
cat fastq-host.1.fq | paste - - - - | sort -k1,1 -t " " | tr "\\t" "\\n" > ${sampleID}_fastq-host_sorted.1.fq
cat fastq-host.2.fq | paste - - - - | sort -k1,1 -t " " | tr "\\t" "\\n" > ${sampleID}_fastq-host_sorted.2.fq
cat fastq-graft.1.fq | paste - - - - | sort -k1,1 -t " " | tr "\\t" "\\n" > ${sampleID}_fastq-graft_sorted.1.fq
cat fastq-graft.2.fq | paste - - - - | sort -k1,1 -t " " | tr "\\t" "\\n" > ${sampleID}_fastq-graft_sorted.2.fq
"""

else error "${params.read_type} is invalid, specify either SE or PE"
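
The sort step added to `XENGSORT_CLASSIFY` can be tried outside Nextflow with a small standalone sketch. It assumes uncompressed FASTQ with exactly four lines per record, as the module's `--compression none` setting suggests. The `sort_fastq` helper, the read names, and the file names are hypothetical and exist only for illustration. `LC_ALL=C` is an addition not present in the module, used here so the byte order does not depend on the locale. Inside the Nextflow script block the tab and newline escapes are double-escaped (`\\t`, `\\n`); in plain shell they are written once.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the pipeline): sort a FASTQ file by read
# name, mirroring the paste | sort | tr pipeline used in the module.
sort_fastq() {
    # paste joins each 4-line record into one tab-separated line,
    # sort orders records on the first space-delimited field (the read name),
    # tr splits each record back into its 4 lines.
    paste - - - - < "$1" | LC_ALL=C sort -k1,1 -t " " | tr "\t" "\n"
}

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# The same three reads written in two different orders, standing in for two
# runs whose output order differs.
printf '%s\n' '@read2 1:N:0' 'GGCC' '+' 'IIII' \
              '@read1 1:N:0' 'ACGT' '+' 'IIII' \
              '@read3 1:N:0' 'TTAA' '+' 'IIII' > "$workdir/run_a.fq"
printf '%s\n' '@read3 1:N:0' 'TTAA' '+' 'IIII' \
              '@read2 1:N:0' 'GGCC' '+' 'IIII' \
              '@read1 1:N:0' 'ACGT' '+' 'IIII' > "$workdir/run_b.fq"

sort_fastq "$workdir/run_a.fq" > "$workdir/run_a_sorted.fq"
sort_fastq "$workdir/run_b.fq" > "$workdir/run_b_sorted.fq"

# Identical read sets should give byte-identical files after sorting.
cmp "$workdir/run_a_sorted.fq" "$workdir/run_b_sorted.fq" && echo "sorted outputs match"
```

For paired-end data, sorting the `.1.fq` and `.2.fq` files independently keeps mates in the same order only when both files contain the same read names in the first space-delimited field, which is typical for Illumina-style headers but is not checked by this sketch.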
4 changes: 2 additions & 2 deletions nextflow.config
@@ -45,13 +45,13 @@ manifest {
homePage = "https://github.com/TheJacksonLaboratory/cs-nf-pipelines"
mainScript = "main.nf"
nextflowVersion = "!>=22.04.3"
version = "0.6.4"
version = "0.6.6"
author = 'Michael Lloyd, Brian Sanderson, Barry Guglielmo, Sai Lek, Peter Fields, Harshpreet Chandok, Carolyn Paisie, Gabriel Rech, Ardian Ferraj, Anuj Srivastava. Copyright Jackson Laboratory 2024'
}

profiles {
sumner { includeConfig "config/profiles/sumner.config" }
sumner2 { includeConfig "config/profiles/sumner2.config" }
sumner2 { includeConfig "config/profiles/sumner2.config" }
elion { includeConfig "config/profiles/elion.config" }
}
