nf-test quantify pseudoalignment #1246

Merged
merged 13 commits into from
Mar 12, 2024
2 changes: 1 addition & 1 deletion modules.json
@@ -226,7 +226,7 @@
},
"summarizedexperiment/summarizedexperiment": {
"branch": "master",
- "git_sha": "874dace043f1400fddca70dc9786fa4e82e6f5ac",
+ "git_sha": "92e403d44bee2574c7f4808e18c3b3efbe4fdb06",
"installed_by": ["modules"]
},
"trimgalore": {

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion subworkflows/local/prepare_genome/nextflow.config
@@ -39,7 +39,8 @@ process {
withName: 'SALMON_INDEX' {
ext.args = { [
params.gencode ? '--gencode' : '',
- params.pseudo_aligner_kmer_size ? "-k ${params.pseudo_aligner_kmer_size}": ''
+ params.pseudo_aligner_kmer_size ? "-k ${params.pseudo_aligner_kmer_size}": '',
+ '--keepDuplicates'
Member

@drpatelh drpatelh Mar 12, 2024

I'm not sure what behaviour this will have, @adamrtalbot, so I'm a bit reluctant to add it without some proper testing. Did we need to add it to fix something else? If not, maybe we create an issue outlining why we need it and do a proper assessment before adding it to the pipeline.

Contributor Author


We discovered it while fixing the tests. When two transcripts have identical sequences, Salmon will just drop one of them. Downstream, the tools for matching transcripts/genes/names then throw an error because some data is missing. This flag disables that behaviour and makes Salmon match STAR-RSEM, Kallisto etc.

@pinin4fjords has fixed as many downstream problems as he can, but silently dropping transcripts feels like the wrong behaviour in the first place.

Contributor Author


Reverted SALMON_INDEX from keeping duplicates and raised this issue to discuss further: #1259

When combined with the fix for the module, the tests should pass now (famous last words).
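
To make the duplicate-transcript problem from this thread concrete, here is a minimal Python sketch of how sequence-identical transcripts can be spotted in a transcriptome FASTA before indexing. The helper names (`parse_fasta`, `collapsed_duplicates`) and the toy records are hypothetical. The sketch keeps the first ID it sees for each sequence; which record Salmon itself retains is not stated in the thread, so treat that as an assumption.

```python
from collections import defaultdict


def parse_fasta(text):
    """Return a list of (transcript_id, sequence) pairs from FASTA text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first whitespace-delimited token as the ID.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def collapsed_duplicates(records):
    """Map each retained ID to the later IDs that share its exact sequence."""
    first_seen = {}
    dropped = defaultdict(list)
    for name, seq in records:
        if seq in first_seen:
            # Assumption: the first record wins and later copies are dropped.
            dropped[first_seen[seq]].append(name)
        else:
            first_seen[seq] = name
    return dict(dropped)


# Toy transcriptome: tx1 and tx2 are sequence-identical.
example = """>tx1 gene=A
ACGTACGT
>tx2 gene=B
ACGTACGT
>tx3 gene=C
TTTTGGGG
"""

dupes = collapsed_duplicates(parse_fasta(example))
# IDs that a transcript-to-gene lookup would expect but not find
# if duplicates were silently dropped at index time.
missing = sorted(n for names in dupes.values() for n in names)
print(dupes)
print(missing)
```

Any ID in `missing` is one that a downstream transcript-to-gene mapping built from the full annotation would still expect to find in the quantification output, which is the mismatch described above.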

].join(' ').trim() }
publishDir = [
path: { params.save_reference ? "${params.outdir}/genome/index" : params.outdir },
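
The `ext.args` closure in this hunk builds the SALMON_INDEX argument string from a list of optional flags, using an empty string for any flag that is switched off, then joins and trims the result. A rough Python analogue is sketched below. The function `build_args` and its parameter names are hypothetical and not part of the pipeline, and `keep_duplicates` defaults to `False` to mirror the reverted state discussed in the thread.

```python
def build_args(gencode, kmer_size, keep_duplicates=False):
    """Join optional CLI flags the way the config's list/join/trim pattern does."""
    parts = [
        "--gencode" if gencode else "",
        f"-k {kmer_size}" if kmer_size else "",
        "--keepDuplicates" if keep_duplicates else "",
    ]
    # Empty entries still contribute a separator, so only the outer
    # whitespace is removed; inner double spaces can remain.
    return " ".join(parts).strip()


print(repr(build_args(False, 31, True)))
print(repr(build_args(True, None, True)))
print(repr(build_args(False, None)))
```

One consequence of this pattern is that a disabled flag in the middle of the list leaves a doubled space in the joined string. That is harmless when the string is passed to a shell command, but it matters if the argument string is ever compared verbatim.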
148 changes: 148 additions & 0 deletions subworkflows/local/quantify_pseudo_alignment/tests/main.nf.test
@@ -0,0 +1,148 @@
nextflow_workflow {

    name "Test Workflow QUANTIFY_PSEUDO_ALIGNMENT"
    script "../main.nf"
    workflow "QUANTIFY_PSEUDO_ALIGNMENT"

    tag 'SALMON_QUANT'
    tag 'KALLISTO_QUANT'
    tag 'CUSTOM_TX2GENE'
    tag 'TXIMETA_TXIMPORT'
    tag 'SUMMARIZEDEXPERIMENT_SUMMARIZEDEXPERIMENT'

    test("salmon") {

        setup {
            run("SALMON_INDEX") {
                script "../../../../modules/nf-core/salmon/index/main.nf"
                process {
                    """
                    input[0] = Channel.of([file(params.modules_testdata_base_path + "genomics/homo_sapiens/genome/genome.fasta", checkIfExists: true)])
                    input[1] = Channel.of([file(params.modules_testdata_base_path + "genomics/homo_sapiens/genome/transcriptome.fasta", checkIfExists: true)])
                    """
                }
            }
        }

        when {
            workflow {
                """
                input[0] = [
                    [ id: 'samplesheet' ],
                    file(params.pipelines_testdata_base_path + '/csv/samplesheet_micro.csv', checkIfExists: true)
                ]
                input[1] = [
                    [ id: 'test' ],
                    [
                        file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
                        file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
                    ]
                ]
                input[2] = SALMON_INDEX.out.index
                input[3] = Channel.of(file(params.modules_testdata_base_path + "genomics/homo_sapiens/genome/transcriptome.fasta", checkIfExists: true))
                input[4] = Channel.of(file(params.modules_testdata_base_path + "genomics/homo_sapiens/genome/genome.gtf", checkIfExists: true))
                input[5] = 'gene_id'
                input[6] = 'gene_name'
                input[7] = 'salmon'
                input[8] = false
                input[9] = 'A'
                input[10] = null
                input[11] = null
                """
            }
        }

        then {
            assertAll(
                { assert workflow.success },
                { assert snapshot(
                    workflow.out.tpm_gene,
                    workflow.out.counts_gene,
                    workflow.out.lengths_gene,
                    workflow.out.counts_gene_length_scaled,
                    workflow.out.tpm_transcript,
                    workflow.out.lengths_transcript,
                    workflow.out.merged_gene_rds,
                    workflow.out.merged_gene_rds_length_scaled,
                    workflow.out.merged_gene_rds_scaled,
                    workflow.out.merged_counts_transcript,
                    workflow.out.merged_tpm_transcript,
                    workflow.out.merged_transcript_rds,
                    // NOT multiqc, results, versions
                    ).match()
                }
            )
        }

    }

    test("kallisto") {

        setup {
            run("KALLISTO_INDEX") {
                script "../../../../modules/nf-core/kallisto/index/main.nf"
                process {
                    """
                    input[0] = Channel.of([
                        [ id:'transcriptome' ], // meta map
                        file(params.modules_testdata_base_path + "genomics/homo_sapiens/genome/transcriptome.fasta", checkIfExists: true)
                    ])
                    """
                }
            }
        }

        when {
            workflow {
                """
                input[0] = [
                    [ id: 'samplesheet' ],
                    file(params.pipelines_testdata_base_path + '/csv/samplesheet_micro.csv', checkIfExists: true)
                ]
                input[1] = [
                    [ id: 'test' ],
                    [
                        file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
                        file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
                    ]
                ]
                input[2] = KALLISTO_INDEX.out.index
                input[3] = Channel.of(file(params.modules_testdata_base_path + "genomics/homo_sapiens/genome/transcriptome.fasta", checkIfExists: true))
                input[4] = Channel.of(file(params.modules_testdata_base_path + "genomics/homo_sapiens/genome/genome.gtf", checkIfExists: true))
                input[5] = 'gene_id'
                input[6] = 'gene_name'
                input[7] = 'kallisto'
                input[8] = null
                input[9] = null
                input[10] = []
                input[11] = []
                """
            }
        }

        then {
            assertAll(
                { assert workflow.success },
                { assert snapshot(
                    workflow.out.tpm_gene,
                    workflow.out.counts_gene,
                    workflow.out.lengths_gene,
                    workflow.out.counts_gene_length_scaled,
                    workflow.out.tpm_transcript,
                    workflow.out.lengths_transcript,
                    workflow.out.merged_gene_rds,
                    workflow.out.merged_gene_rds_length_scaled,
                    workflow.out.merged_gene_rds_scaled,
                    workflow.out.merged_counts_transcript,
                    workflow.out.merged_tpm_transcript,
                    workflow.out.merged_transcript_rds,
                    // NOT multiqc, results, versions
                    ).match()
                }
            )
        }

    }

}