nf-test quantify pseudoalignment #1246

adamrtalbot · 2024-03-08T11:49:39Z

Draft PR for an nf-test for QUANTIFY_PSEUDO_ALIGNMENT

Problems:

- Segmentation fault when running SALMON_QUANT
- Incompatible files when running SE_GENE_LENGTH_SCALED

All fixed!

This changes one global parameter when using Salmon, which means it now keeps duplicate transcripts (i.e. the same sequence, not the same transcript ID).

The rest is pretty straightforward testing.

PR checklist

github-actions · 2024-03-08T11:49:52Z

This PR is against the `master` branch ❌

Do not close this PR
Click Edit and change the base to dev
This CI test will remain failed until you push a new commit

Hi @adamrtalbot,

It looks like this pull request has been made against the adamrtalbot/rnaseq master branch.
The master branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to master are only allowed if they come from the adamrtalbot/rnaseq dev branch.

You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.

Thanks again for your contribution!

github-actions · 2024-03-08T11:51:12Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 35ff7ad

+| ✅ 169 tests passed       |+
#| ❔   8 tests were ignored |#
!| ❗   7 tests had warnings |!

❗ Test warnings:

files_exist - File not found: assets/multiqc_config.yml
files_exist - File not found: .github/workflows/awstest.yml
files_exist - File not found: .github/workflows/awsfulltest.yml
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

files_exist - File is ignored: conf/modules.config
nextflow_config - Config default ignored: params.ribo_database_manifest
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/rnaseq/rnaseq/.github/workflows/awstest.yml
multiqc_config - 'assets/multiqc_config.yml' not found

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-rnaseq_logo_light.png
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-rnaseq_logo_light.png
files_exist - File found: docs/images/nf-core-rnaseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-rnaseq_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/WorkflowRnaseq.groovy
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 3.15.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.hisat2_build_memory= 200.GB
nextflow_config - Config default value correct: params.gtf_extra_attributes= gene_name
nextflow_config - Config default value correct: params.gtf_group_features= gene_id
nextflow_config - Config default value correct: params.featurecounts_group_type= gene_biotype
nextflow_config - Config default value correct: params.featurecounts_feature_type= exon
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes
nextflow_config - Config default value correct: params.trimmer= trimgalore
nextflow_config - Config default value correct: params.min_trimmed_reads= 10000
nextflow_config - Config default value correct: params.umitools_extract_method= string
nextflow_config - Config default value correct: params.umitools_grouping_method= directional
nextflow_config - Config default value correct: params.aligner= star_salmon
nextflow_config - Config default value correct: params.pseudo_aligner_kmer_size= 31
nextflow_config - Config default value correct: params.min_mapped_reads= 5.0
nextflow_config - Config default value correct: params.kallisto_quant_fraglen= 200
nextflow_config - Config default value correct: params.kallisto_quant_fraglen_sd= 200
nextflow_config - Config default value correct: params.deseq2_vst= true
nextflow_config - Config default value correct: params.rseqc_modules= bam_stat,inner_distance,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication
nextflow_config - Config default value correct: params.skip_bbsplit= true
nextflow_config - Config default value correct: params.skip_preseq= true
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-rnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-rnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-rnaseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (530 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: cloud_tests_small.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: cloud_tests_full.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: linting.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.13.1
Run at 2024-03-12 11:40:00

adamrtalbot · 2024-03-08T18:06:10Z

@drpatelh this changes the behaviour of Salmon when dealing with duplicate transcripts. Previously, it combined any transcripts with the same sequence. Now, it will keep all transcripts with the same sequence, which will affect mainly alternate haplotypes but I think it's the right™️ option? Want to check if you think this behaviour is OK before we merge.
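
To gauge how many transcripts a change like this would touch, one could group the transcriptome FASTA by sequence and list the IDs that share one. The following is a minimal Python sketch under stated assumptions: `read_fasta`, `duplicate_groups`, and the toy FASTA are hypothetical names and data made up for this example, not part of the pipeline or of Salmon.

```python
from collections import defaultdict
from io import StringIO


def read_fasta(handle):
    """Yield (transcript_id, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def duplicate_groups(handle):
    """Return lists of transcript IDs that share an identical sequence."""
    by_sequence = defaultdict(list)
    for transcript_id, sequence in read_fasta(handle):
        by_sequence[sequence].append(transcript_id)
    return [ids for ids in by_sequence.values() if len(ids) > 1]


# Hypothetical toy transcriptome: tx1 and tx3 have the same sequence.
toy_fasta = StringIO(
    ">tx1 gene=A\nACGT\nACGT\n>tx2 gene=B\nTTTT\n>tx3 gene=C\nACGTACGT\n"
)
print(duplicate_groups(toy_fasta))  # [['tx1', 'tx3']]
```

Each group returned is a set of transcripts that would be collapsed into one entry when duplicates are discarded, and kept as separate entries when they are retained.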

subworkflows/local/quantify_pseudo_alignment/tests/main.nf.test

modules/nf-core/summarizedexperiment/summarizedexperiment/main.nf

drpatelh · 2024-03-12T09:26:12Z

subworkflows/local/prepare_genome/nextflow.config

@@ -39,7 +39,8 @@ process {
    withName: 'SALMON_INDEX' {
        ext.args   = { [
                params.gencode ? '--gencode' : '',
-                params.pseudo_aligner_kmer_size ? "-k ${params.pseudo_aligner_kmer_size}": ''
+                params.pseudo_aligner_kmer_size ? "-k ${params.pseudo_aligner_kmer_size}": '',
+                '--keepDuplicates'


Not sure what sort of behaviour this is going to have @adamrtalbot so a bit reluctant to add it in without some proper testing. Did we need to add it to fix something else? Otherwise maybe we create an issue outlining why we need it in and do a proper assessment before adding to the pipeline.

adamrtalbot · 2024-03-12

We discovered it while fixing the tests. When two transcripts have identical sequences, Salmon drops one of them. Downstream, the tools that match transcripts, genes and names then fail because some data is missing. This flag disables that behaviour, so Salmon matches STAR-RSEM, Kallisto etc.

@pinin4fjords has fixed as many downstream problems as he can, but silently dropping transcripts feels like the wrong behaviour in the first place.
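
The failure mode described above can be sketched as a set difference: if only one ID is kept per distinct sequence, every other ID is still present in the transcript-to-gene mapping but has no row in the quantification output. The snippet below is a minimal Python illustration; `collapse_duplicates`, `missing_from_quant`, and the toy data are hypothetical, and keeping the first ID per sequence is an assumption made for the example, not a statement about which ID Salmon retains.

```python
def collapse_duplicates(transcripts):
    """Keep one transcript ID per distinct sequence (the first one seen).

    `transcripts` is an iterable of (transcript_id, sequence) pairs.
    Which ID survives is an assumption made for this illustration.
    """
    seen = set()
    kept = []
    for transcript_id, sequence in transcripts:
        if sequence not in seen:
            seen.add(sequence)
            kept.append(transcript_id)
    return kept


def missing_from_quant(tx2gene, quantified_ids):
    """Return mapped transcript IDs that have no quantification row, sorted."""
    return sorted(set(tx2gene) - set(quantified_ids))


# Hypothetical toy data: tx1 and tx3 are sequence-identical.
transcripts = [("tx1", "ACGTACGT"), ("tx2", "TTTT"), ("tx3", "ACGTACGT")]
tx2gene = {"tx1": "geneA", "tx2": "geneB", "tx3": "geneC"}

quantified = collapse_duplicates(transcripts)
print(quantified)                               # ['tx1', 'tx2']
print(missing_from_quant(tx2gene, quantified))  # ['tx3']
```

A non-empty result from `missing_from_quant` is the kind of mismatch a downstream step has to either tolerate or avoid by keeping duplicates at indexing time.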

adamrtalbot · 2024-03-12

Reverted SALMON_INDEX so that it no longer keeps duplicates, and raised #1259 to discuss this further.

When combined with the fix for the module, the tests should pass now (famous last words).

pinin4fjords

Looks like you got it now :-)

maxulysse

LGTM

adamrtalbot and others added 2 commits March 8, 2024 10:21

Add nf-test for quantify_psueudoalignment

c71189f

Some improvements but not working :(

8c68e1d

adamrtalbot changed the base branch from master to dev March 8, 2024 11:50

adamrtalbot and others added 6 commits March 8, 2024 11:51

uncomment the comment

bc57910

Swap to nf-core rnaseq test data (full path currently)

40c006d

Reference genome path mistake

9f27403

Fix bugs in quantify_pseudoalignment

4a5c107

Fixes:
- a single sample will now work and not raise an error when trying to parse the count data
- duplicate transcripts now work with Salmon (previously it dropped one of them)

fixup

b179291

Update to use samplesheet in test-datasets repo

1b2b8a4

adamrtalbot requested review from maxulysse, drpatelh and pinin4fjords and removed request for maxulysse March 8, 2024 18:02

adamrtalbot added 2 commits March 8, 2024 18:06

Merge branch 'dev' into nf-test_quantify_pseudoalignment

80acef9

Omit variable files from snap

2fc7e4d

adamrtalbot mentioned this pull request Mar 11, 2024

Add nf-test for all components in pipeline #1223

Closed

19 tasks

pinin4fjords reviewed Mar 11, 2024

subworkflows/local/quantify_pseudo_alignment/tests/main.nf.test (outdated, resolved)

adamrtalbot linked an issue Mar 11, 2024 that may be closed by this pull request

Add nf-test to QUANTIFY_PSEUDO_ALIGNMENT subworkflow #1179

Closed

adamrtalbot mentioned this pull request Mar 11, 2024

Add nf-test to QUANTIFY_PSEUDO_ALIGNMENT subworkflow #1179

Closed

drpatelh reviewed Mar 12, 2024

modules/nf-core/summarizedexperiment/summarizedexperiment/main.nf (resolved)

drpatelh reviewed Mar 12, 2024

adamrtalbot added 3 commits March 12, 2024 10:33

SALMON tests keepDuplicates

da29edb

Changes:
- SALMON_INDEX will keep duplicates
- summarizedexperiment will handle the missing transcripts
- Version numbers checked in QUANTIFY_PSEUDO_ALIGNMENT subworkflow

Merge branch 'dev' into nf-test_quantify_pseudoalignment

63f71d7

Merge branch 'dev' into nf-test_quantify_pseudoalignment

35ff7ad

pinin4fjords approved these changes Mar 12, 2024

adamrtalbot enabled auto-merge March 12, 2024 11:41

maxulysse approved these changes Mar 12, 2024

adamrtalbot merged commit e93915d into nf-core:dev Mar 12, 2024
30 checks passed

drpatelh mentioned this pull request Mar 12, 2024

Salmon --keepDuplicates by default #1259

Open
