Factor out preprocessing #1342

pinin4fjords · 2024-07-16T11:08:03Z

This finally moves the preprocessing logic into the subworkflow that I factored out of RNAseq for use in Riboseq.

I've had to update the subworkflow itself to incorporate some strandedness improvements we'd made in the meantime (nf-core/modules#5982), so that PR will need merging first.

Edit: nf-core/modules#5988 will also need merging first.

PR checklist

github-actions · 2024-07-16T11:08:15Z

This PR is against the `master` branch ❌

Do not close this PR
Click Edit and change the base to dev
This CI test will remain failed until you push a new commit

Hi @pinin4fjords,

It looks like this pull request has been made against the nf-core/rnaseq master branch.
The master branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to master are only allowed if they come from the nf-core/rnaseq dev branch.

You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.

Thanks again for your contribution!

github-actions · 2024-07-16T11:10:07Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 5507a6d

✅ 173 tests passed
❔ 9 tests were ignored
❗ 7 tests had warnings

❗ Test warnings:

files_exist - File not found: assets/multiqc_config.yml
files_exist - File not found: .github/workflows/awstest.yml
files_exist - File not found: .github/workflows/awsfulltest.yml
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

files_exist - File is ignored: conf/modules.config
nextflow_config - Config default ignored: params.ribo_database_manifest
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
actions_ci - actions_ci
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/rnaseq/rnaseq/.github/workflows/awstest.yml
multiqc_config - multiqc_config
modules_config - modules_config

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-rnaseq_logo_light.png
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-rnaseq_logo_light.png
files_exist - File found: docs/images/nf-core-rnaseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-rnaseq_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowRnaseq.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 3.15.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.hisat2_build_memory= 200.GB
nextflow_config - Config default value correct: params.gtf_extra_attributes= gene_name
nextflow_config - Config default value correct: params.gtf_group_features= gene_id
nextflow_config - Config default value correct: params.featurecounts_group_type= gene_biotype
nextflow_config - Config default value correct: params.featurecounts_feature_type= exon
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
nextflow_config - Config default value correct: params.trimmer= trimgalore
nextflow_config - Config default value correct: params.min_trimmed_reads= 10000
nextflow_config - Config default value correct: params.umitools_extract_method= string
nextflow_config - Config default value correct: params.umitools_grouping_method= directional
nextflow_config - Config default value correct: params.aligner= star_salmon
nextflow_config - Config default value correct: params.pseudo_aligner_kmer_size= 31
nextflow_config - Config default value correct: params.min_mapped_reads= 5.0
nextflow_config - Config default value correct: params.kallisto_quant_fraglen= 200
nextflow_config - Config default value correct: params.kallisto_quant_fraglen_sd= 200
nextflow_config - Config default value correct: params.stranded_threshold= 0.8
nextflow_config - Config default value correct: params.unstranded_threshold= 0.1
nextflow_config - Config default value correct: params.deseq2_vst= true
nextflow_config - Config default value correct: params.rseqc_modules= bam_stat,inner_distance,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication
nextflow_config - Config default value correct: params.skip_bbsplit= true
nextflow_config - Config default value correct: params.skip_preseq= true
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/7f1614baeb0ddf66e60be78c3d9fa55440465ac8/
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-rnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-rnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-rnaseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (561 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: cloud_tests_small.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: cloud_tests_full.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 2.14.1

Run details

nf-core/tools version 2.14.1
Run at 2024-07-17 13:37:11

maxulysse · 2024-07-17T11:21:21Z

workflows/rnaseq/main.nf

+        ch_name_replacements = ch_fastq
+            .filter{ meta, reads ->
+                reads.size() == 1
+            }
            .map{ meta, reads ->
-                def name1 = file(reads[0]).simpleName + "\t" + meta.id + '_1'
+                def name1 = file(reads[0][0]).simpleName + "\t" + meta.id + '_1'
                if (reads[1] ){
-                    def name2 = file(reads[1]).simpleName + "\t" + meta.id + '_2'
+                    def name2 = file(reads[0][1]).simpleName + "\t" + meta.id + '_2'
                    return [ name1, name2 ]


What's this whole change about?

pinin4fjords

It's because I'm now using ch_fastq directly, since the channel with separate single and multiple branches has moved to the subworkflow. The reads are therefore the tuples output by groupTuple, which need a double index:

[[foo.fastq.gz,bar.fastq.gz]]
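The double index can be tried outside the pipeline with a minimal plain-Groovy sketch. It assumes the shape described above: `reads` is a list of runs, and each run is itself a list of FASTQ paths. The names `simpleName`, `replacementNames`, `sampleA`, and `sampleB` are hypothetical, and the `simpleName` closure is only a stand-in for Nextflow's `file(path).simpleName`, not the pipeline's own code.

```groovy
// Hypothetical stand-in for Nextflow's `file(path).simpleName`:
// the base name with every extension stripped.
def simpleName = { String path -> path.tokenize('/').last().tokenize('.').first() }

// Build the "<original name>\t<sample id>_<mate>" entries for one channel item,
// assuming `reads` is a list of runs and each run is a list of FASTQ paths.
def replacementNames = { Map meta, List reads ->
    def run = reads[0]                        // the single run kept by the size() == 1 filter
    def names = [simpleName.call(run[0]) + "\t" + meta.id + '_1']
    if (run[1]) {                             // a second mate exists only for paired-end data
        names << (simpleName.call(run[1]) + "\t" + meta.id + '_2')
    }
    return names
}

// One paired-end sample and one single-end sample, both with a single run.
def paired = replacementNames.call([id: 'sampleA'], [['foo.fastq.gz', 'bar.fastq.gz']])
def single = replacementNames.call([id: 'sampleB'], [['baz.fastq.gz']])

println paired
println single
```

Under this assumed shape the outer list has exactly one element, so the sketch reads both mates through the inner list (`reads[0][0]` and `reads[0][1]`) and treats a missing second element as single-end data.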

maxulysse

I love this <3

pinin4fjords · 2024-07-17T12:51:22Z

> I love this <3

Glad to hear it, especially since you were skeptical about the subworkflow!

Factor out preprocessing logic to nf-core subworkflow

7315386

pinin4fjords marked this pull request as draft July 16, 2024 11:08

pinin4fjords changed the base branch from master to dev July 16, 2024 11:08

update changelog

2389882

pinin4fjords mentioned this pull request Jul 16, 2024

Improve strandedness derivation in rnaseq preprocessing swf nf-core/modules#5982

Merged

17 tasks

pinin4fjords and others added 8 commits July 17, 2024 09:16

Update RNAseq preprocessing swf

4d30844

update modules.json

9e8f1e0

fix up ribo db wiring

6a22103

Install swf from branch for now

2447c6b

Fix tests

3da0bc3

Update modules.json

259dccf

Move strandedness function testing to swf

fe950ba

Install from modules master

6bb1a79

pinin4fjords marked this pull request as ready for review July 17, 2024 11:12

pinin4fjords requested a review from maxulysse July 17, 2024 11:13

maxulysse reviewed Jul 17, 2024

maxulysse approved these changes Jul 17, 2024

pinin4fjords added 2 commits July 17, 2024 13:16

Strip preprocessing components relocated to subworkflows

1049fa5

Merge branch 'factor_out_preprocessing' of https://github.com/nf-core…

5507a6d

…/rnaseq into factor_out_preprocessing

pinin4fjords merged commit 5bd04b4 into dev Jul 17, 2024
37 checks passed

pinin4fjords deleted the factor_out_preprocessing branch July 17, 2024 15:30
