Add rename in the MultiQC report for samples without techreps #1341

pinin4fjords · 2024-07-12T17:11:08Z

@MatthiasZepper noted in #1308 that running UMI-tools extract resulted in inconsistent sample naming in the MultiQC report.

This happens because, when a sample has no technical replicates, its input FASTQ files go straight to the UMI-tools extract process without first passing through a process that applies a suffix, and the relevant MultiQC module effectively derives the sample name from the input file name.

One fix we should definitely apply is to have MultiQC use the output (possibly prefixed) file name as the source of the sample identifier: MultiQC/MultiQC#2698.

However, it probably also makes sense to be defensive and tell MultiQC to rename any other related occurrences that come up in future. That's what this PR does, using the sample sheet to derive a set of replacements to pass to MultiQC via --replace-names. (To be clear, this is also an immediate fix for the UMI-tools issue until, if and when, it's fixed in MultiQC.)
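
To make the idea concrete, here is a minimal Python sketch of deriving a two-column, tab-separated replacement table from a sample sheet, of the kind MultiQC's --replace-names option accepts. It is not the pipeline's Nextflow code. The column names (`sample`, `fastq_1`, `fastq_2`), the `_1`/`_2` suffixes and the helper names are assumptions made for illustration only.

```python
import csv
import io
from pathlib import PurePosixPath


def simple_name(path: str) -> str:
    """Return the file name up to its first dot, e.g. 'x/S1_R1.fastq.gz' -> 'S1_R1'."""
    return PurePosixPath(path).name.split(".", 1)[0]


def build_replacements(samplesheet_csv: str) -> list[tuple[str, str]]:
    """Map each input file's simple name to '<sample>_1' / '<sample>_2'.

    Assumed (hypothetical) columns: sample, fastq_1, fastq_2.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(samplesheet_csv)):
        sample = row["sample"]
        pairs.append((simple_name(row["fastq_1"]), f"{sample}_1"))
        if row.get("fastq_2"):  # empty for single-end samples
            pairs.append((simple_name(row["fastq_2"]), f"{sample}_2"))
    return pairs


def to_tsv(pairs: list[tuple[str, str]]) -> str:
    """Render the pairs as 'old<TAB>new' lines, one per replacement."""
    return "\n".join(f"{old}\t{new}" for old, new in pairs) + "\n"


if __name__ == "__main__":
    sheet = (
        "sample,fastq_1,fastq_2\n"
        "SampleA,/data/runA_R1.fastq.gz,/data/runA_R2.fastq.gz\n"
        "SampleB,/data/runB.fastq.gz,\n"
    )
    print(to_tsv(build_replacements(sheet)), end="")
```

The resulting text would be written to a file and passed to MultiQC; how the pipeline names or stages that file is not shown here.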

github-actions · 2024-07-12T17:11:21Z

This PR is against the `master` branch ❌

Do not close this PR
Click Edit and change the base to dev
This CI test will remain failed until you push a new commit

Hi @pinin4fjords,

It looks like this pull-request has been made against the nf-core/rnaseq master branch.
The master branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to master are only allowed if they come from the nf-core/rnaseq dev branch.

You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.

Thanks again for your contribution!

github-actions · 2024-07-12T17:13:38Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit f35f51b

✅ 173 tests passed
❔ 9 tests were ignored
❗ 7 tests had warnings

❗ Test warnings:

files_exist - File not found: assets/multiqc_config.yml
files_exist - File not found: .github/workflows/awstest.yml
files_exist - File not found: .github/workflows/awsfulltest.yml
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

files_exist - File is ignored: conf/modules.config
nextflow_config - Config default ignored: params.ribo_database_manifest
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
actions_ci - actions_ci
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/rnaseq/rnaseq/.github/workflows/awstest.yml
multiqc_config - multiqc_config
modules_config - modules_config

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-rnaseq_logo_light.png
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-rnaseq_logo_light.png
files_exist - File found: docs/images/nf-core-rnaseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-rnaseq_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowRnaseq.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 3.15.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.hisat2_build_memory= 200.GB
nextflow_config - Config default value correct: params.gtf_extra_attributes= gene_name
nextflow_config - Config default value correct: params.gtf_group_features= gene_id
nextflow_config - Config default value correct: params.featurecounts_group_type= gene_biotype
nextflow_config - Config default value correct: params.featurecounts_feature_type= exon
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
nextflow_config - Config default value correct: params.trimmer= trimgalore
nextflow_config - Config default value correct: params.min_trimmed_reads= 10000
nextflow_config - Config default value correct: params.umitools_extract_method= string
nextflow_config - Config default value correct: params.umitools_grouping_method= directional
nextflow_config - Config default value correct: params.aligner= star_salmon
nextflow_config - Config default value correct: params.pseudo_aligner_kmer_size= 31
nextflow_config - Config default value correct: params.min_mapped_reads= 5.0
nextflow_config - Config default value correct: params.kallisto_quant_fraglen= 200
nextflow_config - Config default value correct: params.kallisto_quant_fraglen_sd= 200
nextflow_config - Config default value correct: params.stranded_threshold= 0.8
nextflow_config - Config default value correct: params.unstranded_threshold= 0.1
nextflow_config - Config default value correct: params.deseq2_vst= true
nextflow_config - Config default value correct: params.rseqc_modules= bam_stat,inner_distance,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication
nextflow_config - Config default value correct: params.skip_bbsplit= true
nextflow_config - Config default value correct: params.skip_preseq= true
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/7f1614baeb0ddf66e60be78c3d9fa55440465ac8/
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-rnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-rnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-rnaseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (553 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: cloud_tests_small.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: cloud_tests_full.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 2.14.1

Run details

nf-core/tools version 2.14.1
Run at 2024-07-15 15:12:11

MatthiasZepper

Sorry, but I do not feel competent to review this. The changes to the module are evident (and of course also reviewed for the merge to the modules repo), but the details of the ch_name_replacements construction are beyond my comprehension. Thus, I can't think the edge cases through to spot potential issues.

workflows/rnaseq/main.nf

pinin4fjords · 2024-07-15T08:42:09Z

@MatthiasZepper - it probably was a little arcane (I blame Friday afternoon head) - simplified a bit now

MatthiasZepper

Wonderful! Much simpler solution that even I can understand!

What I am somewhat worried about are the possible side effects of doing that in the first place.

You are starting from ch_samples, which is directly taken from the sample sheet, so before .groupTuple(), checkSamplesAfterGrouping(it) and CAT_FASTQ are applied.

This means that samples with more than one pair of FastQs will be mapped to the same meta.id in separate lines of the collected file. The MultiQC doc is very clear about what will happen in that case: Samples mapped to the same name will overwrite the preexisting information when processed.

Since we use the concatenated FastQ files for essentially everything in the pipeline, I think that those lines for the single FastQs will simply never apply (in particular since you set sample_names_replace_exact: true), but it might be safer to never include them in the first place?

~~For that reason, I suggest .cross()ing your ch_name_replacements channel with ch_fastq.single, so that aliases are only written for the samples with one pair of FastQs to avoid duplicates.~~

Update: I changed my mind. Firstly, I realized that ch_fastq.single is of course the wrong channel (since it contains the unpaired samples and not those with just one pair), and I also tried the .cross() myself and just ended up once more being f*cking frustrated with Nextflow, because I simply do not know how to act on errors like

groovy.lang.MissingMethodException: No signature of method: Script_4f6e07c481ddf010$_runScript_closure1$_closure2$_closure4$_closure8.doCall() is applicable for argument types: (ArrayList) values: [[SampleA, [[id:SampleA, single_end:false], [id:SampleA, single_e>
Possible solutions: doCall(java.lang.Object, java.lang.Object), findAll(), findAll(), isCase(java.lang.Object), isCase(java.lang.Object)

So: If you like, I suggest doing something that restricts the list to those samples with only one pair of FastQs or a single unpaired FastQ (essentially no duplicate meta.id) before you write the file, but if you do not feel like doing that, I am fine with that as well.
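
The overwrite concern raised above can be illustrated with a toy model in Python. This is not MultiQC's implementation; it only mimics the documented behaviour that samples mapped to the same name overwrite earlier data, combined with exact (whole-name) matching. The function name and the example data are hypothetical.

```python
def rename_exact(data: dict, replacements: dict) -> dict:
    """Rename sample keys only when the whole name matches a pattern.

    If two original names map to the same new name, the later entry
    silently overwrites the earlier one.
    """
    renamed = {}
    for name, value in data.items():
        new_name = replacements.get(name, name)  # exact match or unchanged
        renamed[new_name] = value
    return renamed


if __name__ == "__main__":
    # Two lanes of the same sample plus one unrelated sample (made-up numbers).
    data = {
        "runA_L001": {"reads": 100},
        "runA_L002": {"reads": 250},
        "SampleB": {"reads": 75},
    }
    replacements = {"runA_L001": "SampleA", "runA_L002": "SampleA"}
    print(rename_exact(data, replacements))
```

In this toy example only the second lane's values survive under `SampleA`, which is why restricting the table to samples with a unique ID avoids the problem altogether.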

pinin4fjords · 2024-07-15T14:42:03Z

@MatthiasZepper

> ch_fastq.single is of course the wrong channel (since it contains the unpaired samples and not those with just one pair)

No, you were right first time :-). ch_fastq.single contains all those samples with a single ([read1] or [read1, read2]) tuple after the groupTuple(), so ch_fastq.single is the right way to go!
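
As a rough Python analogue of that grouping-then-branching step (not the pipeline's actual Nextflow code), the sketch below groups sample-sheet rows by sample ID and separates IDs with exactly one read tuple from those with several. The function name, the row layout and the file names are assumptions for illustration.

```python
from collections import defaultdict


def branch_by_replicates(rows):
    """Group (sample_id, reads) rows and split them by number of read tuples.

    Returns (single, multiple):
      single   - sample_id -> the one reads list ([read1] or [read1, read2])
      multiple - sample_id -> list of reads lists (technical replicates)
    """
    grouped = defaultdict(list)
    for sample_id, reads in rows:
        grouped[sample_id].append(reads)

    single = {s: r[0] for s, r in grouped.items() if len(r) == 1}
    multiple = {s: r for s, r in grouped.items() if len(r) > 1}
    return single, multiple


if __name__ == "__main__":
    rows = [
        ("SampleA", ["a_L001_R1.fq.gz", "a_L001_R2.fq.gz"]),
        ("SampleA", ["a_L002_R1.fq.gz", "a_L002_R2.fq.gz"]),
        ("SampleB", ["b_R1.fq.gz", "b_R2.fq.gz"]),  # paired, no tech reps
        ("SampleC", ["c.fq.gz"]),                   # unpaired, no tech reps
    ]
    single, multiple = branch_by_replicates(rows)
    print(sorted(single))    # candidates for name replacement
    print(sorted(multiple))  # concatenated elsewhere, so no aliases needed
```

Both the paired `SampleB` and the unpaired `SampleC` land in `single`, which matches the point made here: "single" refers to the number of read tuples per sample, not to single-end sequencing.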

MatthiasZepper

Brilliant! I probably nudged you to overengineer the whole stuff, but thanks for bearing with me, and sorry for the fuss!

pinin4fjords · 2024-07-15T16:03:15Z

Thanks @MatthiasZepper, all good :-).

pinin4fjords added 4 commits July 12, 2024 14:02

Apply a blanket renaming in multiqc to catch inconsistent naming paired and unpaired

cb31b88

Switch renaming to exact matches

6dcf3fd

Bump multiqc module

e12ea63

Separate rename patterns for forward and reverse

47484f6

pinin4fjords changed the base branch from master to dev July 12, 2024 17:12

update CHANGELOG

6fccd13

pinin4fjords requested a review from MatthiasZepper July 12, 2024 17:13

MatthiasZepper reviewed Jul 12, 2024

simplify logic to generate renaming table

39071b9

pinin4fjords requested a review from MatthiasZepper July 15, 2024 09:23

Merge branch 'dev' into rename_samples

9135a7a

MatthiasZepper requested changes Jul 15, 2024

MatthiasZepper approved these changes Jul 15, 2024

pinin4fjords added 2 commits July 15, 2024 14:32

Make name replacement specific to samples without tech reps

a090530

Merge branch 'rename_samples' of https://github.com/nf-core/rnaseq into rename_samples

c1e179b

pinin4fjords added 2 commits July 15, 2024 16:01

Clarify comment

182f523

Update main.nf

d7ed353

MatthiasZepper approved these changes Jul 15, 2024

pinin4fjords changed the title ~~Add wholesale rename in the MultiQC report~~ Add rename in the MultiQC report for samples without techreps Jul 15, 2024

Update CHANGELOG.md

f35f51b

pinin4fjords merged commit e1b2ef7 into dev Jul 15, 2024
27 checks passed

pinin4fjords deleted the rename_samples branch July 15, 2024 16:03
