Add rename in the MultiQC report for samples without techreps #1341
Sorry, but I do not feel competent to review this. The changes to the module are evident (and of course also reviewed for the merge to the modules repo), but the details of the ch_name_replacements construction are beyond my comprehension. Thus, I can't think the edge cases through to spot potential issues.
@MatthiasZepper - it probably was a little arcane (I blame Friday afternoon head) - simplified a bit now
Wonderful! Much simpler solution that even I can understand!
What I am somewhat worried about are the possible side effects of doing that in the first place.
You are starting from ch_samples, which is taken directly from the sample sheet, so before .groupTuple(), checkSamplesAfterGrouping(it) and CAT_FASTQ are applied.
This means that samples with more than one pair of FastQs will be mapped to the same meta.id in separate lines of the collected file. The MultiQC doc is very clear about what will happen in that case: Samples mapped to the same name will overwrite the preexisting information when processed.
Since we use the concatenated FastQ files for essentially everything in the pipeline, I think that those lines for the single FastQs will simply never apply (in particular since you set sample_names_replace_exact: true), but it might be safer to never include them in the first place?
For that reason, I suggest .cross()ing your ch_name_replacements channel with ch_fastq.single, so that aliases are only written for the samples with one pair of FastQs to avoid duplicates.
Update: I changed my mind. Firstly, I realized that ch_fastq.single is of course the wrong channel (since it contains the unpaired samples and not those with just one pair), and I also tried the .cross() myself and just ended up once more being f*cking frustrated with Nextflow, because I simply do not know how to act on errors like:
groovy.lang.MissingMethodException: No signature of method: Script_4f6e07c481ddf010$_runScript_closure1$_closure2$_closure4$_closure8.doCall() is applicable for argument types: (ArrayList) values: [[SampleA, [[id:SampleA, single_end:false], [id:SampleA, single_e>
Possible solutions: doCall(java.lang.Object, java.lang.Object), findAll(), findAll(), isCase(java.lang.Object), isCase(java.lang.Object)
So: If you like, I suggest doing something that restricts the list to those samples with only one pair of FastQs or a single unpaired FastQ (essentially no duplicate meta.id) before you write the file, but if you do not feel like doing that, I am fine with that as well.
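One way to picture the restriction suggested above, as a minimal Python sketch rather than the pipeline's actual Nextflow code: group the sample sheet rows by sample ID (roughly what .groupTuple() does), then keep only the IDs that occur once. The function names and the example file names are hypothetical.

```python
from collections import defaultdict


def group_by_sample(rows):
    """Collect the reads of each sample ID, roughly like groupTuple()."""
    grouped = defaultdict(list)
    for sample_id, reads in rows:
        grouped[sample_id].append(reads)
    return dict(grouped)


def split_single_multiple(grouped):
    """Separate IDs with exactly one FastQ entry from IDs with several."""
    single = {sid: runs[0] for sid, runs in grouped.items() if len(runs) == 1}
    multiple = {sid: runs for sid, runs in grouped.items() if len(runs) > 1}
    return single, multiple


# Hypothetical sample sheet rows: (sample ID, list of FastQ files).
rows = [
    ("SampleA", ["A_L001_R1.fastq.gz", "A_L001_R2.fastq.gz"]),
    ("SampleA", ["A_L002_R1.fastq.gz", "A_L002_R2.fastq.gz"]),
    ("SampleB", ["B_R1.fastq.gz", "B_R2.fastq.gz"]),
    ("SampleC", ["C_R1.fastq.gz"]),
]

single, multiple = split_single_multiple(group_by_sample(rows))
print(sorted(single))    # IDs that would get rename entries
print(sorted(multiple))  # IDs whose FastQs are concatenated first
```

Only the IDs in single would contribute lines to the rename file, so no meta.id appears twice.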
No, you were right first time :-). ch_fastq.single contains all those samples with a single ([read1] or [read1, read2]) tuple after the groupTuple(), so
Brilliant! I probably nudged you to overengineer the whole stuff, but thanks for bearing with me, and sorry for the fuss!
Thanks @MatthiasZepper, all good :-).
@MatthiasZepper noted in #1308 that running UMItools extract resulted in inconsistent sample naming in the MultiQC report.
This is because, if a sample does not have technical replicates, the input FastQ files go to the umitools extract process without ever passing through a process that applies a suffix, and the relevant MultiQC module effectively uses the input file name to derive a sample name.
One fix we should definitely apply is to have MultiQC use the output (possibly prefixed) file name as the source of identifier: MultiQC/MultiQC#2698.
However, it probably also makes sense to be defensive and tell MultiQC to rename any other related occurrences that come up in future. That's what this PR does, using the sample sheet to derive a set of replacements to pass to MultiQC via --replace-names (to be clear, this is also an immediate fix for the umi tools issue until, if and when, it's fixed in MultiQC).
PR checklist
nf-core lint
nextflow run . -profile test,docker --outdir <OUTDIR>
nextflow run . -profile debug,test,docker --outdir <OUTDIR>
docs/usage.md is updated.
docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).
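To make the replacement list from the PR description concrete, here is a minimal Python sketch of deriving a two-column, tab-separated rename file from a sample sheet. It assumes sample sheet columns named sample, fastq_1 and fastq_2, and it assumes that each FastQ base name should be renamed to the sample ID plus a read suffix. The PR's actual Nextflow logic may differ, and the helper names are hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath


def simple_name(path):
    """File name up to the first dot, e.g. '/data/B_R1.fastq.gz' -> 'B_R1'."""
    return PurePosixPath(path).name.split(".", 1)[0]


def build_replacements(samplesheet_text):
    """Return (search, replace) pairs; column names are assumptions."""
    pairs = []
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        sample = row["sample"]
        pairs.append((simple_name(row["fastq_1"]), f"{sample}_1"))
        if row.get("fastq_2"):  # empty for single-end samples
            pairs.append((simple_name(row["fastq_2"]), f"{sample}_2"))
    return pairs


def to_tsv(pairs):
    """One 'search<TAB>replace' line per pair."""
    return "".join(f"{old}\t{new}\n" for old, new in pairs)


# Hypothetical sample sheet content.
samplesheet = (
    "sample,fastq_1,fastq_2\n"
    "SampleB,/data/B_R1.fastq.gz,/data/B_R2.fastq.gz\n"
    "SampleC,/data/C_R1.fastq.gz,\n"
)

replacements = build_replacements(samplesheet)
tsv_text = to_tsv(replacements)
print(tsv_text, end="")
```

A file like this could then be passed to MultiQC with --replace-names. With sample_names_replace_exact: true, as mentioned in the review, the intent is that only sample names matching a search string exactly are renamed.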