-
Notifications
You must be signed in to change notification settings - Fork 708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nf-test quantify pseudoalignment #1246
nf-test quantify pseudoalignment #1246
Conversation
This PR is against the
|
|
Fixes: - a single sample will now work and not raise an error when trying to parse the count data - duplicate transcripts now work with Salmon (previously it dropped one of them)
@drpatelh this changes the behaviour of Salmon when dealing with duplicate transcripts. Previously, it combined any transcripts with the same sequence. Now, it will keep all transcripts with the same sequence, which will affect mainly alternate haplotypes but I think it's the right™️ option? Want to check if you think this behaviour is OK before we merge. |
subworkflows/local/quantify_pseudo_alignment/tests/main.nf.test
Outdated
Show resolved
Hide resolved
@@ -39,7 +39,8 @@ process { | |||
withName: 'SALMON_INDEX' { | |||
ext.args = { [ | |||
params.gencode ? '--gencode' : '', | |||
params.pseudo_aligner_kmer_size ? "-k ${params.pseudo_aligner_kmer_size}": '' | |||
params.pseudo_aligner_kmer_size ? "-k ${params.pseudo_aligner_kmer_size}": '', | |||
'--keepDuplicates' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what sort of behaviour this is going to have @adamrtalbot so a bit reluctant to add it in without some proper testing. Did we need to add it to fix something else? Otherwise maybe we create an issue outlining why we need it in and do a proper assessment before adding to the pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discovered it while fixing the tests. When you have two identical sequence transcripts, Salmon will just drop one of them. Downstream, the tools for matching transcripts/genes/names causes an error because it's missing some data. This flag disables this behaviour and makes it match STAR-RSEM, Kallisto etc.
@pinin4fjords has fixed as many downstream problems as he can, but silently dropping transcripts feels like the wrong behaviour in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted SALMON_INDEX from keeping duplicates, raised this issue to discuss further: #1259
When combined with the fix for the module, the tests should pass now (famous last words).
Changes: - SALMON_INDEX will keep duplicates - summarizedexperiment will handle the missing transcripts - Version numbers checked in QUANTIFY_PSEUDO_ALIGNMENT subworkflow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you got it now :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Draft PR for an nf-test for QUANTIFY_PSEUDOALIGNEMNT
Problems:
SALMON_QUANT
SE_GENE_LENGTH_SCALED
All fixed!
This changes one global parameter when using Salmon, which means it now keeps duplicate transcripts (i.e. the same sequence, not the same transcript ID).
The rest is pretty straightforward testing.
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).