correctly embed reference in bam/cram #6594
base: main
Currently, minimap2's BAM/CRAM output stores a temporary file as the reference FASTA. This prevents other Galaxy tools from using the output, because the spec expects the stored reference to be a real file. This patch simply copies existing code over to ensure that the real full path to the reference is used. It won't solve all cases (e.g. when the reference file has been deleted), but it will solve the others.
Example of an error this patch fixes (tool: CONVERTER_cram_to_bam_0):
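A minimal Python sketch of how to spot this problem in an existing BAM/CRAM header. It assumes the reference location is recorded in the `UR` tag of the `@SQ` header lines (as htslib-based writers generally do) and that the header text was obtained with something like `samtools view -H`. The function names and the job-directory path below are hypothetical, for illustration only.

```python
import os
from typing import Dict, List


def reference_uris(header_text: str) -> Dict[str, str]:
    """Map each @SQ sequence name (SN) to its UR tag, if it has one."""
    uris = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # SAM header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "UR" in fields:
            uris[fields["SN"]] = fields["UR"]
    return uris


def missing_references(header_text: str) -> List[str]:
    """Return local UR locations that do not point to an existing file."""
    missing = []
    for uri in sorted(set(reference_uris(header_text).values())):
        if uri.startswith("file://"):
            path = uri[len("file://"):]
        elif "://" in uri:
            continue  # remote URIs (e.g. http://) are not checked in this sketch
        else:
            path = uri
        if not os.path.isfile(path):
            missing.append(uri)
    return missing


if __name__ == "__main__":
    # Hypothetical header whose UR points into a (now deleted) job working directory.
    header = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@SQ\tSN:chr1\tLN:1000\tUR:/tmp/job_42/working/reference.fa\n"
    )
    # Lists the job-directory path if that file does not exist on this machine.
    print(missing_references(header))
```

If the list is non-empty, any downstream tool that relies on the linked reference (rather than an explicitly supplied one) will fail to decode the CRAM.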
Can you bump the version? As you said, this is still broken in anything but the most basic scenario, where inputs are assumed to be available on a shared filesystem. As soon as any sort of staging is involved, this will break. Wasn't there a CRAM mode that doesn't use reference-guided compression?
The change still does not convince me. If I understood the discussion in #6523 correctly, you need to supply the reference in downstream tools anyway if the linked one is not available.
I think this (i.e. the reference not being available) will happen in many or most cases (which would also be my interpretation of @mvdbeek's comment). Thus, IMO, the more important part of the solution should be that we always allow supplying the reference in all downstream tools that process CRAM.
I wonder how much space is saved in this mode vs. the no-ref mode.
To me, all this makes CRAM rather user-unfriendly and unusable in an HPC/cloud environment like Galaxy. But I might be completely wrong, as so often :)
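The suggestion above (always let downstream tools accept an explicit reference, and only fall back to the linked one) could look roughly like this minimal Python sketch. `resolve_reference` is a hypothetical helper, not an existing Galaxy or samtools API, and it assumes the linked reference paths have already been extracted from the CRAM header.

```python
import os
from typing import Iterable, Optional


def resolve_reference(explicit_ref: Optional[str], linked_refs: Iterable[str]) -> str:
    """Pick the reference FASTA a downstream CRAM-processing step should use (hypothetical helper)."""
    # 1. A reference supplied by the user always wins.
    if explicit_ref:
        if not os.path.isfile(explicit_ref):
            raise FileNotFoundError(f"supplied reference not found: {explicit_ref}")
        return explicit_ref
    # 2. Otherwise fall back to a path linked from the CRAM header, if it still exists.
    for path in linked_refs:
        if os.path.isfile(path):
            return path
    # 3. Neither is available (e.g. the job directory was cleaned up, or inputs were staged elsewhere).
    raise FileNotFoundError("no usable reference found; please supply one explicitly")
```

With this ordering the linked path is only a convenience: staging or cleanup of the original job directory degrades to "please supply a reference" instead of a hard decoding failure.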
@@ -142,7 +137,11 @@
-K $io_options.K
#end if
-t \${GALAXY_SLOTS:-4}
reference.fa
#if $reference_source.reference_source_selector == 'history':
This code block should be a macro token.