correctly embed reference in bam/cram #6594

Open
wants to merge 1 commit into
base: main

@alpapan alpapan commented Nov 28, 2024

Currently, minimap2's BAM/CRAM output records a temporary file as the reference FASTA. This prevents other Galaxy tools from using the output, because the spec expects the stored reference to be a real file. This patch copies existing code over so that the real full path to the reference is used. It won't solve every case (e.g. when the reference file has been deleted), but it will solve the others.
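
To make the problem concrete, here is a minimal sketch of how one might check which reference a file's header points at. It assumes the path is recorded in the `UR` tag of the `@SQ` header lines (the SAM specification's field for a sequence URI) and that the header text has already been dumped, for example with `samtools view -H`. The function names and the example header are hypothetical and are not part of this patch.

```python
import os
from urllib.parse import urlparse


def sq_reference_uris(header_text):
    """Return {sequence name: UR value} for every @SQ line that has a UR tag."""
    uris = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs; split on the first colon only.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "UR" in fields:
            uris[fields["SN"]] = fields["UR"]
    return uris


def missing_local_references(header_text):
    """List local reference paths named in UR tags that do not exist on disk."""
    missing = []
    for uri in sorted(set(sq_reference_uris(header_text).values())):
        parsed = urlparse(uri)
        if parsed.scheme not in ("", "file"):
            continue  # remote URIs are not checked in this sketch
        if not os.path.exists(parsed.path):
            missing.append(parsed.path)
    return missing


# Hypothetical header, using the job-directory path from the error report in this PR.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:contig_1\tLN:500000\t"
    "UR:/galaxy/database/jobs_directory/022/22894/working/reference.fa\n"
)
print(missing_local_references(example_header))
```

A non-empty result means a downstream tool that relies on the header alone will probably fail to decode the file.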

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

alpapan commented Nov 28, 2024

Example error that this patch fixes:

tool: CONVERTER_cram_to_bam_0

SAMtools 1.17
[E::fai_build3_core] Failed to open the file /galaxy/database/jobs_directory/022/22894/working/reference.fa
[E::refs_load_fai] Failed to open reference file '/galaxy/database/jobs_directory/022/22894/working/reference.fa'
[W::cram_get_ref] Failed to populate reference for id 0
[E::cram_decode_slice] Unable to fetch reference #0:5264-59085

[E::fai_build3_core] Failed to open the file /galaxy/database/jobs_directory/022/22894/working/reference.fa
[E::refs_load_fai] Failed to open reference file '/galaxy/database/jobs_directory/022/22894/working/reference.fa'
[W::cram_get_ref] Failed to populate reference for id 0
[E::cram_decode_slice] Unable to fetch reference #0:1-23670

[E::cram_next_slice] Slice decode failure
samtools sort: truncated file. Aborting
[E::fai_build3_core] Failed to open the file /galaxy/database/jobs_directory/022/22894/working/reference.fa
[E::refs_load_fai] Failed to open reference file '/galaxy/database/jobs_directory/022/22894/working/reference.fa'
[W::cram_get_ref] Failed to populate reference for id 0
[E::fai_build3_core] Failed to open the file /galaxy/database/jobs_directory/022/22894/working/reference.fa
[E::refs_load_fai] Failed to open reference file '/galaxy/database/jobs_directory/022/22894/working/reference.fa'
[W::cram_get_ref] Failed to populate reference for id 0
[E::fai_build3_core] Failed to open the file /galaxy/database/jobs_directory/022/22894/working/reference.fa
[E::refs_load_fai] Failed to open reference file '/galaxy/database/jobs_directory/022/22894/working/reference.fa'
[W::cram_get_ref] Failed to populate reference for id 0
[E::cram_decode_slice] Unable to fetch reference #0:192066-249955

[E::fai_build3_core] Failed to open the file /galaxy/database/jobs_directory/022/22894/working/reference.fa
[E::refs_load_fai] Failed to open reference file '/galaxy/database/jobs_directory/022/22894/working/reference.fa'
[W::cram_get_ref] Failed to populate reference for id 0
[E::cram_decode_slice] Unable to fetch reference #0:122987-171298

[E::fai_build3_core] Failed to open the file /galaxy/database/jobs_directory/022/22894/working/reference.fa
[E::refs_load_fai] Failed to open reference file '/galaxy/database/jobs_directory/022/22894/working/reference.fa'
[W::cram_get_ref] Failed to populate reference for id 0
[E::fai_build3_core] Failed to open the file /galaxy/database/jobs_directory/022/22894/working/reference.fa
[E::refs_load_fai] Failed to open reference file '/galaxy/database/jobs_directory/022/22894/working/reference.fa'
[W::cram_get_ref] Failed to populate reference for id 0
[E::fai_build3_core] Failed to open the file /galaxy/database/jobs_directory/022/22894/working/reference.fa
[E::refs_load_fai] Failed to open reference file '/galaxy/database/jobs_directory/022/22894/working/reference.fa'
[W::cram_get_ref] Failed to populate reference for id 0
[E::cram_decode_slice] Unable to fetch reference #0:332137-380341

[E::fai_build3_core] Failed to open the file /galaxy/database/jobs_directory/022/22894/working/reference.fa
[E::refs_load_fai] Failed to open reference file '/galaxy/database/jobs_directory/022/22894/working/reference.fa'
[W::cram_get_ref] Failed to populate reference for id 0
[E::cram_decode_slice] Unable to fetch reference #0:257327-310839

[E::fai_build3_core] Failed to open the file /galaxy/database/jobs_directory/022/22894/working/reference.fa
[E::refs_load_fai] Failed to open reference file '/galaxy/database/jobs_directory/022/22894/working/reference.fa'
[W::cram_get_ref] Failed to populate reference for id 0
[E::fai_build3_core] Failed to open the file /galaxy/database/jobs_directory/022/22894/working/reference.fa
[E::refs_load_fai] Failed to open reference file '/galaxy/database/jobs_directory/022/22894/working/reference.fa'
[W::cram_get_ref] Failed to populate reference for id 0
[E::cram_decode_slice] Unable to fetch reference #0:393134-445372
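
Every failure in the log above names the same job-directory path. A small sketch of how one might pull the distinct unreadable reference paths out of such stderr output follows; the regular expression is written against the `fai_build3_core` lines shown here, and the function name is hypothetical.

```python
import re

# Matches the path in lines such as:
# [E::fai_build3_core] Failed to open the file /path/to/reference.fa
OPEN_FAILURE = re.compile(r"\[E::fai_build3_core\] Failed to open the file (\S+)")


def unreadable_references(stderr_text):
    """Return the sorted, de-duplicated reference paths samtools could not open."""
    return sorted(set(OPEN_FAILURE.findall(stderr_text)))


# Two lines copied from the log in this comment.
sample = (
    "[E::fai_build3_core] Failed to open the file "
    "/galaxy/database/jobs_directory/022/22894/working/reference.fa\n"
    "[W::cram_get_ref] Failed to populate reference for id 0\n"
)
print(unreadable_references(sample))
```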


@mvdbeek mvdbeek left a comment

Can you bump the version? As you said, this is just as broken in anything but the most basic scenario, where inputs are assumed to be available on a shared filesystem. As soon as any sort of staging is involved, this will break. Wasn't there a CRAM mode that didn't use reference-guided compression?
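
For illustration, a sketch of what such a mode might look like on the command line. samtools documents a `no_ref` CRAM output option that disables reference-based encoding; whether it fits this wrapper, and what it costs in file size, has not been tested here. The helper below only builds the argument list and does not run anything; its name and the file names are hypothetical.

```python
def cram_encode_command(bam_path, cram_path, reference=None):
    """Build a `samtools view` argument list that writes CRAM.

    With a reference, reference-based compression is used (-T).
    Without one, the `no_ref` output option is requested instead.
    """
    cmd = ["samtools", "view", "-C", "-o", cram_path]
    if reference is None:
        cmd += ["--output-fmt-option", "no_ref=1"]
    else:
        cmd += ["-T", reference]
    cmd.append(bam_path)
    return cmd


print(" ".join(cram_encode_command("aligned.bam", "aligned.cram")))
```

A file written this way should not depend on any path in the job directory, at the price of weaker compression.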


@bernt-matthias bernt-matthias left a comment

The change still does not convince me. If I understood the discussion in #6523 correctly, you need to supply the reference in downstream tools anyway whenever the linked one is not available.

I think this (i.e. the reference not being available) will happen in many, if not most, cases (which is also how I read @mvdbeek's comment). So IMO the more important part of the solution is that every downstream tool that processes CRAM always allows the reference to be supplied.

I wonder how much space is saved in this mode compared with the no-ref mode.

To me, all this makes CRAM rather user-unfriendly and unusable in an HPC/cloud environment like Galaxy. But I might be completely wrong, as so often :)
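
A rough sketch of the downstream behaviour described above: prefer a reference the user supplies, fall back to the path recorded in the file only if it is still readable, and otherwise fail with a clear message. The function names and the `exists` parameter are hypothetical; `-b`, `-T` and `-o` are the usual `samtools view` options for BAM output, reference and output file.

```python
import os


def resolve_reference(user_reference, header_reference, exists=os.path.isfile):
    """Pick the reference to decode with: user-supplied first, then the header path."""
    if user_reference:
        return user_reference
    if header_reference and exists(header_reference):
        return header_reference
    raise FileNotFoundError(
        "No usable reference: supply one explicitly, because the path recorded "
        "in the CRAM header is not available"
    )


def cram_to_bam_command(cram_path, bam_path, reference):
    """Build a `samtools view` argument list that decodes CRAM with an explicit reference."""
    return ["samtools", "view", "-b", "-T", reference, "-o", bam_path, cram_path]


ref = resolve_reference("genome.fa", "/galaxy/database/jobs_directory/022/22894/working/reference.fa")
print(" ".join(cram_to_bam_command("input.cram", "output.bam", ref)))
```

Passing the reference explicitly keeps the tool working when inputs are staged to another filesystem, which is the scenario raised in both reviews.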

@@ -142,7 +137,11 @@
-K $io_options.K
#end if
-t \${GALAXY_SLOTS:-4}
reference.fa
#if $reference_source.reference_source_selector == 'history':

The code block should be a macro token.


3 participants