Handle or ignore non-standard dimensions #99

Open
timothymillar opened this issue Nov 13, 2024 · 3 comments

Comments

@timothymillar

I'm trying vcztools on a Zarr store that was not created with bio2zarr. The store includes an additional dimension, 'nucleotides', which I would like to exclude from the output VCF but would rather not delete. Currently vcztools view out.zarr results in the error:

Cannot determine VCF Number for dimension name 'nucleotides' in <zarr.core.Array '/call_ND' (2376, 2, 4) int32 read-only>

Ideally there would be an option to exclude some variables/dimensions from the output VCF.

@tomwhite
Contributor

We could relax the logic in this case to use the size of the array dimension rather than raising an error here:

    else:
        raise ValueError(
            f"Cannot determine VCF Number for dimension name '{last_dim}' in {a}"
        )

Would that work?
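A minimal sketch of that relaxation, for illustration only. The helper name `vcf_number_for`, the `KNOWN_DIMS` mapping, and the dimension names 'variants' and 'samples' are assumptions, not taken from the vcztools source; only 'nucleotides' and the shape (2376, 2, 4) come from the error message above.

```python
# Hypothetical mapping from dimension name to VCF Number code.
# This is an assumption for illustration, not the vcztools implementation.
KNOWN_DIMS = {"alt_alleles": "A", "alleles": "R", "genotypes": "G"}


def vcf_number_for(dims, shape):
    """Return a VCF Number string based on an array's last dimension."""
    last_dim = dims[-1]
    if last_dim in KNOWN_DIMS:
        return KNOWN_DIMS[last_dim]
    # Relaxed behaviour: use the fixed size of the last dimension
    # instead of raising ValueError for an unrecognised name.
    return str(shape[-1])


# The array from the error message: call_ND with shape (2376, 2, 4).
# The first two dimension names are assumed.
print(vcf_number_for(("variants", "samples", "nucleotides"), (2376, 2, 4)))
```

With this fallback, an unrecognised last dimension of size 4 would be written with Number=4 rather than stopping the conversion.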

This case also prompted me to look to see if there's a way of excluding fields in bcftools. The view command does not have an option to do this, but annotate does have an -x/--remove option.

So the general case for excluding fields could be covered with:

vcztools ... | bcftools annotate --remove INFO/DP > ...

(This wouldn't solve the problem in this issue as vcztools is failing.)

Of course, we could add -x/--remove to vcztools view, but that would be a departure from bcftools (which may be ok though).

@timothymillar
Author

> We could relax the logic in this case to use the size of the array dimension

That would work for my specific use-case. But I imagine it could result in some monstrous VCFs in other circumstances.

> that would be a departure from bcftools

It is, but one of the key advantages of Zarr over VCF is its flexibility. If vcztools view must convert all of the data to VCF, then it essentially limits the use of Zarr to the scope of the VCF spec. For me, one of the attractions of using Zarr is the ability to store related data which would not make sense within a VCF file. I am aware that this is bordering on scope creep, though.

@jeromekelleher
Contributor

I guess a sensible default here would be to warn if non-VCF-compliant fields are present (rather than raising an error) and provide an option to include these in the output in a best-effort manner?
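One way this default could look, as a rough sketch rather than a proposal for the actual interface. The function `select_fields`, the `include_non_standard` flag, the `STANDARD_DIMS` set, and the field `call_DP` are all hypothetical names introduced for illustration; field selection is shown on plain dictionaries rather than a real Zarr store.

```python
import warnings

# Assumed set of dimension names that map cleanly onto VCF.
# Hypothetical; the names vcztools actually recognises may differ.
STANDARD_DIMS = {
    "variants", "samples", "ploidy", "alleles", "alt_alleles", "genotypes",
}


def select_fields(field_dims, include_non_standard=False):
    """Pick the fields to write, given a mapping of name -> dimension names.

    By default, fields with unrecognised dimensions are skipped with a
    warning; with include_non_standard=True they are kept for
    best-effort output.
    """
    selected = []
    for name, dims in field_dims.items():
        unknown = [d for d in dims if d not in STANDARD_DIMS]
        if unknown and not include_non_standard:
            warnings.warn(
                f"Skipping '{name}': non-standard dimension(s) {unknown}"
            )
            continue
        selected.append(name)
    return selected


fields = {
    "call_DP": ("variants", "samples"),
    "call_ND": ("variants", "samples", "nucleotides"),
}
print(select_fields(fields))
print(select_fields(fields, include_non_standard=True))
```

The first call warns about call_ND and leaves it out; the second keeps it, which is where a size-based Number fallback would then apply.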
