write_vcf(): discrete_genome, 1-based coordinates, and contig length #1993
grahamgower asked this question in Q&A · Unanswered
This is hairy stuff @grahamgower! I don't think we've thought deeply about 1-based coordinates, principally going under the assumption that if you're doing simulations it doesn't matter and if you're working with real data you've input the original coordinates, as is. A couple of notes:
Hi tskitters,

My goal is to output a VCF using 1-based inclusive coords from a simulation with mutations at integral positions. By default, `write_vcf()` will output 0-based coords, and it is possible that a variant will be given POS 0. The `write_vcf()` docs suggest that if this behaviour is undesirable, the onus is on the user to transform coordinates by passing a `position_transform` function. (As an aside, I see the VCF4.2 spec says: "telomeres are indicated by using positions 0 or N+1, where N is the length of the corresponding chromosome or contig".)

For an infinite sites simulation, I guess it makes sense to use `position_transform=np.ceil` to get 1-based inclusive coords. But this is an identity transformation for finite sites simulations: a mutation at position 0 is still at position 0 after transformation. So I figured I'd write it as `position_transform=lambda x: 1 + np.floor(x)`. This gives the right coordinates (I think), but then I saw that the contig length is also being transformed by the `position_transform()` function! Is there a recommended incantation to get 1-based inclusive coords and not munge the contig length?