Can we sample genomes at a given time? #2513
Replies: 2 comments
-
You can use the split_edges function to add in new "sample" nodes at the given time: import msprime
import tskit
import numpy as np
ts = msprime.sim_ancestry(3, sequence_length=10, recombination_rate=0.1, random_seed=2)
ts = msprime.sim_mutations(ts, rate=0.1, random_seed=23)
print(ts.draw_text())
split_t = 1
# Mark the new nodes as samples so we can get their sequences
ts_split = ts.split_edges(time=split_t, flags=tskit.NODE_IS_SAMPLE)
print(ts_split.draw_text())
# Get sequences for newly added nodes
split_nodes = np.where(ts_split.nodes_time == split_t)[0]
for var in ts_split.variants(samples=split_nodes.astype(np.int32)):
print(var.genotypes) Giving:
It's very important to note that there's a lot of missing data here, since some nodes inserted on the branches will not be shared across trees. There's some nuance here about the information that was lost from the original ARG that msprime traversed when generating the fully-simplified trees here, but this is probably a good starting point. |
Beta Was this translation helpful? Give feedback.
-
If you are generating the tree sequences from simulations, then the best way to do this is via census events which will keep complete information. |
Beta Was this translation helpful? Give feedback.
-
Hi,
After build an ARG, we can get allele frequency at a given time (#386). Thus we can sample alleles for each site according to allele frequency.
Can we take this further to generate whole genomes from ARG at a given time (consider both allele frequencies and linkage disequilibrium)? Is there any possible ways?
Beta Was this translation helpful? Give feedback.
All reactions