Traversing the ARG while identifying children #2469
hyanwong
started this conversation in
Show and tell
Replies: 1 comment
-
An alternative that would work and could return all the nodes in the TS could be a function that (efficiently) returned all the children (anywhere in the tree sequence) of a node. I expect this would be most efficiently implemented as an iterator: import tskit
import numpy as np
def nodes_with_children(self):
# inefficient python implementation, but could be sped up with numba
edge_id = next_edge_id = 0
edges_parent = self.edges_parent
edges_child= self.edges_child
empty_array = np.array([], dtype=edges_parent.dtype)
parent_nodes = set(edges_parent)
for n in ts.nodes(order="timeasc"):
if n.id in parent_nodes:
assert edges_parent[edge_id] == n.id
while next_edge_id < self.num_edges and edges_parent[next_edge_id] == n.id:
next_edge_id += 1
yield edges_parent[edge_id], np.unique(edges_child[edge_id: next_edge_id])
edge_id = next_edge_id
else:
yield n.id, empty_array
ts = tskit.Tree.generate_balanced(10).tree_sequence
for parent, children in nodes_with_children(ts):
print("node:", parent, "- children:", children)
|
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
If we want
tskit
to be treated as an ARG library, it would be useful to be able to do things while traversing over the graph nodes. I suspect a common thing to want to do is to identify the children of each node (I have just encountered this when implementing a "constrain dates" function in tsdate, to ensure all parent nodes are older than any of their children).The simplest way I can think to do this is to take advantage of the fact that edges in a tree sequence are listed in the order of time of the parent node (and edges for a give parent are guaranteed to be adjacent). So if we iterate over edges, grouping by the parent field, we can get this information out quite efficiently. This is how I'm doing it at the moment
It is worth having this recipe noted somewhere (hence this discussion). But I'm also wondering if it would be useful to have the
new_parent_edge_idx
array, which returns the edge ids corresponding to a change in parent (the array is guaranteed to be of a length less than or equal to the number of nodes in the ts) available as a built-in function to tskit? It's not necessarily an obvious thing to construct by hand, and maybe it would also be useful in C?Note also that the code above will not visit all the nodes in the tree sequence, only those which have children.
Beta Was this translation helpful? Give feedback.
All reactions