Shortest subtree containing some number of samples from 2 different populations: cross-coalescent #1596

stsmall · 2021-07-28T19:01:09Z


Hi All,
I was attempting to calculate the cross-coalescent as defined in Hejase et al. 2020 (bioRxiv link).

Defined in section: A Test for Elevation in Cross-Coalescence Time ... "For a given local tree and pair of species, we considered the 10 most recent cross coalescent events between the two species and normalized these ages, as in test 2, by the age of the youngest subtree that contains at least half of the total number of haploid samples."

My idea was to compute the MRCA of every cross-population pair of samples in a tree, then sort the MRCA ages and take the youngest 10. This is clearly very inefficient. Any ideas for a more efficient search?

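My inefficient attempt:

from itertools import product

import networkx as nx
import numpy as np

# ts is assumed to be a tskit.TreeSequence that has already been loaded
pop1_nodes = [0, 1, 2, 3]
pop2_nodes = [4, 5, 6, 7, 8]
sample_half = ts.num_samples / 2
pairs = list(product(pop1_nodes, pop2_nodes))

for t in ts.trees():
    # Directed graph with parent -> child edges
    td = nx.DiGraph(t.as_dict_of_dicts())
    # MRCA of every cross-population pair of samples
    mrca = nx.all_pairs_lowest_common_ancestor(td, pairs)
    cc = [t.time(anc) for _, anc in mrca]
    cc10 = np.sort(cc)[:10]
    # Age of the youngest subtree holding at least half of all samples
    for n in t.nodes(order='timeasc'):
        if t.num_samples(n) >= sample_half:
            norm_age = t.time(n)
            break
    cc_normalized = cc10 / norm_age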

jeromekelleher · 2021-07-29T09:25:41Z

Maintainer

This is an interesting one @stsmall, and I guess it's probably a statistic we should add to the library. I don't have time to think this through properly, but I wonder if you'd considered using the num_tracked_samples as a way to possibly do this? e.g.

import msprime

ts = msprime.sim_ancestry(10, random_seed=234)

iter1 = ts.trees(tracked_samples=[0, 1, 2])
iter2 = ts.trees(tracked_samples=[3, 4, 5])

for tree1, tree2 in zip(iter1, iter2):
    for u in tree1.nodes(order='timeasc'):
        n1 = tree1.num_tracked_samples(u)
        n2 = tree2.num_tracked_samples(u)
        n = tree1.num_samples(u)
        # Reason about n1, n2 and n?
        print(u, n, n1, n2)

6 replies

jeromekelleher Jul 30, 2021
Maintainer

Great! Would you mind posting the code in here so that others can find it later?

stsmall Jul 30, 2021
Author

I will. Still testing.
Getting the nth cross-coalescent event is easy with tracked_samples, but getting the 1, 2, 3... to the nth is proving difficult since I have to avoid recounting the same event when the next node in order is immediately above. Lol ... maybe that makes sense? I am a bit confused but on the right track :)

jeromekelleher Jul 31, 2021
Maintainer

I'm not certain it's possible @stsmall, but thought it was worth thinking through anyway!

stsmall Aug 1, 2021
Author

Hi Jerome,
OK. I think I got it figured out. It tests OK at least. hejase_stats

jeromekelleher Aug 2, 2021
Maintainer

Looks great @stsmall! I think we should try to get this functionality into the library, as other people will be interested in using it. It would be great if you could lead the charge on this @stsmall - contributions => authorship on the forthcoming tskit paper. I'm happy to chat about the details.
