Combining toys from several batch jobs #1948

madbaron · 2022-08-14T20:01:06Z

madbaron
Aug 14, 2022

Dear pyhf experts,

I am working on estimating the sensitivity of a counting experiment in a ~background free scenario as a function of the integrated luminosity with pyhf.

I want to do this with toys, since the asymptotics approximation fails badly in this scenario.
While the likelihood is trivial, I need to run a zillion toys to get good results.
Just in case you want to look at the code, it’s here: https://github.com/madbaron/HYPOmacros/blob/master/sensitivity_toys.py

Do you have any examples/recommendations for how to run these in separate batch jobs and combine them at the end to extract the p-values?

Thanks a lot in advance!
Federico

Answered by kratsg

Aug 26, 2022

This is a really good question and I've had a refresh of what we currently do in pyhf. For v0.6.3 of the code, you should be able to do something like this:

# all the same code from your example ...

from numpy import random

def job(calculator, seed=0):
    random.seed(seed)
    return calculator.distributions(0.0)

calculator = pyhf.infer.utils.create_calculator(
    calc_type, data, model, test_stat="q0", track_progress=False, **kwargs
)
test_stat = calculator.teststatistic(0.0)

from multiprocessing import Pool
with Pool(processes=4) as pool:
    res = pool.map(lambda x: job(calculator, seed=x), range(10))

from pyhf.infer.calculators import EmpiricalDistribution
sig_plus_bkg_dist = E…

View full answer

kratsg · 2022-08-26T05:11:46Z

kratsg
Aug 26, 2022
Maintainer

This is a really good question and I've had a refresh of what we currently do in pyhf. For v0.6.3 of the code, you should be able to do something like this:

# all the same code from your example ...

from numpy import random

def job(calculator, seed=0):
    random.seed(seed)
    return calculator.distributions(0.0)

calculator = pyhf.infer.utils.create_calculator(
    calc_type, data, model, test_stat="q0", track_progress=False, **kwargs
)
test_stat = calculator.teststatistic(0.0)

from multiprocessing import Pool
with Pool(processes=4) as pool:
    res = pool.map(lambda x: job(calculator, seed=x), range(10))

from pyhf.infer.calculators import EmpiricalDistribution
sig_plus_bkg_dist = EmpiricalDistribution([dist.samples for dist, _ in res])
bkg_dist = EmpiricalDistribution([dist.samples for _, dist in res])

pvalues = calculator.pvalues(test_stat, sig_plus_bkg_dist, bkg_dist)

The key point is for the numpy backend (see #1795) for setting the random seed to ensure that generating N toys when you resample from the pdf is "uniquely" and guaranteed random. This is of course going to be backend-dependent, so I'll leave that up to the reader to figure it out for the other backends.

The other note is that in the case of toy-based calculator, there's no stateful information from the toys actually stored on the calculator (just information about the model, data, bounds, etc...) so it's kosher in this case to pass in the merged distributions from the toys into the calculator here.

This won't necessarily be correct for the asymptotics case, but file a different discussion if you want that. We do need to improve the user experience here more. But this should be enough to get you started. Feel free to respond with a full working solution if you have one.

1 reply

madbaron Aug 29, 2022
Author

Works like a charm. Thank you!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Combining toys from several batch jobs #1948

{{title}}

Replies: 1 comment 1 reply

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Combining toys from several batch jobs #1948

madbaron Aug 14, 2022

Replies: 1 comment · 1 reply

kratsg Aug 26, 2022 Maintainer

madbaron Aug 29, 2022 Author

madbaron
Aug 14, 2022

Replies: 1 comment 1 reply

kratsg
Aug 26, 2022
Maintainer

madbaron Aug 29, 2022
Author