Save result of pseudo-experiments/toys #18
Comments
Hi @marinang, Brian had the same use case when working in the …
Of course a toy manager would work, as you say. You could have your class as a fixture with `session` scope ...
@eduardo-rodrigues the generation of toys (plus fit, plus scan) easily takes hours even for the simple example of a Gaussian signal over an exponential background, so the tests certainly cannot be run in CI pipelines; the toys have to be read in for the tests. The idea would be to let users generate toys outside of …
OK, then you have to store the data somewhere. Fair enough. You can still use a little fixture to make the data available in the tests, though … I realise that …
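A minimal sketch of such a fixture, assuming pytest and toys pre-generated offline into a NumPy `.npz` file. The file path, the `load_toys` helper and the `stored_toys` fixture name are hypothetical, not part of hepstats. With `scope="session"` the file is read once and shared by every test that requests the fixture:

```python
from pathlib import Path

import numpy as np
import pytest

# Hypothetical location of toys generated offline; not a hepstats convention.
TOYS_FILE = Path("tests/data/toys.npz")


def load_toys(path):
    """Read every array stored in an .npz file into a plain dict."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


@pytest.fixture(scope="session")
def stored_toys():
    # Loaded once per test session and shared by all tests that request it.
    if not TOYS_FILE.exists():
        pytest.skip(f"pre-generated toys not found at {TOYS_FILE}")
    return load_toys(TOYS_FILE)
```

A test then simply takes `stored_toys` as an argument; if the file is missing, the dependent tests are skipped instead of regenerating toys in CI.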
I am not sure whether it is useful or not to put the results of pseudo-experiments for a particular statistical model in …
That's a good point. Fair enough.
I expect this file to be quite small, in fact, with about 100-1000 numerical entries; do you agree? Would this also allow us to use YAML, for example? And it seems related to #19, just on a lower level. Does it make sense to mix the two …
@mayou36 from experience, 1000 toys is the bare minimum for a sensible result. For each toy you need to store the several numbers listed in the first message (more if you compute a 2D confidence interval). For 3 sigma evidence, 1000 is the order of magnitude of toys you want, but imagine what a 5 sigma discovery requires. For upper limits ~1000 toys is fine, but you have to multiply this by the number of values you scan for the POI (usually >= 10). So I don't think it is that small. In that regard, I am a bit hesitant to store the toys in the same YAML file where the statistical results are stored.
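The back-of-the-envelope arithmetic behind this comment can be sketched as follows. It assumes a one-sided Gaussian tail for the significance, asks for roughly one toy beyond the threshold as a crude lower bound (a reliable p-value needs considerably more), and takes 4 float64 numbers per toy purely as an illustrative count, since the actual list is in the first message:

```python
import math


def one_sided_pvalue(z):
    """One-sided Gaussian tail probability for a z-sigma significance."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def min_toys(z, n_tail=1):
    """Toys needed so that about n_tail toys are expected beyond the z-sigma threshold."""
    return math.ceil(n_tail / one_sided_pvalue(z))


def storage_bytes(n_toys, n_poi_values, n_numbers_per_toy, bytes_per_number=8):
    """Raw size of the stored results, assuming float64 values and no compression."""
    return n_toys * n_poi_values * n_numbers_per_toy * bytes_per_number


for z in (3, 5):
    print(f"{z} sigma: p = {one_sided_pvalue(z):.2e}, toys for ~1 tail event: {min_toys(z):,}")

# Upper-limit scan: 1000 toys x 10 POI values x 4 numbers per toy (assumed count).
print(storage_bytes(1000, 10, 4))  # 320000 bytes
```

For 3 sigma this gives several hundred toys, consistent with the ~1000 quoted above, while 5 sigma needs a few million; the storage then grows linearly with the number of POI values scanned.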
Yes, true, that seems like too much, at least in a human-readable format. YAML also supports storing arrays, and there are formats that combine YAML with HDF5, such as ASDF, which I often look at with regard to likelihood storage as well. But pure HDF5 also seems fine, I agree.
There is still one issue: when the toys are stored, no information about the model used to generate them is stored. So the user would have to be careful to match the toys with the corresponding loss in the …
That would be great, of course! It is, though, only a thin wrapper around … For the time being, maybe we can leave the coordination to the user and let them name things properly?
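Or better:

```python
@classmethod
def from_yaml(cls, yaml_file, fitting_backend):
    # Rebuild the loss from the YAML file via the fitting backend.
    loss = fitting_backend.from_yaml(yaml_file)
    # Placeholder helper: read the toys stored alongside the model.
    toys = function_extracting_toys_from_yaml(yaml_file)
    return cls(loss, toys)
```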
Yes, for the moment we can only do that.
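If the coordination is left to the user, one lightweight safeguard (not something proposed in this thread, just a sketch) is to store a hash of the model description next to the toys and compare it when the toys are loaded. The configuration dictionary and both helper functions below are hypothetical; a real implementation would derive the description from the serialized loss discussed in #19:

```python
import hashlib
import json


def model_fingerprint(model_config):
    """Stable SHA-256 digest of a JSON-serializable model description."""
    # sort_keys makes the digest independent of dict insertion order.
    canonical = json.dumps(model_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_toys_match(stored_fingerprint, model_config):
    """Raise if the toys were generated from a different model than the one tested."""
    if stored_fingerprint != model_fingerprint(model_config):
        raise ValueError("stored toys were generated with a different model")


# Hypothetical model description: Gaussian signal over an exponential background.
config = {
    "signal": {"type": "Gauss", "mu": 1.2, "sigma": 0.1},
    "background": {"type": "Exponential", "lambda": -2.0},
}
fp = model_fingerprint(config)  # would be written next to the toys when saving
check_toys_match(fp, config)    # passes silently for the same model
```

The fingerprint only detects a mismatch; it does not remove the need to keep the model description itself.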
It seems that, at least for a first implementation, the proposal to use HDF5 is viable. I would start an implementation with that. The point you made in connection with likelihood storage and fitting tools in general should not be underestimated; see my comment in #19. 👍 on your proposal.
In #14 the frequentist calculator was introduced; it uses toys to build test-statistic distributions, for instance to compute p-values.
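For reference, the core of such a toy-based p-value is the fraction of toys whose test statistic is at least as large as the observed one. The sketch below is a generic illustration, not the hepstats implementation; it assumes that larger values of the test statistic mean less compatibility with the hypothesis, and it uses a chi-square distribution with one degree of freedom as a stand-in for real toys:

```python
import numpy as np


def pvalue_from_toys(q_toys, q_obs):
    """Fraction of toys with a test statistic at least as extreme as the observed one.

    Assumes larger q means less compatible with the tested hypothesis.
    """
    q_toys = np.asarray(q_toys, dtype=float)
    return np.count_nonzero(q_toys >= q_obs) / q_toys.size


rng = np.random.default_rng(42)
q_toys = rng.chisquare(df=1, size=10_000)  # stand-in for a toy distribution
print(pvalue_from_toys(q_toys, q_obs=3.84))  # close to 0.05 for chi-square(1)
```

The binomial uncertainty on such an estimate, sqrt(p(1 - p)/N) for N toys, is what drives the number of toys discussed in the comments.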
As the generation of toys plus fitting plus scanning is quite CPU-intensive and time-consuming, an improvement would be to store the results of the pseudo-experiments, for instance in an HDF5 file.
This is currently a problem for tests, which have time limits. For each pseudo-experiment, what needs to be stored is:
The stored results can be reused in `hepstats.hypotests` with the frequentist calculator without regenerating the pseudo-experiments. This would also let users generate pseudo-experiments outside of `hepstats`.

A first design I have in mind is to create a class, called for instance `ToyManager`, which collects and saves the results of the pseudo-experiments and can reopen them.

What do you think, @eduardo-rodrigues @mayou36 @HDembinski?
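A minimal sketch of what such a `ToyManager` could look like. It is hypothetical: the method names `add`, `save` and `load` and the quantity names are assumptions, and it uses NumPy's compressed `.npz` container as a dependency-light stand-in for HDF5. With h5py, each POI value would naturally become a group and each stored quantity a dataset:

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ToyManager:
    """Hypothetical container for pseudo-experiment results, keyed by POI value."""

    results: dict = field(default_factory=dict)

    def add(self, poi_value, **quantities):
        # One array per stored quantity (e.g. toy test statistics) for this POI value.
        self.results[float(poi_value)] = {
            name: np.asarray(values) for name, values in quantities.items()
        }

    def save(self, path):
        # Flatten to keys like "poi_1.5__qtoys", loosely mimicking HDF5 groups/datasets.
        flat = {
            f"poi_{poi!r}__{name}": array
            for poi, arrays in self.results.items()
            for name, array in arrays.items()
        }
        np.savez_compressed(path, **flat)

    @classmethod
    def load(cls, path):
        # Reopen a saved file and rebuild the nested {poi: {quantity: array}} layout.
        manager = cls()
        with np.load(path) as data:
            for key in data.files:
                group, name = key.split("__", 1)
                poi = float(group[len("poi_"):])
                manager.results.setdefault(poi, {})[name] = data[key]
        return manager
```

Because each key encodes its POI value, results for different scan points generated in separate jobs could be added to one manager before saving.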