
Statistical uncertainties #87

Open · HealthyPear opened this issue Oct 1, 2020 · 8 comments
Labels: discussion (Issues that open a discussion without any planned modification to the code), enhancement (New feature or request)
HealthyPear (Member) commented Oct 1, 2020

Context

This issue focuses on the management of statistical uncertainties.
We can use it to list and propose methods other than the one described below.

If you are instead interested in implementing systematic uncertainties, please refer to issue #88.

Current work

For the moment, the method we decided to start implementing is bootstrapping.

In particular, the plan is to re-sample the DL2 data in a cycle of N bootstrap calls, where each call executes the production script in use on the resampled data.
Storing the calculated IRFs and sensitivity curves in memory then allows computing the median and standard deviation for each energy bin at the end, resulting in a final output with statistical uncertainties (see the sketch below).
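A minimal sketch of this cycle, assuming NumPy and a hypothetical `run_production` callable that stands in for the production script and returns per-energy-bin IRF values:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_irfs(dl2_events, run_production, n_boot=100):
    """Re-sample the DL2 events with replacement and rerun the
    production step on each sample.

    `run_production` is a hypothetical stand-in for the production
    script in use; `dl2_events` is assumed to support integer-array
    indexing (e.g. a NumPy structured array).
    """
    n = len(dl2_events)
    results = []
    for _ in range(n_boot):
        resampled = dl2_events[rng.integers(0, n, size=n)]
        results.append(run_production(resampled))
    results = np.asarray(results)  # shape: (n_boot, n_energy_bins)
    # median as the central value, standard deviation as the statistical error
    return np.median(results, axis=0), np.std(results, axis=0)
```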

Methods discussed

This list can be expanded or modified.

  • Bootstrap
vuillaut (Member) commented Oct 2, 2020

For the resolutions computed as the radius containing x/100 of the events (the containment radius), we can compute the confidence interval analytically.
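A minimal sketch of one such analytic interval, assuming the resolution is an empirical quantile of the angular offsets; `containment_ci` and its inputs are hypothetical names, and the rank bounds come from the binomial distribution of the number of events below the true quantile:

```python
import numpy as np
from scipy.stats import binom

def containment_ci(offsets, containment=0.68, conf_level=0.95):
    """Approximate, distribution-free confidence interval for a sample
    quantile, e.g. the 68% containment radius of angular offsets.

    The number of events below the true quantile follows
    Binomial(n, containment), which bounds the ranks of the order
    statistics enclosing it.
    """
    x = np.sort(np.asarray(offsets))
    n = len(x)
    alpha = 1.0 - conf_level
    lo = int(binom.ppf(alpha / 2, n, containment))
    hi = int(binom.ppf(1 - alpha / 2, n, containment))
    lo = max(lo - 1, 0)  # convert 1-based rank to 0-based index
    hi = min(hi, n - 1)
    radius = np.quantile(x, containment)
    return radius, x[lo], x[hi]
```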

maxnoe (Member) commented Oct 2, 2020

Same for the effective area: the number of detected events follows a binomial distribution, and we can use https://docs.astropy.org/en/stable/api/astropy.stats.binom_conf_interval.html

Like here: https://github.com/fact-project/irf/blob/d82a3d4ae8b9ef15d9f473cdcd01a5f9c92d42a2/irf/collection_area.py#L84-L88
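A minimal sketch of this approach, with hypothetical per-bin counts and an assumed thrown area; astropy's `binom_conf_interval` defaults to the Wilson interval at the 1-sigma level:

```python
import numpy as np
from astropy.stats import binom_conf_interval

# hypothetical per-energy-bin counts and thrown area
n_selected = np.array([8, 120, 450, 300])       # events surviving all cuts
n_simulated = np.array([100, 1000, 2000, 900])  # events simulated per bin
area_thrown = 1e6                                # m^2, assumed scatter area

# returns the (lower, upper) selection efficiencies per bin
eff_low, eff_high = binom_conf_interval(n_selected, n_simulated)

a_eff = area_thrown * n_selected / n_simulated
a_eff_low = area_thrown * eff_low
a_eff_high = area_thrown * eff_high
```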

maxnoe (Member) commented Oct 2, 2020

The problem is rather how to propagate the uncertainties introduced by the cut optimization into the resulting IRFs. There is no way to trace these uncertainties analytically, so the only solution is to repeat the optimization on different samples (bootstrapping, more simulations, cross-validation, ...).

AMWMitchell commented

Just to comment that we need the error on the effective area to test the tolerance. Please implement the error on the effective area using binomial confidence intervals.

chaimain commented Oct 7, 2021

Can we revive this issue and try to implement a solution? At LST we have point-like gamma MC files with 7 different pointing directions at the same wobble offset, and we can use them to test the solution as well.

We are also looking at generating MC/IRF grids for interpolation (and extrapolation), so it would be very helpful to understand the statistical uncertainties involved in the IRFs.

HealthyPear (Member, Author) commented Oct 7, 2021

I think the best course of action for the moment would be to do this via bootstrapping.
From the point of view of pyirf, I think the only thing to add is support for the error quantities in the DL3 files, since the actual operation (at least in this specific case) is not something the library would do. Rather, the (ctapipe) tool or the (protopipe/lstchain) script would: sample the DL2 data, call the DL2-to-DL3 step on each sample, and read an (N_quantities, N_samples) array on which to calculate the means and standard deviations to write in a final DL3 file.

There was a Ph.D. student testing this for protopipe (in fact, I have the same issue there); perhaps I can ask if she has a template to share.

moralejo (Contributor) commented Oct 7, 2021

That seems overcomplicated to me. For effective areas, at least, simple binomial uncertainties should do. I also wonder whether we need more than that, because the goal is simply to check whether a given MC library, with a certain set of cuts, provides sufficiently "precise" (statistics-wise) IRFs. If that is not the case, we should simply produce more MC until those uncertainties are negligible (and fulfill the CTA requirements). In that sense, I don't see an urgent need for having the error quantities available in the standard DL3 file either.

kosack commented Oct 11, 2021

I guess right now the uncertainties are only meant for testing whether we are within requirements or need more statistics. But they could also be used by the science tools as weights in the likelihood fit, or to determine what region of phase space is "usable" for analysis. In any case, it doesn't hurt to store them in the IRF, even if they are used for nothing other than validation.
