Methodology for improving data analysis accuracy in the ATLAS experiment: a case study of WWZ Production in pp collisions
This project develops methods for improving the accuracy of LHC data analysis with machine learning tools. In particular, it studies how statistical effects and systematic uncertainties limit the accuracy of the analysis.
Here the ATLAS WVZ analysis [1] is replicated.
You must have access to the NR TSU server:
ssh -XY -p 10023 [email protected]
Create and enter the working directory:
mkdir myProject
cd myProject
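Set up the ROOT environment, then run the BDT training macro (compact_teacher.cpp) and the writer macro (compact_writer.cpp) in batch mode, saving their output to log files:
source /share/shared_data/root/bin/thisroot.sh
root -x -b -q compact_teacher.cpp 2>&1 | tee BDT.log
root -x -b -q compact_writer.cpp 2>&1 | tee writer2.log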
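A pipeline such as `root ... | tee log` reports the exit status of `tee`, not of `root`, so a failed training step would not stop the writer from running afterwards. The following is a minimal bash sketch that runs both macros in order and stops at the first failure. The function names (run_macro, run_bdt_chain) are hypothetical, the ROOT_SETUP override is an assumption added for convenience, and the approach only helps if ROOT exits with a non-zero status when a macro fails, which is worth checking for the installed ROOT version.

```shell
#!/usr/bin/env bash
# Sketch: set up ROOT, train the BDT, then run the writer, stopping at the first failure.
# Hypothetical helper names; ROOT_SETUP can be overridden from the environment (assumption).
ROOT_SETUP=${ROOT_SETUP:-/share/shared_data/root/bin/thisroot.sh}

# Run one ROOT macro with the same flags as above (-x exit on exceptions,
# -b batch mode, -q quit when done) and keep a copy of stdout and stderr in a log.
run_macro() {
  local macro=$1 log=$2
  # pipefail inside a subshell: the pipeline fails if root fails, not only if tee does,
  # without changing the options of the calling shell
  ( set -o pipefail; root -x -b -q "$macro" 2>&1 | tee "$log" )
}

# Source the ROOT environment once; the writer only runs if the training succeeded.
run_bdt_chain() {
  # shellcheck disable=SC1090
  source "$ROOT_SETUP" || return 1
  run_macro compact_teacher.cpp BDT.log &&
    run_macro compact_writer.cpp writer2.log
}
```

To use it, source the file and call `run_bdt_chain` from the directory that contains the two macros.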
TRExFitter [2]:
First, we read the ntuples and convert them into histograms for further use within the framework. This is done with the n action (shown here for the 3l2j region, three_lep_presel_2jets):
trex-fitter n clear_full_old.config "Regions=three_lep_presel_2jets" | tee trex_n.log
Once the histograms exist, the next step is to build a workspace containing the fit model. The combined wfs actions build the workspace, fit it, and compute the significance in a single call:
trex-fitter wfs clear_full_old.config "Regions=three_lep_presel_2jets" | tee trex_w.log
- w: create the RooStats XML files and the workspace
- f: fit the workspace
- s: calculate the significance
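For orientation, the w, f and s steps rest on a binned profile-likelihood fit. The block below is a generic sketch of that standard construction, not taken from the TRExFitter code; the notation ($n_i$ observed events, $s_i$ and $b_i$ expected signal and background in bin $i$, nuisance parameters $\theta_j$ with constraint terms $C_j$) is ours, and TRExFitter's exact implementation is described in its documentation [2].

```latex
% Binned likelihood: one Poisson term per bin, one constraint term per nuisance parameter
L(\mu,\boldsymbol{\theta}) = \prod_{i \in \text{bins}}
  \mathrm{Pois}\!\left(n_i \,\middle|\, \mu\, s_i(\boldsymbol{\theta}) + b_i(\boldsymbol{\theta})\right)
  \prod_{j} C_j(\theta_j)

% Discovery test statistic (profile likelihood ratio) and its significance
q_0 = \begin{cases}
  -2 \ln \dfrac{L\big(0, \hat{\hat{\boldsymbol{\theta}}}(0)\big)}{L(\hat{\mu}, \hat{\boldsymbol{\theta}})} & \hat{\mu} \ge 0 \\[1ex]
  0 & \hat{\mu} < 0
\end{cases}
\qquad Z_0 = \sqrt{q_0}
```

Here $\hat{\mu}$ and $\hat{\boldsymbol{\theta}}$ maximize $L$ (what the fit step finds), $\hat{\hat{\boldsymbol{\theta}}}(0)$ maximizes $L$ with $\mu$ fixed to 0, and $Z_0 = \sqrt{q_0}$ holds in the asymptotic (large-sample) approximation. For a normalized systematic nuisance parameter, $C_j$ is typically a Gaussian with mean 0 and width 1.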
Next, we visualize the regions we want to fit. The d action produces pre-fit plots:
trex-fitter d compact.config "Regions=three_lep_presel_2jets" | tee trex_d.log
The results are written to the job's output directory:
- the Plots/ folder contains data/MC comparison plots for each region you defined, as well as summary plots
- the Tables/ folder contains tables in plain-text and .tex format, for example the yields per sample and per region
The uncertainty bands in these plots include the effects of all systematic sources specified in the config. As an example, here is the pre-fit plot of the 3l2j region:
To see how well the model describes the data after the fit, we use the p action to produce post-fit plots:
trex-fitter p compact.config "Regions=three_lep_presel_2jets" | tee trex_p.log
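Since each TRExFitter step writes its own log and later steps depend on earlier ones, it can be convenient to run n, wfs, d and p in one go and stop at the first failure. The following is a minimal bash sketch, assuming trex-fitter is on the PATH; the helper names (trex_step, run_trex_chain) are hypothetical, and the config files, region and log names are copied unchanged from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: run the TRExFitter steps above in order for one region, one log per step.
# Hypothetical helper names; configs, region and log names as in the README commands.
REGION=three_lep_presel_2jets

# Run one trex-fitter action string on a config, restricted to $REGION, logging to a file.
trex_step() {
  local actions=$1 config=$2 log=$3
  echo ">>> trex-fitter $actions $config Regions=$REGION (log: $log)"
  # pipefail inside a subshell: the step fails if trex-fitter fails, not only if tee does
  ( set -o pipefail; trex-fitter "$actions" "$config" "Regions=$REGION" 2>&1 | tee "$log" )
}

# n -> wfs -> d -> p, stopping at the first failing step.
run_trex_chain() {
  trex_step n   clear_full_old.config trex_n.log &&
    trex_step wfs clear_full_old.config trex_w.log &&
    trex_step d   compact.config        trex_d.log &&
    trex_step p   compact.config        trex_p.log
}
```

After sourcing the file, call `run_trex_chain`. The ranking (r) step is left out because it requires several fits per nuisance parameter and is usually run on its own.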
To see which nuisance parameters have the largest impact on the uncertainty of the signal strength, we use the r action (see the TRExFitter README for more information [3]). For this tutorial, all nuisance parameters can be ranked in a single run:
trex-fitter r compact.config "Regions=three_lep_presel_2jets"
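For each nuisance parameter, four fits are performed. In each fit, that nuisance parameter is fixed to one of the following values while all other parameters are left free:
- pre-fit value + pre-fit uncertainty
- pre-fit value - pre-fit uncertainty
- post-fit value + post-fit uncertainty
- post-fit value - post-fit uncertainty
The shift of the fitted signal strength with respect to the nominal fit gives the pre-fit and post-fit impact of that nuisance parameter.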
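The four fits define the up and down impacts of a nuisance parameter $\theta$ on the signal strength. In generic notation (ours, not TRExFitter's), with $\hat{\mu}$ the signal strength from the nominal fit and $\hat{\mu}_{\theta = x}$ the signal strength refit with $\theta$ fixed to $x$:

```latex
% theta_pre, sigma_pre: pre-fit value and uncertainty of the nuisance parameter
% hat{theta}, hat{sigma}: its post-fit value and uncertainty from the nominal fit
\Delta\mu^{\pm}_{\text{pre}}  = \hat{\mu}_{\theta = \theta_{\text{pre}} \pm \sigma_{\text{pre}}} - \hat{\mu}
\qquad
\Delta\mu^{\pm}_{\text{post}} = \hat{\mu}_{\theta = \hat{\theta} \pm \hat{\sigma}} - \hat{\mu}
```

The ranking plot then lists the nuisance parameters ordered by the size of their impact, typically the post-fit one, so the parameters at the top are those that most limit the precision of $\hat{\mu}$.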