-
Notifications
You must be signed in to change notification settings - Fork 1
3. Towards a top quark mass measurement
We will first get familiar with the format of the input files and one way to read and apply a selection to the events that it contains, in order to get the b-jet energy spectrum for top-pair events decaying in the eμ channel.
Let's inspect an input file from Python.
First, have a look at the list of input files:
ls /eos/user/c/cmsdas/2023/long-ex-top/
As you can see, for simulations, we have not only top-pair processes but also Drell-Yan, diboson...
Now, open one of the data file:
python
from ROOT import TFile
inF=TFile.Open("/eos/user/c/cmsdas/2023/long-ex-top//MuonEG_2016Hv2.root")
inF.ls()
and get the following output:
TFile** /eos/user/c/cmsdas/2023/long-ex-top//MuonEG_2016Hv2.root
TFile* /eos/user/c/cmsdas/2023/long-ex-top//MuonEG_2016Hv2.root
KEY: TTree data;1 data
The file contains only one tree, named data. The contents of this tree can be listed as follows:
t=inF.Get("data")
t.Print()
The output is long and skipped for brevity. The meaning of the variables is listed in the table below:
Variable | Comments |
---|---|
Run |
Run number |
Evt |
Event number |
LumiBlock |
Lumi section |
nPV |
Nmber if reconstructed vertices |
nPUtrue |
Number of pileup vertices generated |
PUWeights |
Pileup weights: nominal/down/up |
LepSelEffWeights |
Lepton selection scale factors: nominal/down/up |
TopPtWgt |
Top |
TrigWord |
Each bit is a trigger fired |
nLepton |
Number of isolated leptons |
Lepton_pt |
Isolated lepton |
... | ... |
MET_pt |
Missing transverse energy |
... | ... |
nJet |
Number of jets |
Jet_uncs |
Jet energy resolution and scale uncertainties |
Jet_pt |
Jet |
Jet_genpt |
Jet |
... | ... |
Jet_CombIVF |
CSV discriminator |
Jet_flavour |
Jet flavor at gen level |
A useful tool to have a quick glance at the contents of an ntuple is TTree::Scan, for example:
t.Scan("nPV:nJet")
to get:
************************************
* Row * nPV * nJet *
************************************
* 0 * 14 * 2 *
* 1 * 17 * 4 *
* 2 * 31 * 3 *
* 3 * 20 * 2 *
* 4 * 30 * 2 *
* 5 * 30 * 2 *
* 6 * 21 * 6 *
* 7 * 17 * 3 *
* 8 * 29 * 2 *
* 9 * 22 * 2 *
* 10 * 25 * 3 *
...
Type <CR> to continue or q to quit ==>
Question: What are the proper categories to analyze top-pair events ? Which conditions (cuts) would you apply to study dileptonic electron-muon candidate events alone? Which other process should be the main background ?
Let's start by plotting the transverse momentum of jets and b-tagged jets.
t.Draw("Jet_pt")
t.Draw("Jet_pt", "Jet_CombIVF>0.5426", "e1same")
Let's now turn to MC: it's interesting to take a look at the same quantities as above, as expected from simulation. For that purpose open a file with simulated top-pair events.
mcinF=TFile.Open("/eos/user/c/cmsdas/2023/long-ex-top/TTJets.root")
mct=mcinF.Get("data")
mct.Draw("Jet_genpt")
mct.Draw("Jet_genpt", "abs(Jet_flavour)==5", "e1same")
Question: In the familiarization folder, write a macro which automatically performs the previous distributions.
Checkpoint: Your macro may look like familiarization/controlPlots.py.
Let's plot the expected resolution for b-jet
mct.Draw("(Jet_pt-Jet_genpt)/Jet_genpt","abs(Jet_flavour)==5 && Jet_genpt > 0")
What about the isolated lepton identification ? Let's plot the
mct.Draw("Lepton_pt","abs(Lepton_id)==11 && Lepton_pt < 200")
mct.Draw("Lepton_pt","abs(Lepton_gid)==11 && Lepton_pt < 200", "e1same")
and muons
mct.Draw("Lepton_pt","abs(Lepton_id)==13 && Lepton_pt < 200")
mct.Draw("Lepton_pt","abs(Lepton_gid)==13 && Lepton_pt < 200", "e1same")
Question: What is the reason of the $p_T$ requirement in the pre-selection step?
You may have noticed that several variables are weights. In simulation, the events of interest are overlaid with a random number of minimum bias events in order to recreate a given pileup scenario. To match the simulated pileup scenario to the one observed in the data the events have to be re-weighted. Besides, b-tagging and trigger efficiencies are not exactly the same for simulation and data. This is taken into account through weights.
Now that you are familiar with the input files, let's apply a tighter event selection (such as requirement in the jet and btagged jet multiplicities, tighter cut on bjet tag etc.) and get the b-jet energy distribution. The operating point values can be obtained from here. A skeleton is available in the analyzeNplot folder. Let's open
runBJetEnergyPeak.py
The main()
is of course at the end but the interesting part goes actually from line 13 to 80. Histograms are firstly defined. Events are analyzed in a loop.
Question: What are the current selection criteria ?
To run this code:
python runBJetEnergyPeak.py -i /eos/user/c/cmsdas/2023/long-ex-top/ -j data/samples_Run2016_25ns.json -o nominal -n 2
The json file contains information for each file name: the cross-section is given, then 0 for MC and 1 for data, the name of the dataset, the name of the processes, and finally the color.
It takes several minutes to run over all the input files. Output files will be saved in the nominal subfolder.
In the meanwhile, you can start improving the python code.
Question: Which other distributions could be interesting ?
Please, complete the code where the histograms are defined
histos={
'bjeten':ROOT.TH1F('bjeten',';Energy [GeV]; Jets',30,0,300),
'bjetenls':ROOT.TH1F('bjetenls',';log(E); 1/E dN_{b jets}/dlog(E)',20,3.,7.,),
'nvtx' :ROOT.TH1F('nvtx',';Vertex multiplicity; Events',30,0,30),
'nbtags':ROOT.TH1F('nbtags',';b-tag multiplicity; Events',5,0,5)
}
and add the necessary lines to fill them. As you can notice, not only the b-jet energy distribution is computed ('bjeten'), but also the energy distribution on a logarithmic scale ('bjetenls'). As a matter of fact, it is easier to find the peak of the
Checkpoint: Your macro may look like analyzeNplot/answers/runBJetEnergyPeak.py.
To see the data to simulation agreement, you can get plots by running
python plotter.py -i nominal -j data/samples_Run2016_25ns.json -l 35867.
Under nominal/plots
, you'll find a file called plotter.root
, containing the histograms with the distributions normalized by integrated luminosity (png
and pdf
versions of the plots.
The number of events for each process can be obtain as follow
python getNumberOfEvents.py -i nominal -o table -j data/samples_Run2016_25ns.json
The output is stored both in the pdf
and LaTeX
format, in the table
subfolder.
Once you have a first set of histograms, think twice about the selection criteria, and adapt the code.
Next step is to fit the b-jet energy (bjetenls
) peak. A skeleton is available in the fitNcalibrate
folder. Let's open
fitPeak.py
You can run this script either
python fitPeak.py -i "../analyzeNplot/nominal" -j "../analyzeNplot/data/samples_Run2016_25ns.json" -l 35867.
for MC (signal+background, normalized at their cross-sections, are fitted together) or
python fitPeak.py -i "../analyzeNplot/nominal" -l 35867. -d
for data. The argument following -i is the folder in which the plotter.root file has been previously produced. As previously, -l
precedes the luminosity and -j
the path for the json
file. In this example of usage, the output is stored under nominal/fit_MC.pdf
for MC and nominal/fit_Data.pdf
for data.
Let's dig into the code. The main()
function is, as usual, at the end, line 172, but it is not the most interesting part. Let's go to line 8. As you can see, the gPeak
function performs a fit of the b-jet energy spectrum in logarithmic scale, using a gaussian PDF defined lines 5 and 6, between 3.6 and 4.8 GeV. The parameter you are interested in is the mean
Play a little bit with the mean and width limit values
## Set normalization
fitfunc.SetParameter(0, h.Integral());
fitfunc.SetParLimits(0, 0.1*h.Integral(), 2.5*h.Integral());
## Set gaussian mean starting value and limits
fitfunc.SetParameter(1, 4.2);
fitfunc.SetParLimits(1, 4., 4.4);
## Set gaussian width starting value and limits
fitfunc.SetParameter(2, 0.65);
fitfunc.SetParLimits(2, 0.35, 0.95);
the range fit
## Set limits
minToFit = 3.6
maxToFit = 4.8
and, as there is no physical motivation behind it, even the function
def myFitFunc(x=None,par=None):
return par[0]*TMath.Gaus(x[0],par[1],par[2],kFALSE)
(you could try a polynomial function ?).