Skip to content

3. Towards a top quark mass measurement

Sebastian Wuchterl edited this page Jun 7, 2023 · 1 revision

We will first get familiar with the format of the input files and one way to read and apply a selection to the events that it contains, in order to get the b-jet energy spectrum for top-pair events decaying in the eμ channel.

Analyzing the inputs

Let's inspect an input file from Python. First, have a look at the list of input files: ls /eos/user/c/cmsdas/2023/long-ex-top/

As you can see, for simulations, we have not only top-pair processes but also Drell-Yan, diboson...

Now, open one of the data file:

python
from ROOT import TFile
inF=TFile.Open("/eos/user/c/cmsdas/2023/long-ex-top//MuonEG_2016Hv2.root")
inF.ls()

and get the following output:

TFile**		/eos/user/c/cmsdas/2023/long-ex-top//MuonEG_2016Hv2.root
 TFile*		/eos/user/c/cmsdas/2023/long-ex-top//MuonEG_2016Hv2.root
  KEY: TTree	data;1	data

The file contains only one tree, named data. The contents of this tree can be listed as follows:

t=inF.Get("data")
t.Print()

The output is long and skipped for brevity. The meaning of the variables is listed in the table below:

Variable Comments
Run Run number
Evt Event number
LumiBlock Lumi section
nPV Nmber if reconstructed vertices
nPUtrue Number of pileup vertices generated
PUWeights Pileup weights: nominal/down/up
LepSelEffWeights Lepton selection scale factors: nominal/down/up
TopPtWgt Top $p_\mathrm{T}$ weight
TrigWord Each bit is a trigger fired
nLepton Number of isolated leptons
Lepton_pt Isolated lepton $p_\mathrm{T}$
... ...
MET_pt Missing transverse energy $p_\mathrm{T}$
... ...
nJet Number of jets
Jet_uncs Jet energy resolution and scale uncertainties
Jet_pt Jet $p_\mathrm{T}$
Jet_genpt Jet $p_\mathrm{T}$ at generator level
... ...
Jet_CombIVF CSV discriminator
Jet_flavour Jet flavor at gen level

A useful tool to have a quick glance at the contents of an ntuple is TTree::Scan, for example:

t.Scan("nPV:nJet")

to get:

************************************
*    Row   *       nPV *      nJet *
************************************
*        0 *        14 *         2 *
*        1 *        17 *         4 *
*        2 *        31 *         3 *
*        3 *        20 *         2 *
*        4 *        30 *         2 *
*        5 *        30 *         2 *
*        6 *        21 *         6 *
*        7 *        17 *         3 *
*        8 *        29 *         2 *
*        9 *        22 *         2 *
*       10 *        25 *         3 *
...
Type <CR> to continue or q to quit ==>

Question: What are the proper categories to analyze top-pair events ? Which conditions (cuts) would you apply to study dileptonic electron-muon candidate events alone? Which other process should be the main background ?

Let's start by plotting the transverse momentum of jets and b-tagged jets.

t.Draw("Jet_pt")
t.Draw("Jet_pt", "Jet_CombIVF>0.5426", "e1same")

Let's now turn to MC: it's interesting to take a look at the same quantities as above, as expected from simulation. For that purpose open a file with simulated top-pair events.

mcinF=TFile.Open("/eos/user/c/cmsdas/2023/long-ex-top/TTJets.root")
mct=mcinF.Get("data")
mct.Draw("Jet_genpt")
mct.Draw("Jet_genpt", "abs(Jet_flavour)==5", "e1same")

Question: In the familiarization folder, write a macro which automatically performs the previous distributions.

Checkpoint: Your macro may look like familiarization/controlPlots.py.

Let's plot the expected resolution for b-jet $p_\mathrm{T}$. We define resolution as $\frac{p_\mathrm{T}^\mathrm{rec}-p_\mathrm{T}^\mathrm{gen}}{p_\mathrm{T}^\mathrm{gen}}$ where $p_\mathrm{T}^\mathrm{rec}$ ($p_\mathrm{T}^\mathrm{gen}$) is the reconstructed (generated) transverse momentum of a jet.

mct.Draw("(Jet_pt-Jet_genpt)/Jet_genpt","abs(Jet_flavour)==5 && Jet_genpt > 0")

What about the isolated lepton identification ? Let's plot the $p_\mathrm{T}$ of particle reconstructed as electrons and genuine electrons

mct.Draw("Lepton_pt","abs(Lepton_id)==11 && Lepton_pt < 200")
mct.Draw("Lepton_pt","abs(Lepton_gid)==11 && Lepton_pt < 200", "e1same")

and muons

mct.Draw("Lepton_pt","abs(Lepton_id)==13 && Lepton_pt < 200")
mct.Draw("Lepton_pt","abs(Lepton_gid)==13 && Lepton_pt < 200", "e1same")

Question: What is the reason of the $p_T$ requirement in the pre-selection step?

You may have noticed that several variables are weights. In simulation, the events of interest are overlaid with a random number of minimum bias events in order to recreate a given pileup scenario. To match the simulated pileup scenario to the one observed in the data the events have to be re-weighted. Besides, b-tagging and trigger efficiencies are not exactly the same for simulation and data. This is taken into account through weights.

Event selection

Now that you are familiar with the input files, let's apply a tighter event selection (such as requirement in the jet and btagged jet multiplicities, tighter cut on bjet tag etc.) and get the b-jet energy distribution. The operating point values can be obtained from here. A skeleton is available in the analyzeNplot folder. Let's open

runBJetEnergyPeak.py

The main() is of course at the end but the interesting part goes actually from line 13 to 80. Histograms are firstly defined. Events are analyzed in a loop.

Question: What are the current selection criteria ?

To run this code:

python runBJetEnergyPeak.py -i /eos/user/c/cmsdas/2023/long-ex-top/ -j data/samples_Run2016_25ns.json -o nominal -n 2

The json file contains information for each file name: the cross-section is given, then 0 for MC and 1 for data, the name of the dataset, the name of the processes, and finally the color.

It takes several minutes to run over all the input files. Output files will be saved in the nominal subfolder.

In the meanwhile, you can start improving the python code.

Question: Which other distributions could be interesting ?

Please, complete the code where the histograms are defined

    histos={ 
        'bjeten':ROOT.TH1F('bjeten',';Energy [GeV]; Jets',30,0,300),
        'bjetenls':ROOT.TH1F('bjetenls',';log(E); 1/E dN_{b jets}/dlog(E)',20,3.,7.,), 
        'nvtx'  :ROOT.TH1F('nvtx',';Vertex multiplicity; Events',30,0,30),
        'nbtags':ROOT.TH1F('nbtags',';b-tag multiplicity; Events',5,0,5)
        }

and add the necessary lines to fill them. As you can notice, not only the b-jet energy distribution is computed ('bjeten'), but also the energy distribution on a logarithmic scale ('bjetenls'). As a matter of fact, it is easier to find the peak of the $\log(E_\mathrm{b})$ distribution rather than of the $E_\mathrm{b}$ spectrum, as the $\log(E_\mathrm{b})$ distribution is symmetric around its peak. In order to not distort the shape, the $\log(E_\mathrm{b})$ distribution need to be reweighed by $1/E$

$\frac{\mathrm{d}N}{\mathrm{d}E_\mathrm{b}}=\frac{1}{E_\mathrm{b}}\frac{\mathrm{d}N}{\mathrm{d}\log E_\mathrm{b}}$

Checkpoint: Your macro may look like analyzeNplot/answers/runBJetEnergyPeak.py.

To see the data to simulation agreement, you can get plots by running

python plotter.py -i nominal -j data/samples_Run2016_25ns.json  -l 35867.

Under nominal/plots, you'll find a file called plotter.root, containing the histograms with the distributions normalized by integrated luminosity ($35867 \mathrm{pb}^{-1}$) together with png and pdf versions of the plots.

The number of events for each process can be obtain as follow

python getNumberOfEvents.py -i nominal -o table -j data/samples_Run2016_25ns.json

The output is stored both in the pdf and LaTeX format, in the table subfolder.

Once you have a first set of histograms, think twice about the selection criteria, and adapt the code.

Fitting the b-jet energy spectrum

Next step is to fit the b-jet energy (bjetenls) peak. A skeleton is available in the fitNcalibrate folder. Let's open

fitPeak.py

You can run this script either

python fitPeak.py -i "../analyzeNplot/nominal" -j "../analyzeNplot/data/samples_Run2016_25ns.json" -l 35867.

for MC (signal+background, normalized at their cross-sections, are fitted together) or

python fitPeak.py -i "../analyzeNplot/nominal" -l 35867. -d

for data. The argument following -i is the folder in which the plotter.root file has been previously produced. As previously, -l precedes the luminosity and -j the path for the json file. In this example of usage, the output is stored under nominal/fit_MC.pdf for MC and nominal/fit_Data.pdf for data.

Let's dig into the code. The main() function is, as usual, at the end, line 172, but it is not the most interesting part. Let's go to line 8. As you can see, the gPeak function performs a fit of the b-jet energy spectrum in logarithmic scale, using a gaussian PDF defined lines 5 and 6, between 3.6 and 4.8 GeV. The parameter you are interested in is the mean $\mu$ of the gaussian. The b-jet energy peak is indeed $\langle E_{b} \rangle = \exp{(\mu)} \pm \Delta\mu\cdot\exp{(\mu)} $

Play a little bit with the mean and width limit values

## Set normalization
fitfunc.SetParameter(0, h.Integral());
fitfunc.SetParLimits(0, 0.1*h.Integral(), 2.5*h.Integral());
## Set gaussian mean starting value and limits
fitfunc.SetParameter(1, 4.2);
fitfunc.SetParLimits(1, 4., 4.4);
## Set gaussian width starting value and limits
fitfunc.SetParameter(2, 0.65);
fitfunc.SetParLimits(2, 0.35, 0.95);

the range fit

## Set limits
minToFit = 3.6
maxToFit = 4.8

and, as there is no physical motivation behind it, even the function

def myFitFunc(x=None,par=None):
    return par[0]*TMath.Gaus(x[0],par[1],par[2],kFALSE)

(you could try a polynomial function ?).