Analyzer for CMS Open Data Pixel Seeds ML Applications http://opendata.cern.ch/
Data producer for developing machine learning algorithms to select and filter pixel doublet seeds for tracking applications at the CMS experiment.
The first step is the creation of a CMSSW_10_2_5 release work area:
```
cmsrel CMSSW_10_2_5
cd CMSSW_10_2_5/src/
git clone git@github.com:cms-legacydata-analyses/CNNPixelSeedsProducerTool.git .
scram b -j 2
cmsenv
```
Once the compilation has completed, you are ready to produce the pixel doublet seed datasets (the release area already exists, so there is no need to run cmsrel again):
```
cd $CMSSW_BASE/src/CNNFiltering/CNNAnalyze/test/
cmsRun step3_ML_trackingOnly.py
```
This configuration runs the full CMS track reconstruction on simulated () events and writes a set of text (txt) files into the CMSSW_10_2_5/src/CNNFiltering/CNNAnalyze/test/doublets/ directory (created automatically). These files contain the doublets produced in each of the pixel-seed-based iterative tracking steps (red boxed in the picture below). For further details about track reconstruction at CMS and iterative tracking see [1],[2],[3].
Analogously, running
```
cd $CMSSW_BASE/src/CNNFiltering/CNNAnalyze/test/
cmsRun step3_ML_pixelOnly.py
```
produces the doublet seeds used as starting blocks for pixel-only track reconstruction. The generated text files are named according to the following rule:
_l_r_e_step_dnn_doublets.txt
with
- l = lumisection number
- r = run number
- e = event number
- step = iterative tracking step name (e.g. pixelTracksHitDoublets for the pixel tracks step)
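A minimal Python sketch for recovering these fields from a file name is shown below. It assumes the name ends with _l_r_e_step_dnn_doublets.txt exactly as written above, and it tolerates any prefix (such as a directory path) in front of that suffix. The helper parse_doublet_filename and the DoubletFileInfo type are hypothetical and not part of the tool.

```python
import re
from typing import NamedTuple

# Assumed layout: <any prefix>_<l>_<r>_<e>_<step>_dnn_doublets.txt
_DOUBLET_NAME = re.compile(r"_(\d+)_(\d+)_(\d+)_(.+)_dnn_doublets\.txt$")


class DoubletFileInfo(NamedTuple):
    lumi: int   # l: lumisection number
    run: int    # r: run number
    event: int  # e: event number
    step: str   # iterative tracking step name


def parse_doublet_filename(name: str) -> DoubletFileInfo:
    """Extract lumisection, run, event and step from a doublet file name or path."""
    match = _DOUBLET_NAME.search(name)
    if match is None:
        raise ValueError(f"not a doublet file name: {name!r}")
    lumi, run, event, step = match.groups()
    return DoubletFileInfo(int(lumi), int(run), int(event), step)


print(parse_doublet_filename("_12_1_345_pixelTracksHitDoublets_dnn_doublets.txt"))
# DoubletFileInfo(lumi=12, run=1, event=345, step='pixelTracksHitDoublets')
```

Because the step name is matched greedily up to the fixed _dnn_doublets.txt suffix, the parser also accepts step names that contain underscores.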
Both configuration files (step3_ML_pixelOnly.py and step3_ML_trackingOnly.py) accept a few input parameters:
Name | Type | Default | Description |
---|---|---|---|
pileUp | int | 50 | Average number of simultaneous collisions per event (for this use case should be kept to 50). |
skipEvent | int | 0 | Number of events to be skipped. |
numEvents | int | 100 | Total number of events to be processed (after skipping). |
numFile | int | 0 | Index, within the provided input file list, of the file to be processed. |
openDataVM | bool (True or False) | True | Flag indicating whether you are working on an Open Data VM or elsewhere. |
Each of these inputs can be passed on the command line as follows:
```
cmsRun step3_ML_trackingOnly.py inputName=VALUE
```
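Because skipEvent and numEvents select a window of events, a larger sample from one input file can be split into several cmsRun jobs. The Python sketch below only builds the argument lists in the name=VALUE form shown above. The helper names (cmsrun_command, chunked_commands) are hypothetical, and the sketch assumes the selected input file contains at least total_events events.

```python
from typing import List


def cmsrun_command(config: str, **params) -> List[str]:
    """Build a cmsRun argument list, passing each parameter as name=VALUE."""
    return ["cmsRun", config] + [f"{name}={value}" for name, value in params.items()]


def chunked_commands(config: str, total_events: int, events_per_job: int,
                     **params) -> List[List[str]]:
    """Split total_events into consecutive jobs that differ only in skipEvent/numEvents."""
    commands = []
    for skip in range(0, total_events, events_per_job):
        n_events = min(events_per_job, total_events - skip)  # last job may be shorter
        commands.append(cmsrun_command(config, skipEvent=skip, numEvents=n_events, **params))
    return commands


for cmd in chunked_commands("step3_ML_trackingOnly.py", 250, 100, numFile=0):
    print(" ".join(cmd))
```

Python booleans are rendered as True or False, which matches the openDataVM format in the table. Each list can then be run from the test directory, after cmsenv, with subprocess.run(cmd, check=True).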
To convert the txt datasets to HDF tables, run (from CMSSW_10_2_5/src/CNNFiltering/CNNAnalyze/test/):
```
python toHdf.py
```
This automatically reads the content of the doublets directory and produces two HDF files:
- in doublets/original/, the plain converted HDF file;
- in doublets/bal_data/, a new balanced HDF table in which the yields of fake and true seeds are forced to be equal by subsampling the more populated of the two classes.
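The balancing step can be illustrated with a short pandas sketch. This is not the toHdf.py implementation. It assumes a truth definition in which a doublet is true when both hits are matched to the same tracking particle (inTId equal to outTId, with neither set to -1). The isTrue column name and both helper functions are hypothetical.

```python
import pandas as pd


def add_true_label(df: pd.DataFrame) -> pd.DataFrame:
    """ASSUMED truth definition: both hits matched to the same tracking particle."""
    both_matched = (df["inTId"] != -1) & (df["outTId"] != -1)
    labelled = df.copy()
    labelled["isTrue"] = (both_matched & (df["inTId"] == df["outTId"])).astype(int)
    return labelled


def balance(df: pd.DataFrame, label: str = "isTrue", seed: int = 42) -> pd.DataFrame:
    """Subsample every class down to the size of the smallest one, then shuffle."""
    n_min = int(df[label].value_counts().min())
    parts = [group.sample(n=n_min, random_state=seed) for _, group in df.groupby(label)]
    return pd.concat(parts).sample(frac=1.0, random_state=seed).reset_index(drop=True)
```

Downsampling the majority class discards data but keeps every retained doublet genuine. Oversampling the minority class would instead duplicate rows.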
The dataset created above is a collection of pixel doublet seeds as they would be used by the CMS track reconstruction workflow. Each doublet is characterised by the following features:
Event Info | |
---|---|
run | Run number |
evt | Event number |
lumi | Lumisection number |
PU | Number of primary vertices in the event |
bSX, bSY, bSZ | Beam spot coordinates (x,y,z) |
Features | (“in” or “out” prefix to indicate the inner or the outer hit of the doublet, e.g. inDetSeq, outX . . .) |
DetSeq | Sequential number of the inner [outer] hit layer. For the silicon pixel detector these numbers can be {0,1,2,3} for the four barrel layers, {14,15,16} for the three negative endcap layers and {29,30,31} for the three positive endcap layers. |
X, Y, Z, R | Doublet inner [outer] hit spatial coordinates. |
Phi | Doublet inner [outer] hit azimuthal angle \phi. |
R | Doublet inner [outer] hit radial coordinate r=\sqrt{x^2 + y^2}. |
IsBarrel | Flag for inner [outer] hit being on a barrel layer |
Layer, Ladder, Side, Disk, Panel, Module | Inner [outer] hit detector specifics. For the barrel detector hit two numbers are meaningful: the layer number indicates on which cylindrical layer the hit lies; the ladder number |
IsFlipped | Flag indicating if the module is flipped with respect to the standard outward orientation. |
Ax1 | Length of the vector connecting the origin to the local module coordinate reference system origin (0,0,0) for the inner [outer] hit. |
Ax2 | Length of the vector connecting the origin to the point (0,0,1) in the local module coordinate reference system for the inner [outer] hit. |
ClustX, ClustY | Pixel cluster local, i.e. in the local module layer system of reference, coordinates for the inner [outer] hit. |
OverFlowX, OverFlowY | Flags indicating whether the inner [outer] hit pixel cluster extends beyond the pad size (16 pixels) along the X or Y local detector module axis. |
ClustSize, ClustSizeX, ClustSizeY | Inner [outer] pixel cluster absolute size, i.e. the number of pixels composing it, and its sizes along the X and Y local detector module axes. |
SumADC | Sum of the A.D.C. levels of all the pixels composing the cluster. |
IsBig | Flag indicating that the inner [outer] hit spans two (or more) ROC modules. |
IsBad | Flag indicating that at least one pixel composing the inner [outer] hit is marked as malfunctioning. |
IsEdge | Flag indicating that the inner [outer] hit is on the edge of a ROC module. |
PixelZero | Highest equivalent released charge (in A.D.C. levels) for a single pixel belonging to the inner [outer] hit pixel cluster. |
AvgCharge | Average charge released on each pixel forming the inner [outer] pixel cluster. |
Skew | Ratio between the inner [outer] pixel cluster Y size and X size. |
Pixel Pads | (“in” or “out” prefix to indicate the inner or the outer hit of the doublet, e.g. inDetSeq, outX . . .) |
PixX, with X = 0,...,255 | Inner [outer] hit pixel A.D.C. levels, with X ranging from 0 to 255 for a 16x16 pad. The X index runs from the top-left pad corner to the bottom-right one: e.g. the last (bottom) row spans from inPix240 to inPix255. |
Labels | (if the hit is not matched to any tracking particle all these labels are set to -1.0. “in” or “out” prefix to indicate the inner or the outer hit of the doublet, e.g. inDetSeq, outX . . .) |
PId | Flag set to 1.0 (-1.0) if the inner [outer] hit is (not) matched to a tracking particle. |
TId | Inner [outer] hit matched tracking particle key number in the event collection of tracking particles. |
Px,Py,Pz,Pt | Inner [outer] hit matched tracking particle momentum components (p_x, p_y, p_z) and transverse momentum (p_T). |
MT | Inner [outer] hit matched tracking particle transverse mass. |
ET | Inner [outer] hit matched tracking particle transverse energy. |
MSqr | Inner [outer] hit matched tracking particle mass squared. |
PdgId | Inner [outer] hit matched tracking particle PDG id, i.e. the index indicating which kind of particle it is. |
Charge | Inner [outer] hit matched tracking particle charge. |
NTrackerHits | Inner [outer] hit matched tracking particle number of tracker hits. |
NTrackerLayers | The number of tracker layers crossed by the inner [outer] hit matched tracking particle. |
Phi, Eta, Rapidity | Inner [outer] hit matched tracking particle \phi, \eta and rapidity y. |
VX, VY, VZ | Inner [outer] hit matched tracking particle vertex global coordinates. |
DXY | Inner [outer] hit matched tracking particle vertex transverse impact parameter. |
DZ | Inner [outer] hit matched tracking particle vertex longitudinal impact parameter. |
BunchCrossing | Event bunch crossing number. |
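The Python sketch below shows two small helpers for the per-hit features. The first decodes DetSeq into the subdetector sets listed in the table. The second rebuilds the 16x16 pad from the PixX columns, using the row-major order described above: index 0 is the top-left pixel and indices 240 to 255 form the bottom row. The function names are hypothetical. The position returned by decode_det_seq is simply the 0-based position of the value within its set, and no further geometric meaning is assumed.

```python
import numpy as np

PAD_SIDE = 16  # 16x16 pad, i.e. 256 PixX columns per hit

# DetSeq value sets and their subdetectors, as listed in the table above.
_DET_SEQ_SETS = [(range(0, 4), "barrel"),
                 (range(14, 17), "negative endcap"),
                 (range(29, 32), "positive endcap")]


def decode_det_seq(det_seq) -> tuple:
    """Return (subdetector, 0-based position of det_seq within its set)."""
    value = int(det_seq)  # values may be stored as floats, e.g. 15.0
    for values, name in _DET_SEQ_SETS:
        if value in values:
            return name, value - values.start
    raise ValueError(f"unexpected DetSeq value: {det_seq}")


def hit_pad(row, prefix: str = "in") -> np.ndarray:
    """Rebuild the 16x16 A.D.C. pad of one hit from <prefix>Pix0 ... <prefix>Pix255."""
    values = [row[f"{prefix}Pix{i}"] for i in range(PAD_SIDE * PAD_SIDE)]
    # Row-major reshape: index 0 is the top-left pixel, 240..255 the bottom row.
    return np.asarray(values, dtype=float).reshape(PAD_SIDE, PAD_SIDE)
```

hit_pad accepts a dict or a pandas row. For a whole DataFrame df, the same layout can be obtained in one step with df[[f"inPix{i}" for i in range(256)]].to_numpy(dtype=float).reshape(-1, 16, 16).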
The notebook in CNNPixelSeedsProducerTool/notebooks/cnn_filtering.ipynb is a good starting point to explore and understand the dataset features.
[1] https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideIterativeTracking