Neural network implementation for gpu clusterization #13610

Open: wants to merge 35 commits into base: dev

Commits
d4dc46e
Copying kernels to implement NN clusterizer
ChSonnabend May 16, 2024
c191885
Merge branch 'dev' into gpu_clusterizer
ChSonnabend May 24, 2024
05831ef
First version of clusterizer in GPU code
ChSonnabend May 27, 2024
8515290
Merge branch 'gpu_clusterizer' of github.com:ChSonnabend/AliceO2 into…
ChSonnabend May 27, 2024
3f6c934
Adding a compiling and running version with single-threaded ONNX mode…
ChSonnabend May 29, 2024
8ba6805
Clusters now working by a hack
ChSonnabend May 29, 2024
6ec3c46
Working implementation of settings via GPUSettings.h and --configKeyV…
ChSonnabend Jun 6, 2024
626a46f
Merge branch 'AliceO2Group:dev' into gpu_clusterizer
ChSonnabend Jun 24, 2024
ab4653a
Modifying the onnx_interface to include the right headers
ChSonnabend Jun 24, 2024
04084c8
Adjusting initialization for new ONNXRuntime version
ChSonnabend Jun 24, 2024
01dc4a1
Adjusting global settings and CF code for several settings
ChSonnabend Jun 26, 2024
accd7ab
Adding return statement if cluster is rejected
ChSonnabend Jul 3, 2024
019b388
Merge branch 'AliceO2Group:dev' into gpu_clusterizer
ChSonnabend Jul 3, 2024
3473a06
Adding some statements back
ChSonnabend Jul 4, 2024
dfffdf5
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Oct 16, 2024
df21c96
Update to latest status of gpu clusterization
ChSonnabend Oct 17, 2024
06737fd
Fixing uchar -> uint8_t
ChSonnabend Oct 18, 2024
b148449
Adding utils header
ChSonnabend Oct 18, 2024
534da50
Updating kernels.cmake to uint8_t
ChSonnabend Oct 21, 2024
bb2cb6e
Please consider the following formatting changes
alibuild Oct 21, 2024
027e225
Merge pull request #6 from alibuild/alibot-cleanup-13610
ChSonnabend Nov 4, 2024
25093b3
Adding an ONNX CPU library in the O2 framework
ChSonnabend Nov 18, 2024
74cf0e7
Merge branch 'AliceO2Group:dev' into onnxruntime-cpu
ChSonnabend Nov 18, 2024
9232328
Please consider the following formatting changes
alibuild Nov 18, 2024
9a6a9e8
Merge pull request #7 from alibuild/alibot-cleanup-13709
ChSonnabend Nov 18, 2024
7251c5c
Fixing macOS build issues with calling O*.data()
ChSonnabend Nov 19, 2024
d0f4dd8
Fixing compiler issues and char -> uint8_t
ChSonnabend Nov 19, 2024
7859ab2
Fixing curly braces
ChSonnabend Nov 19, 2024
c6cb3e6
Fixing std::make_shared
ChSonnabend Nov 19, 2024
55621f0
Merge branch 'onnxruntime-cpu' into gpu_clusterizer
ChSonnabend Nov 20, 2024
a00a54b
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Nov 20, 2024
40bc437
Changing order for <CommonUtils/StringUtils.h>
ChSonnabend Nov 20, 2024
f0a8cc2
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Nov 22, 2024
d3aede4
Merge branch 'dev' into gpu_clusterizer
ChSonnabend Dec 17, 2024
52b033f
Bug-fixing file name
ChSonnabend Dec 17, 2024
2 changes: 2 additions & 0 deletions GPU/GPUTracking/CMakeLists.txt
Expand Up @@ -197,6 +197,7 @@ if(ALIGPU_BUILD_TYPE STREQUAL "O2" OR GPUCA_CONFIG_O2_EXTENSIONS)
TPCClusterFinder/GPUTPCCFChargeMapFiller.cxx
TPCClusterFinder/GPUTPCCFPeakFinder.cxx
TPCClusterFinder/GPUTPCCFNoiseSuppression.cxx
TPCClusterFinder/GPUTPCNNClusterizer.cxx
TPCClusterFinder/GPUTPCCFClusterizer.cxx
TPCClusterFinder/GPUTPCCFDeconvolution.cxx
TPCClusterFinder/GPUTPCCFMCLabelFlattener.cxx
Expand Down Expand Up @@ -307,6 +308,7 @@ if(ALIGPU_BUILD_TYPE STREQUAL "O2")
O2::GPUCommon
O2::ReconstructionDataFormats
O2::TPCFastTransformation
O2::ML
PRIVATE_LINK_LIBRARIES O2::DataFormatsTPC
SOURCES ${SRCS_DATATYPES})
target_compile_definitions(${targetName} PRIVATE GPUCA_O2_LIB GPUCA_TPC_GEOMETRY_O2 GPUCA_HAVE_O2HEADERS)
Expand Down
6 changes: 6 additions & 0 deletions GPU/GPUTracking/Definitions/GPUDefGPUParameters.h
Expand Up @@ -81,6 +81,7 @@
#define GPUCA_LB_GPUTPCCFNoiseSuppression 512
#define GPUCA_LB_GPUTPCCFDeconvolution 512
#define GPUCA_LB_GPUTPCCFClusterizer 448
#define GPUCA_LB_GPUTPCNNClusterizer 448
#define GPUCA_LB_COMPRESSION_GATHER 1024
#define GPUCA_NEIGHBOURS_FINDER_MAX_NNEIGHUP 5
#define GPUCA_TRACKLET_SELECTOR_HITS_REG_SIZE 20
Expand Down Expand Up @@ -147,6 +148,7 @@
#define GPUCA_LB_GPUTPCCFNoiseSuppression 512
#define GPUCA_LB_GPUTPCCFDeconvolution 512
#define GPUCA_LB_GPUTPCCFClusterizer 512
#define GPUCA_LB_GPUTPCNNClusterizer 512
#define GPUCA_LB_COMPRESSION_GATHER 1024
#define GPUCA_NEIGHBOURS_FINDER_MAX_NNEIGHUP 5
#define GPUCA_TRACKLET_SELECTOR_HITS_REG_SIZE 20
Expand Down Expand Up @@ -213,6 +215,7 @@
#define GPUCA_LB_GPUTPCCFNoiseSuppression 448
#define GPUCA_LB_GPUTPCCFDeconvolution 384
#define GPUCA_LB_GPUTPCCFClusterizer 448
#define GPUCA_LB_GPUTPCNNClusterizer 448
#define GPUCA_LB_COMPRESSION_GATHER 1024
#define GPUCA_NEIGHBOURS_FINDER_MAX_NNEIGHUP 4
#define GPUCA_TRACKLET_SELECTOR_HITS_REG_SIZE 20
Expand Down Expand Up @@ -489,6 +492,9 @@
#ifndef GPUCA_LB_GPUTPCCFClusterizer
#define GPUCA_LB_GPUTPCCFClusterizer 512
#endif
#ifndef GPUCA_LB_GPUTPCNNClusterizer
#define GPUCA_LB_GPUTPCNNClusterizer 512
#endif
#ifndef GPUCA_LB_GPUTrackingRefitKernel_mode0asGPU
#define GPUCA_LB_GPUTrackingRefitKernel_mode0asGPU 256
#endif
Expand Down
20 changes: 20 additions & 0 deletions GPU/GPUTracking/Definitions/GPUSettingsList.h
Expand Up @@ -302,6 +302,26 @@ AddOption(printSettings, bool, false, "", 0, "Print all settings when initializi
AddVariable(eventDisplay, GPUCA_NAMESPACE::gpu::GPUDisplayFrontendInterface*, nullptr)
AddSubConfig(GPUSettingsProcessingRTC, rtc)
AddSubConfig(GPUSettingsProcessingParam, param)
AddOption(applyNNclusterizer, int, 0, "", 0, "(bool, default = 0) Whether the neural network clusterizer should be used.")
AddOption(nnInferenceDevice, std::string, "CPU", "", 0, "(std::string) Specify inference device (cpu (default), rocm, cuda)")
AddOption(nnInferenceDeviceId, unsigned int, 0, "", 0, "(unsigned int) Specify inference device id")
AddOption(nnInferenceAllocateDevMem, int, 0, "", 0, "(bool, default = 0) Whether device memory should be allocated for inference")
AddOption(nnInferenceDtype, std::string, "fp32", "", 0, "(std::string) Specify the datatype for which inference is performed (fp32: default, fp16)") // fp32 or fp16
AddOption(nnInferenceThreadsPerNN, int, 0, "", 0, "Number of threads used to evaluate one neural network")
AddOption(nnInferenceEnableOrtOptimization, unsigned int, 1, "", 0, "Enables graph optimizations in ONNX Runtime. Can be greater than 1!")
AddOption(nnInferenceOrtProfiling, int, 0, "", 0, "Enables profiling of model execution in ONNX Runtime")
AddOption(nnInferenceOrtProfilingPath, std::string, ".", "", 0, "If nnInferenceOrtProfiling is set, the path to store the profiling data")
AddOption(nnInferenceVerbosity, int, 1, "", 0, "0: No messages; 1: Warnings; 2: Warnings + major debugs; >3: All debugs")
AddOption(nnClusterizerAddIndexData, int, 1, "", 0, "If set, normalized index data (sector, row, pad) is appended to the input")
AddOption(nnClusterizerSizeInputRow, int, 3, "", 0, "Size of the NN input window in row direction (currently calculated as (length-1)/2)")
AddOption(nnClusterizerSizeInputPad, int, 3, "", 0, "Size of the NN input window in pad direction (currently calculated as (length-1)/2)")
AddOption(nnClusterizerSizeInputTime, int, 3, "", 0, "Size of the NN input window in time direction (currently calculated as (length-1)/2)")
AddOption(nnClusterizerUseCFregression, int, 0, "", 0, "(bool, default = false) If true, use the regression from the native clusterizer and not the NN")
AddOption(nnClusterizerBatchedMode, unsigned int, 1, "", 0, "(int, default = 1) If >1, the NN is evaluated on batched input of size specified in this variable")
AddOption(nnClassificationPath, std::string, "network_class.onnx", "", 0, "The classification network path")
AddOption(nnClassThreshold, float, 0.5, "", 0, "The cutoff at which clusters will be accepted / rejected.")
AddOption(nnRegressionPath, std::string, "network_reg.onnx", "", 0, "The regression network path")
AddOption(nnSigmoidTrafoClassThreshold, int, 1, "", 0, "If true (default), the classification threshold is transformed by an inverse sigmoid function. This depends on how the network was trained (with a sigmoid as activation function in the last layer or not).")
AddHelp("help", 'h')
EndConfig()
#endif // __OPENCL__
Expand Down
78 changes: 74 additions & 4 deletions GPU/GPUTracking/Global/GPUChainTrackingClusterizer.cxx
Expand Up @@ -12,6 +12,8 @@
/// \file GPUChainTrackingClusterizer.cxx
/// \author David Rohr

#include <CommonUtils/StringUtils.h>

#include "GPUChainTracking.h"
#include "GPUChainTrackingDefs.h"
#include "GPULogging.h"
Expand Down Expand Up @@ -849,8 +851,14 @@ int32_t GPUChainTracking::RunTPCClusterizer(bool synchronizeOutput)
if (clusterer.mPmemory->counters.nPeaks == 0) {
continue;
}
runKernel<GPUTPCCFNoiseSuppression, GPUTPCCFNoiseSuppression::noiseSuppression>({GetGrid(clusterer.mPmemory->counters.nPeaks, lane), {iSlice}});
runKernel<GPUTPCCFNoiseSuppression, GPUTPCCFNoiseSuppression::updatePeaks>({GetGrid(clusterer.mPmemory->counters.nPeaks, lane), {iSlice}});
if (!GetProcessingSettings().applyNNclusterizer) {
runKernel<GPUTPCCFNoiseSuppression, GPUTPCCFNoiseSuppression::noiseSuppression>({GetGrid(clusterer.mPmemory->counters.nPeaks, lane), {iSlice}});
runKernel<GPUTPCCFNoiseSuppression, GPUTPCCFNoiseSuppression::updatePeaks>({GetGrid(clusterer.mPmemory->counters.nPeaks, lane), {iSlice}});
} else {
// FIXME: This potentially needs to be removed when I actually apply the NN. For now it's only to make the code work
runKernel<GPUTPCCFNoiseSuppression, GPUTPCCFNoiseSuppression::noiseSuppression>({GetGrid(clusterer.mPmemory->counters.nPeaks, lane), {iSlice}});
runKernel<GPUTPCCFNoiseSuppression, GPUTPCCFNoiseSuppression::updatePeaks>({GetGrid(clusterer.mPmemory->counters.nPeaks, lane), {iSlice}});
}
if (DoDebugAndDump(RecoStep::TPCClusterFinding, 262144 << 3, clusterer, &GPUTPCClusterFinder::DumpSuppressedPeaks, *mDebugFile)) {
clusterer.DumpPeakMap(*mDebugFile, "Suppressed Peaks");
}
Expand Down Expand Up @@ -884,14 +892,76 @@ int32_t GPUChainTracking::RunTPCClusterizer(bool synchronizeOutput)
runKernel<GPUTPCCFDeconvolution>({GetGrid(clusterer.mPmemory->counters.nPositions, lane), {iSlice}});
DoDebugAndDump(RecoStep::TPCClusterFinding, 262144 << 4, clusterer, &GPUTPCClusterFinder::DumpChargeMap, *mDebugFile, "Split Charges");

runKernel<GPUTPCCFClusterizer>({GetGrid(clusterer.mPmemory->counters.nClusters, lane), {iSlice}}, 0);
if (GetProcessingSettings().applyNNclusterizer) {
// Settings for the clusterizer
clusterer.nnClusterizerUseCFregression = GetProcessingSettings().nnClusterizerUseCFregression;
clusterer.nnClusterizerSizeInputRow = GetProcessingSettings().nnClusterizerSizeInputRow;
clusterer.nnClusterizerSizeInputPad = GetProcessingSettings().nnClusterizerSizeInputPad;
clusterer.nnClusterizerSizeInputTime = GetProcessingSettings().nnClusterizerSizeInputTime;
clusterer.nnClusterizerAddIndexData = GetProcessingSettings().nnClusterizerAddIndexData;
clusterer.nnClusterizerElementSize = ((2 * clusterer.nnClusterizerSizeInputRow + 1) * (2 * clusterer.nnClusterizerSizeInputPad + 1) * (2 * clusterer.nnClusterizerSizeInputTime + 1)) + (clusterer.nnClusterizerAddIndexData ? 3 : 0);
clusterer.nnClusterizerBatchedMode = GetProcessingSettings().nnClusterizerBatchedMode;
clusterer.nnClusterizerVerbosity = GetProcessingSettings().nnInferenceVerbosity;

// Settings for the NN evaluation
clusterer.nnClassThreshold = GetProcessingSettings().nnClassThreshold;
clusterer.nnSigmoidTrafoClassThreshold = GetProcessingSettings().nnSigmoidTrafoClassThreshold;

// Options passed to the ONNX Runtime model wrapper (OrtModel)
clusterer.OrtOptions = {
{"model-path", GetProcessingSettings().nnClassificationPath},
{"device", GetProcessingSettings().nnInferenceDevice},
{"device-id", std::to_string(GetProcessingSettings().nnInferenceDeviceId)},
{"allocate-device-memory", std::to_string(GetProcessingSettings().nnInferenceAllocateDevMem)},
{"dtype", GetProcessingSettings().nnInferenceDtype},
{"intra-op-num-threads", std::to_string(GetProcessingSettings().nnInferenceThreadsPerNN)},
{"enable-optimizations", std::to_string(GetProcessingSettings().nnInferenceEnableOrtOptimization)},
{"enable-profiling", std::to_string(GetProcessingSettings().nnInferenceOrtProfiling)},
{"profiling-output-path", GetProcessingSettings().nnInferenceOrtProfilingPath},
{"logging-level", std::to_string(GetProcessingSettings().nnInferenceVerbosity)}};
clusterer.model_class.init(clusterer.OrtOptions);
if (!clusterer.nnClusterizerUseCFregression) {
std::vector<std::string> reg_model_paths = o2::utils::Str::tokenize(GetProcessingSettings().nnRegressionPath, ':');
if (clusterer.model_class.getNumOutputNodes()[0][1] == 1) {
clusterer.OrtOptions["model-path"] = reg_model_paths[0];
clusterer.model_reg_1.init(clusterer.OrtOptions);
} else {
if (reg_model_paths.size() == 1) {
clusterer.OrtOptions["model-path"] = reg_model_paths[0];
clusterer.model_reg_1.init(clusterer.OrtOptions);
} else {
clusterer.OrtOptions["model-path"] = reg_model_paths[0];
clusterer.model_reg_1.init(clusterer.OrtOptions);
clusterer.OrtOptions["model-path"] = reg_model_paths[1];
clusterer.model_reg_2.init(clusterer.OrtOptions);
}
}
} else {
runKernel<GPUTPCCFDeconvolution>({GetGrid(clusterer.mPmemory->counters.nPositions, lane), {iSlice}});
DoDebugAndDump(RecoStep::TPCClusterFinding, 262144 << 4, clusterer, &GPUTPCClusterFinder::DumpChargeMap, *mDebugFile, "Split Charges");
}

if (clusterer.nnSigmoidTrafoClassThreshold) {
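// Inverse sigmoid (logit) transformation: lets the raw network output be compared to the threshold without applying a sigmoid per cluster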
clusterer.nnClassThreshold = (float)std::log(clusterer.nnClassThreshold / (1.f - clusterer.nnClassThreshold));
}
runKernel<GPUTPCNNClusterizer>({GetGrid(std::ceil(clusterer.mPmemory->counters.nClusters / (float)clusterer.nnClusterizerBatchedMode), lane, GPUReconstruction::krnlDeviceType::CPU), {iSlice}}, 0);
} else {
runKernel<GPUTPCCFClusterizer>({GetGrid(clusterer.mPmemory->counters.nClusters, lane, GPUReconstruction::krnlDeviceType::CPU), {iSlice}}, 0);
}

if (doGPU && propagateMCLabels) {
TransferMemoryResourceLinkToHost(RecoStep::TPCClusterFinding, clusterer.mScratchId, lane);
if (doGPU) {
SynchronizeStream(lane);
}
runKernel<GPUTPCCFClusterizer>({GetGrid(clusterer.mPmemory->counters.nClusters, lane, GPUReconstruction::krnlDeviceType::CPU), {iSlice}}, 1);
if (!GetProcessingSettings().applyNNclusterizer) {
runKernel<GPUTPCCFClusterizer>({GetGrid(clusterer.mPmemory->counters.nClusters, lane, GPUReconstruction::krnlDeviceType::CPU), {iSlice}}, 1);
} else {
runKernel<GPUTPCNNClusterizer>({GetGrid(std::ceil(clusterer.mPmemory->counters.nClusters / (float)clusterer.nnClusterizerBatchedMode), lane, GPUReconstruction::krnlDeviceType::CPU), {iSlice}}, 1);
}
}

if (GetProcessingSettings().debugLevel >= 3) {
GPUInfo("Sector %02d Fragment %02d Lane %d: Found clusters: digits %u peaks %u clusters %u", iSlice, fragment.index, lane, (int32_t)clusterer.mPmemory->counters.nPositions, (int32_t)clusterer.mPmemory->counters.nPeaks, (int32_t)clusterer.mPmemory->counters.nClusters);
}
Expand Down
1 change: 1 addition & 0 deletions GPU/GPUTracking/TPCClusterFinder/ChargePos.h
Expand Up @@ -47,6 +47,7 @@ struct ChargePos {
GPUdi() tpccf::Row row() const { return gpad / TPC_PADS_PER_ROW_PADDED; }
GPUdi() tpccf::Pad pad() const { return gpad % TPC_PADS_PER_ROW_PADDED - GPUCF_PADDING_PAD; }
GPUdi() tpccf::TPCFragmentTime time() const { return timePadded - GPUCF_PADDING_TIME; }
GPUdi() tpccf::TPCFragmentTime globalTime() const { return timePadded; }

private:
// Maps the position of a pad given as row and index in that row to a unique
Expand Down
18 changes: 18 additions & 0 deletions GPU/GPUTracking/TPCClusterFinder/ClusterAccumulator.h
Expand Up @@ -43,6 +43,24 @@ class ClusterAccumulator
GPUd() void finalize(const ChargePos&, tpccf::Charge, tpccf::TPCTime, const GPUTPCGeometry&);
GPUd() bool toNative(const ChargePos&, tpccf::Charge, tpc::ClusterNative&, const GPUParam&) const;

GPUd() void setFull(float qtot, float padMean, float padSigma, float timeMean, float timeSigma, uint8_t splitInTime, uint8_t splitInPad)
{
mQtot = qtot;
mPadMean = padMean;
mPadSigma = padSigma;
mTimeMean = timeMean;
mTimeSigma = timeSigma;
mSplitInTime = splitInTime;
mSplitInPad = splitInPad;
}
GPUd() void setQtot(float qtot) { mQtot = qtot; }
GPUd() void setPadMean(float padMean) { mPadMean = padMean; }
GPUd() void setPadSigma(float padSigma) { mPadSigma = padSigma; }
GPUd() void setTimeMean(float timeMean) { mTimeMean = timeMean; }
GPUd() void setTimeSigma(float timeSigma) { mTimeSigma = timeSigma; }
GPUd() void setSplitInTime(uint8_t splitInTime) { mSplitInTime = splitInTime; }
GPUd() void setSplitInPad(uint8_t splitInPad) { mSplitInPad = splitInPad; }

private:
float mQtot = 0;
float mPadMean = 0;
Expand Down
18 changes: 18 additions & 0 deletions GPU/GPUTracking/TPCClusterFinder/GPUTPCClusterFinder.h
Expand Up @@ -19,6 +19,10 @@
#include "GPUProcessor.h"
#include "GPUDataTypes.h"
#include "CfFragment.h"
#include "ML/OrtInterface.h"
#include "ML/3rdparty/GPUORTFloat16.h"

using namespace o2::ml;

namespace o2
{
Expand Down Expand Up @@ -141,6 +145,20 @@ class GPUTPCClusterFinder : public GPUProcessor
int16_t mZSOffsetId = -1;
int16_t mOutputId = -1;

int nnClusterizerSizeInputRow = 3;
int nnClusterizerSizeInputPad = 3;
int nnClusterizerSizeInputTime = 3;
int nnClusterizerElementSize = -1;
bool nnClusterizerAddIndexData = true;
float nnClassThreshold = 0.16;
bool nnSigmoidTrafoClassThreshold = 1;
int nnClusterizerUseCFregression = 0;
int nnClusterizerBatchedMode = 1;
int nnClusterizerVerbosity = 0;

std::unordered_map<std::string, std::string> OrtOptions;
OrtModel model_class, model_reg_1, model_reg_2; // For splitting clusters

#ifndef GPUCA_GPUCODE
void DumpDigits(std::ostream& out);
void DumpChargeMap(std::ostream& out, std::string_view);
Expand Down