RooFit EvalBackend("cpu") Disables Multi-Core Support During RooMinimizer Minimization #17344
Comments
Thank you very much for opening this issue! It reminds me that the documentation needs to be updated.

The scaling of the old RooFit multi-core support was not very good, no matter how many cores you used. Your reproducer script confirms this once again: on my machine, using multiple processes with the legacy backend converges to about the same runtime as the new CPU backend. Therefore, implementing multi-core support for the new backend was not strictly necessary: there are no performance regressions.

Also, parallelizing likelihood evaluations in the context of numeric minimization is difficult to do in a way that scales well in the general case. In some cases it's better to parallelize over events; in others it's better to parallelize over likelihood components. Doing the wrong thing often results in even longer fitting times because of scheduling overhead. Hence, we didn't implement multi-core support for the new backend. Instead, users are encouraged to parallelize their workflows at a higher level, for example by running many fits at the same time (e.g. for toy studies or profile likelihood scans). I will update the documentation to make this clear.

That being said: you are of course free to open a feature request asking for multi-processing with the new default backend! But I'm afraid the priority won't be that high. People were "fine" with the performance of the old backend, and the new backend generally beats its performance for an arbitrary number of cores used in the legacy backend. So getting more performance out of it is not the focus right now. However, the situation would of course change if you have a realistic use case where using the new backend with one thread would constitute a significant performance regression over using the old backend with a realistic number of threads, and no other parallelization is possible at the user level!

Let me know if this reasoning makes sense to you, and thank you very much in advance for your further feedback!
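For illustration, here is a minimal sketch (not from this thread) of the kind of higher-level parallelization suggested above: running many independent toy fits in parallel processes, each using the single-threaded `cpu` backend. The toy Gaussian model and all names in it are illustrative assumptions, not part of the original discussion.

```python
# Illustrative sketch only: parallelize over independent toy fits rather
# than inside a single likelihood evaluation. The toy Gaussian model is
# an assumption for demonstration purposes.
import ROOT
from concurrent.futures import ProcessPoolExecutor

def run_toy_fit(seed):
    # Build everything inside the worker so no ROOT objects have to
    # cross process boundaries (only the integer seed is pickled).
    ROOT.RooRandom.randomGenerator().SetSeed(seed)
    x = ROOT.RooRealVar("x", "x", -10, 10)
    mean = ROOT.RooRealVar("mean", "mean", 0, -5, 5)
    sigma = ROOT.RooRealVar("sigma", "sigma", 2, 0.1, 10)
    pdf = ROOT.RooGaussian("gauss", "gauss", x, mean, sigma)
    data = pdf.generate(ROOT.RooArgSet(x), 100000)
    nll = pdf.createNLL(data, ROOT.RooFit.EvalBackend("cpu"))
    minimizer = ROOT.RooMinimizer(nll)
    minimizer.minimize("Minuit2", "migrad")
    return mean.getVal(), sigma.getVal()

if __name__ == "__main__":
    # Each toy fit runs single-threaded; parallelism comes from the pool.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_toy_fit, range(20)))
    print(results)
```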
Hi Jonas,

Thank you very much for your prompt and detailed response. I appreciate you taking the time to explain the reasons behind this!

I wanted to share some results from my own testing. In my C++ project, which involves an angular analysis with weighted events, I handle approximately 800k inputs. The fitting process performs a simultaneous fit across 8 subsamples with a total of 32 parameters across 3 observables (3D fit). Here are the performance metrics I observed:

These results indicate that, in this specific use case, the legacy backend with multi-core support outperforms the new cpu backend by approximately 2 minutes (~20%). I understand the difficulties in parallelising the likelihood calculation in the new backend. Given these results, I am comfortable using the new cpu backend, as it reduces resource usage (from 28 cores to 1 core) with only a modest increase in computation time.

However, it would be highly beneficial to have a more intelligent mechanism that automatically selects the optimal backend for the specific analysis case. This could potentially maximise performance while minimising resource consumption, without requiring manual configuration ;)

Thank you once again for your attention to this matter and for your contributions to the project!
Hi @JieWu-GitHub, thanks for also reporting your measurements! A slowdown of 2 minutes (20%) is significant. Let's keep this issue open for now to see if there is something that can be done easily. I will do some more checks. At the very least, the issue should not be closed before the documentation is updated.

One more question about your use case: are the 800k entries distributed uniformly over the subsamples in the simultaneous fit, i.e. does each of the 8 subsamples have 100k entries?
Hi @guitargeek, thank you for looking into this and for your thoughtful response.

Regarding your question, the samples are not evenly distributed across the subsamples. The distribution fractions are approximately as follows:

This is actually a simultaneous fit across 4 mass bins and 2 charge states, resulting in a total of 8 bins.
Description
When minimizing the log-likelihood created with createNLL in RooFit, specifying the default "cpu" backend,

```python
nll1 = pdf1.createNLL(data1, RooFit.NumCPU(4), RooFit.EvalBackend("cpu"))
nll2 = pdf2.createNLL(data2, RooFit.NumCPU(4), RooFit.EvalBackend("cpu"))
```

unexpectedly disables multi-core usage.
To leverage multiple cores, the backend must be set to "legacy":

```python
nll1 = pdf1.createNLL(data1, RooFit.NumCPU(4), RooFit.EvalBackend("legacy"))
nll2 = pdf2.createNLL(data2, RooFit.NumCPU(4), RooFit.EvalBackend("legacy"))
```
Reproducer
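A minimal sketch of this kind of reproducer, assuming a toy Gaussian model and sample size (the original script may differ): it times a fit with each backend while requesting NumCPU(4).

```python
# Minimal sketch, assuming a toy Gaussian model: compare wall-clock fit
# times of the "cpu" and "legacy" backends with NumCPU(4) requested.
import time
import ROOT

x = ROOT.RooRealVar("x", "x", -10, 10)
mean = ROOT.RooRealVar("mean", "mean", 0, -5, 5)
sigma = ROOT.RooRealVar("sigma", "sigma", 2, 0.1, 10)
pdf = ROOT.RooGaussian("gauss", "gauss", x, mean, sigma)
data = pdf.generate(ROOT.RooArgSet(x), 5000000)

for backend in ["cpu", "legacy"]:
    # Reset the parameters so both fits start from the same point.
    mean.setVal(0)
    sigma.setVal(2)
    nll = pdf.createNLL(data, ROOT.RooFit.NumCPU(4),
                        ROOT.RooFit.EvalBackend(backend))
    minimizer = ROOT.RooMinimizer(nll)
    start = time.time()
    minimizer.minimize("Minuit2", "migrad")
    # With EvalBackend("cpu"), NumCPU(4) has no effect (single core);
    # with EvalBackend("legacy"), 4 worker processes are used.
    print(f"{backend}: {time.time() - start:.1f} s")
```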
ROOT version
```
| Welcome to ROOT 6.32.02                        https://root.cern |
| (c) 1995-2024, The ROOT Team; conception: R. Brun, F. Rademakers |
| Built for linuxx8664gcc on Sep 18 2024, 20:01:03                 |
| From heads/master@tags/v6-32-02                                  |
| With                                                             |
| Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
```
Installation method
conda
Operating system
openSUSE Leap 15.6
Additional context
No response