add pandas_diff for fast-carpenter output #4
Comments
On 2019-03-04 Lukasz Kreczko (kreczko) wrote: changed the description
On 2019-03-04 Benjamin Krikler (bkrikler) wrote: Thanks for putting this on an issue. The best bet for example outputs is the fast_cms_public_tutorial repository's pipeline, e.g. https://gitlab.cern.ch/fast-hep/public/fast_cms_public_tutorial/-/jobs/3492813/artifacts/browse/pipeline/carpenter/ (I've "kept" the job artifacts for that specific pipeline now).
On 2019-03-04 Benjamin Krikler (bkrikler) wrote: Also, I had a primitive set of pandas_diff-like tests running in the old FAST-RA1 project, which might help with this: https://gitlab.cern.ch/fast-cms/FAST-RA1/blob/master/tests/integrations/run_tests.py#L83-131. The tests there only checked for exact equality between two reloaded dataframes, but it might help provide a starting point for this. Although the rest of the code is pretty simple, so maybe it's not really adding anything for you...
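For readers without access to the FAST-RA1 repository, an exact-equality check between two reloaded dataframes could look roughly like the sketch below. It is not the code from run_tests.py: the function name `frames_identical` is hypothetical, and it assumes both outputs are CSV files that pandas can read with default settings.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_identical(left_csv, right_csv):
    """Return True if two CSV outputs reload into exactly equal dataframes.

    Hypothetical helper for illustration; accepts paths or file-like objects.
    """
    left = pd.read_csv(left_csv)
    right = pd.read_csv(right_csv)
    try:
        # check_exact=True compares numbers exactly instead of within a tolerance
        assert_frame_equal(left, right, check_exact=True)
    except AssertionError:
        # Raised on any mismatch in shape, labels, dtypes or values
        return False
    return True
```

A check like this only answers "same or not"; it does not say which bins or columns differ, which is the gap a pandas_diff would fill.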
On 2019-03-04 Lukasz Kreczko (kreczko) wrote: Thanks for the examples, this will be useful. I am trying to get the diff into a similar shape to the ROOT version:
For the current CSV files that's essentially identifying the category, variables & statistical data. Maybe worth looking at fast-plotter for this?
On 2019-03-04 Benjamin Krikler (bkrikler) wrote: Yes, fast-plotter could be quite helpful for this. It depends a bit on how generic or specific you want to be, however: is this a pandas-diff function, or a "fast binned dataframe" diff? I think if it's the former, it could be tricky to do this in some meaningful but general way, at least if the pandas dataframes are stored as CSV files (with binary files you'd lose less info, like which columns are actually in the index). If you're comfortable being more specific to fast-carpenter's outputs, then fast-plotter could be quite helpful, since it wraps reloading the CSV files and gives utilities to project and sum, plus potentially plot the resulting differences.
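To make the CSV point concrete, here is a minimal sketch of a "binned dataframe" diff in plain pandas. It does not use fast-plotter, and it is only an illustration: the function name `diff_binned` is hypothetical, the caller has to say which columns form the index (the information a CSV file does not carry), and all remaining columns are assumed to be numeric.

```python
import pandas as pd


def diff_binned(left_csv, right_csv, index_cols):
    """Return the rows where two binned CSV outputs differ (right minus left).

    Hypothetical helper: index_cols lists the binning columns, which must be
    supplied by the caller because the CSV does not record them.
    """
    left = pd.read_csv(left_csv, index_col=index_cols)
    right = pd.read_csv(right_csv, index_col=index_cols)
    # Outer-align rows and columns so bins missing on one side become NaN
    left, right = left.align(right, join="outer")
    delta = right - left
    # NaN compares as "not equal to 0", so bins present on only one side are kept
    changed = delta.ne(0).any(axis=1)
    return delta[changed]
```

With index columns such as `["dataset", "bin"]` (made-up names), the result contains one row per bin whose content changed, plus NaN rows for bins that exist in only one of the two files.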
On 2019-03-04 Lukasz Kreczko (kreczko) wrote: Yes, I am thinking more |
Imported from gitlab issue 4
@bkrikler Could you please send me some example output files?