add pandas_diff for fast-carpenter output #4

Open
kreczko opened this issue May 21, 2019 · 6 comments

Comments

@kreczko
Contributor

kreczko commented May 21, 2019

Imported from gitlab issue 4

@bkrikler Could you please send me some example output files?

@kreczko kreczko added this to the Version 0.3.0 milestone May 21, 2019
@kreczko kreczko added the originally gitlab For items that were originally created on gitlab and imported over label May 21, 2019
@kreczko
Contributor Author

kreczko commented May 21, 2019

On 2019-03-04 Lukasz Kreczko (kreczko) wrote:

changed the description

@kreczko
Contributor Author

kreczko commented May 21, 2019

On 2019-03-04 Benjamin Krikler (bkrikler) wrote:

Thanks for putting this on an issue. The best bet for example outputs is the fast_cms_public_tutorial repository's pipeline, e.g. https://gitlab.cern.ch/fast-hep/public/fast_cms_public_tutorial/-/jobs/3492813/artifacts/browse/pipeline/carpenter/ (I've "kept" the job artifacts for that specific pipeline now).

@kreczko
Contributor Author

kreczko commented May 21, 2019

On 2019-03-04 Benjamin Krikler (bkrikler) wrote:

Also, I had a primitive set of pandas_diff-like tests running in the old FAST-RA1 project, which might help with this: https://gitlab.cern.ch/fast-cms/FAST-RA1/blob/master/tests/integrations/run_tests.py#L83-131. The tests there only checked for exact equality between two reloaded dataframes, but they could provide a starting point. That said, the rest of the code is pretty simple, so maybe it's not really adding anything for you...
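
For reference, an exact-equality check between two reloaded dataframes can be sketched as below. This is not the FAST-RA1 test code; the CSV contents, the column names (`dataset`, `nMuon`, `n`) and the helper names `reload` and `frames_identical` are all made up for illustration.

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical binned outputs; real fast-carpenter files will have other columns.
REFERENCE_CSV = """dataset,nMuon,n
data,0,10
data,1,5
mc,0,12
mc,1,4
"""
CHANGED_CSV = REFERENCE_CSV.replace("mc,1,4", "mc,1,3")


def reload(csv_text, index_cols):
    """Reload a binned dataframe from CSV text, restoring the binning index."""
    return pd.read_csv(io.StringIO(csv_text), index_col=index_cols)


def frames_identical(reference, candidate):
    """Return True only if index, columns, dtypes and values all match exactly."""
    try:
        assert_frame_equal(reference, candidate, check_exact=True)
    except AssertionError:
        return False
    return True


reference = reload(REFERENCE_CSV, ["dataset", "nMuon"])
same = frames_identical(reference, reload(REFERENCE_CSV, ["dataset", "nMuon"]))
different = frames_identical(reference, reload(CHANGED_CSV, ["dataset", "nMuon"]))
```

With pandas 1.1 or later, passing `check_exact=False` together with `rtol` to `assert_frame_equal` would relax this to a tolerance-based comparison.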

@kreczko
Contributor Author

kreczko commented May 21, 2019

On 2019-03-04 Lukasz Kreczko (kreczko) wrote:

Thanks for the examples, this will be useful.

I am trying to get the diff into a similar shape to the ROOT version:

  • calculate KS & p-value for all 1D projections
  • display differing projections

For the current CSV files that's essentially identifying the category, variables & statistical data.
I will have to think about how to do this in a general way (like for ROOT) without being too verbose with the settings (e.g. `pandas_diff -c dataset, --var nMuon, nIsoMuons, -n n`).
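
A rough sketch of the KS step for a single 1D projection, assuming unweighted bin counts with identical binning in both files. The function names are hypothetical, the p-value uses the asymptotic Kolmogorov series, and applying KS to binned data is only an approximation.

```python
import math


def ks_distance(counts_a, counts_b):
    """Largest gap between the two normalised cumulative distributions."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both projections need the same binning")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both projections need a positive total")
    cum_a = cum_b = distance = 0.0
    for a, b in zip(counts_a, counts_b):
        cum_a += a / total_a
        cum_b += b / total_b
        distance = max(distance, abs(cum_a - cum_b))
    return distance


def kolmogorov_prob(z, terms=100):
    """Asymptotic Q(z) = 2 * sum_{j>=1} (-1)**(j-1) * exp(-2 * j**2 * z**2)."""
    if z < 0.2:
        # The series does not converge at z = 0, and Q is indistinguishable
        # from 1 in this range anyway.
        return 1.0
    total = 0.0
    for j in range(1, terms + 1):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * z * z)
    return min(1.0, max(0.0, 2.0 * total))


def ks_test_binned(counts_a, counts_b):
    """Return (KS distance, approximate p-value) for two binned projections."""
    distance = ks_distance(counts_a, counts_b)
    total_a, total_b = sum(counts_a), sum(counts_b)
    # Assumption: raw event counts, so the totals act as sample sizes.
    n_eff = total_a * total_b / (total_a + total_b)
    return distance, kolmogorov_prob(distance * math.sqrt(n_eff))
```

A projection held in a pandas Series could be passed in via `.tolist()`; projections whose p-value falls below some chosen threshold would then be the ones to display.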

Maybe worth looking at fast-plotter for this?

@kreczko
Contributor Author

kreczko commented May 21, 2019

On 2019-03-04 Benjamin Krikler (bkrikler) wrote:

Yes, fast-plotter could be quite helpful for this. It depends a bit on how generic or specific you want to be, however: is this a pandas-diff function, or a "fast binned dataframe"-diff? If it's the former, I think it could be tricky to do this in some meaningful but general way, at least if the pandas dataframes are stored as CSV files (with binary files you'd lose less info, like which columns are actually in the index). If you're comfortable being more specific to fast-carpenter's outputs, then fast-plotter could be quite helpful, since it wraps reloading the CSV files and gives utilities to project and sum, plus potentially plot the resulting differences.
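
To illustrate the point about CSV files dropping the index information, here is a minimal sketch in plain pandas. It does not use fast-plotter's API; the table, the column names and the helpers `load_binned` and `project` are invented for the example.

```python
import io

import pandas as pd

# Hypothetical binned output with two binning variables per dataset.
BINNED_CSV = """dataset,nMuon,nIsoMuons,n
data,0,0,10
data,1,0,3
data,1,1,2
mc,0,0,12
mc,1,0,3
mc,1,1,1
"""


def load_binned(csv_text, index_cols):
    """Reload the CSV and tell pandas which columns form the binning index."""
    return pd.read_csv(io.StringIO(csv_text), index_col=index_cols)


def project(binned, keep):
    """Sum over every index level that is not listed in `keep`."""
    return binned.groupby(level=keep).sum()


# Without extra information the binning columns come back as ordinary data.
naive = pd.read_csv(io.StringIO(BINNED_CSV))

binned = load_binned(BINNED_CSV, ["dataset", "nMuon", "nIsoMuons"])
per_nmuon = project(binned, ["dataset", "nMuon"])
```

The caller has to supply `index_cols` because the CSV itself cannot say which columns were index levels, which is the part a tool specific to fast-carpenter's outputs could take care of.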

@kreczko
Contributor Author

kreczko commented May 21, 2019

On 2019-03-04 Lukasz Kreczko (kreczko) wrote:

Yes, I am thinking more fast_binned_df_diff
