Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to define merged channels in PostFitShapesFromWorkspace (+ a few random fixes) #295

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ajgilbert
Copy link
Collaborator

Adds option --merged-channels to PostFitShapesFromWorkspace that lets the user define additional channel labels that will be calculated as the sum over all the channels matching a particular regex pattern. E.g., given a card with channels: chn_ee_2016, chn_ee_2017, chn_ee_2018, chn_mm_2016, chn_mm_2017, chn_mm_2018, we could add:
--merged-channels 'chn_ee_tot=chn_ee_201.' 'chn_mm_tot=chn_mm_201.', and these additional merged channels will appear as directories in the ROOT file.

Partially addresses #291.

This PR also fixes a few assorted bugs, mostly cases where we need to avoid divide by zero.

Copy link
Collaborator

@kcormi kcormi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good.

I was not familiar with LimitCompare.cc, but I gues it was unused and being removed for keeping things clean?

@@ -1191,6 +1192,9 @@ def FixTopRange(pad, fix_y, fraction):
if ymin == 0.:
print('Cannot adjust log-scale y-axis range if the minimum is zero!')
return
if fix_y <= 0:
print('Cannot adjust log-scale y-axis range if the maximum is zero!')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe put "less than or equal to zero" here in the printout, just since that's what the check is doing?

@anigamova anigamova linked an issue Mar 10, 2023 that may be closed by this pull request
4 tasks
@anigamova
Copy link
Collaborator

For the merging would it make sense to check if the channel that a user wants to merge have the same binning and if it is not the case print out a warning message? I think TH1.Add() adds histograms by bin number and completely disregards the differences in bin labels

@ajgilbert
Copy link
Collaborator Author

@anigamova I think it's a good idea. It's true it probably does something undesirable if this is the case

@ajgilbert
Copy link
Collaborator Author

Have been rethinking this one a bit, and #267.

We might be able to make a bigger improvement, that will also help performance, by also doing some restructuring. One of the issues often reported is that PostFitShapesFromWorkspace is quite slow for more complex models. This is partly due to the fact that each call to GetUncertainty or GetShapeWithUncertainty builds its own loop over the covariance matrix resamples, which adds a certain amount of overhead.

A better way could be to add a new function that returns a list of shapes/yields according to some input specification. By default, this could be a list of each category, within which a list of processes, as well as the usual signal/background/total sums. But we could allow custom specs, e.g. defining a new category based on a list of actual categories, or a regex to match them - similarly for sums of processes.

Then all the shapes/yields the user has requested can be computed with just one loop. We might also build into this a mechanism to optionally return the full set of variations.

@pkausw
Copy link

pkausw commented Mar 28, 2023

Have been rethinking this one a bit, and #267.

We might be able to make a bigger improvement, that will also help performance, by also doing some restructuring. One of the issues often reported is that PostFitShapesFromWorkspace is quite slow for more complex models. This is partly due to the fact that each call to GetUncertainty or GetShapeWithUncertainty builds its own loop over the covariance matrix resamples, which adds a certain amount of overhead.

A better way could be to add a new function that returns a list of shapes/yields according to some input specification. By default, this could be a list of each category, within which a list of processes, as well as the usual signal/background/total sums. But we could allow custom specs, e.g. defining a new category based on a list of actual categories, or a regex to match them - similarly for sums of processes.

Then all the shapes/yields the user has requested can be computed with just one loop. We might also build into this a mechanism to optionally return the full set of variations.

Hi @ajgilbert ,

Yes, this seems like a good way to increase performance! Since the changes you propose are mostly relevant for the backend of the harvester, it might make sense to still go ahead with this PR and #267 in my personal opinion. This way, we could already use the improvements to PostfitshapesFromWorkspace from these PRs and maybe disentangle them from the following changes in the backend. But I also don't have a very strong opinion about this. Would you prefer to do it all in one go?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PostFitShapesFromWorkspace development
4 participants