-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding basic parameter sweep tool #1284
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #1284 +/- ##
==========================================
+ Coverage 77.51% 77.58% +0.07%
==========================================
Files 390 391 +1
Lines 63884 64288 +404
Branches 11756 11815 +59
==========================================
+ Hits 49517 49878 +361
- Misses 11795 11829 +34
- Partials 2572 2581 +9 ☔ View full report in Codecov by Sentry. |
""" | ||
Returns OrderedDict containing the results from the parameter sweep. | ||
""" | ||
return self._results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a DataFrame for consistency with ParameterSweepSpecification
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure on that - I don't see any reason why it could not be, but I am also not sure that it needs to be either. The reason the ParameterSweepSpecification
uses a DataFrame
is because Pysmo returns a numpy.array
(which I then turned into a DataFrame
so that the input names were explicitly associated with the data to avoid future issues).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to combine the samples and results into one dataframe, we can do:
samples.join(pd.DataFrame(runner.results).transpose())
My current approach to generating serialized data is something like:
reslist = samples.to_dict(orient="records")
for i, res in enumerate(reslist):
# Add important results to the dict containing the sampled inputs
# runner.results[i]["results"] is a named tuple I use to hold results
res["solved"] = runner.results[i]["solved"]
res["feasible"] = runner.results[i]["results"].feasible
res["objective"] = runner.results[i]["results"].objective
res["solve_time"] = runner.results[i]["results"].timer.timers["solve"].total_time
res["error"] = results[i]["error"]
with open(frame, "w") as f:
json.dump(reslist, f)
No objection to the current data structure for results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I think I want to leave things as they are. For one, I think it is best to keep the samples separate from the results (we've tried different things at different time and for now I prefer this).
idaes/core/util/parameter_sweep.py
Outdated
args = self.config.build_model_arguments | ||
if args is None: | ||
args = {} | ||
|
||
model = self.config.build_model(**args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be an option to send sampled parameter values to the build_model
function? I could imagine this being useful if we want to set parameter values before initializing the model. (Or maybe this should be done by implementing a custom run_model
function that calls model.initialize()
before solving?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was part of my intention for the run_model
method; build_model
would do everything up to setting values, and then run_model
would do anything that needed to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense
@Robbybp Would you have time to take a quick look at this again? I am not sure there is much more we can do until the WaterTAP parameter sweep tool is separated from WaterTAP, so I think it might be best ot get the general API we want in place and then merge this. |
@k1nshuk Would you have some time to take a look at this and see what you think. A lot of this will need to wait until the WaterTAP parameter sweep tool is in its own repo, but I would like to get the IDAES API ready so that we can start using it in our tests (with a basic sequential runner in the background). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
@andrewlee94 I will try to re-review by the end of the week. |
FYI, I have not forgotten about reviewing this. I am working through an application using this, and will add my review once I'm done. I don't anticipate major change requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main comment is that I don't think that build_outputs
should be required, then a few other small comments below.
idaes/core/util/parameter_sweep.py
Outdated
if self.config.build_outputs is None: | ||
raise ConfigurationError( | ||
"Please provide a method to collect results from sample run." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason we require a build_outputs
method? It seems like a reasonable default could be to just return run_stats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could just default to run_stats
- this is probably not what most users want, but they should be providing a build_outputs
method anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually thought the results object would be what the user typically wants. And with this default, if they want some other data structure, they can just return it from run_model. Now that I think about it, I'm not sure that build_outputs is necessary. Is there a reason we don't just expect users to process output in a custom run_model method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is partly compatibility with WaterTAP and partly to keep things separate.
""" | ||
Returns OrderedDict containing the results from the parameter sweep. | ||
""" | ||
return self._results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to combine the samples and results into one dataframe, we can do:
samples.join(pd.DataFrame(runner.results).transpose())
My current approach to generating serialized data is something like:
reslist = samples.to_dict(orient="records")
for i, res in enumerate(reslist):
# Add important results to the dict containing the sampled inputs
# runner.results[i]["results"] is a named tuple I use to hold results
res["solved"] = runner.results[i]["solved"]
res["feasible"] = runner.results[i]["results"].feasible
res["objective"] = runner.results[i]["results"].objective
res["solve_time"] = runner.results[i]["results"].timer.timers["solve"].total_time
res["error"] = results[i]["error"]
with open(frame, "w") as f:
json.dump(reslist, f)
No objection to the current data structure for results.
idaes/core/util/model_diagnostics.py
Outdated
) | ||
|
||
|
||
class ConvergenceAnalysis: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't used this functionality, and haven't seen an example of it, so I can't offer much of a review of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fairly specific use case for robustness checking. It is basically a sweep of predefined samples to collect stats from IPOPT (solver status, iterations, etc.) and comparing them to a baseline to ensure no drift in solver behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a port of the convergence tester that Carl wrote? Maybe the name should indicate that this functionality is Ipopt-specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is an update of Carl's convergence tester. I am not sure if the name needs to be changed; this is primarily an internal tool for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the class is not marked as private via _
, I'd prefer the name to indicate that the functionality is Ipopt-specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: MarcusHolly <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with this, although I'd prefer the name of ConvergenceAnalysis
to indicate that it is something Ipopt-specific. Your choice whether this is worth it.
idaes/core/util/model_diagnostics.py
Outdated
) | ||
|
||
|
||
class ConvergenceAnalysis: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the class is not marked as private via _
, I'd prefer the name to indicate that the functionality is Ipopt-specific.
@Robbybp Any suggestion for a good name? |
@andrewlee94 I would go with |
Part of diagnostics work
Summary/Motivation:
Diagnostic checks for numerical issues need to be run across a range of input values to ensure that the model is well posed across the full range of operation. The existing ConvergenceTester tool provides much of this functionality, however it it is set up as a stand-alone tool targeted at a very specific purpose.
This PR aims to generalize a lot of the core capabilities and to hopefully start aligning this with similar capabilities being developed in WaterTAP. The end goal for this is to define a standard API for setting up parameter sweep type runs in IDAES, whilst allowing users to implement workflow managers using a parallelization tool of their choice.
Changes proposed in this PR:
ParameterSweepBase
call with core functionality and API for running parameter sweepsSequentialSweepRunner
class which derived fromParameterSweepBase
to implement a simple sequential workflow manager.Legal Acknowledgement
By contributing to this software project, I agree to the following terms and conditions for my contribution: