Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding basic parameter sweep tool #1284

Merged
merged 33 commits into from
Feb 22, 2024
Merged

Conversation

andrewlee94
Copy link
Member

Part of diagnostics work

Summary/Motivation:

Diagnostic checks for numerical issues need to be run across a range of input values to ensure that the model is well posed across the full range of operation. The existing ConvergenceTester tool provides much of this functionality, however it it is set up as a stand-alone tool targeted at a very specific purpose.

This PR aims to generalize a lot of the core capabilities and to hopefully start aligning this with similar capabilities being developed in WaterTAP. The end goal for this is to define a standard API for setting up parameter sweep type runs in IDAES, whilst allowing users to implement workflow managers using a parallelization tool of their choice.

Changes proposed in this PR:

  • Move core functionality of ConvergenceTester to a new parameter sweep utility. The old ConvergenceTester will be retained for now for backward compatibility.
  • Update input specification class to leverage Pysmo's sampling tools
  • Create a ParameterSweepBase call with core functionality and API for running parameter sweeps
  • Create a derived SequentialSweepRunner class which derived from ParameterSweepBase to implement a simple sequential workflow manager.

Legal Acknowledgement

By contributing to this software project, I agree to the following terms and conditions for my contribution:

  1. I agree my contributions are submitted under the license terms described in the LICENSE.txt file at the top level of this directory.
  2. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

@andrewlee94 andrewlee94 self-assigned this Nov 7, 2023
@andrewlee94 andrewlee94 changed the title Adding basic parameter sweep tool [WIP] Adding basic parameter sweep tool Nov 7, 2023
@andrewlee94 andrewlee94 added enhancement New feature or request help wanted Extra attention is needed Priority:Normal Normal Priority Issue or PR core Issues dealing with core modeling components testing Issues dealing with testing of code WaterTAP WIP PySMO diagnostics labels Nov 7, 2023
Copy link

codecov bot commented Nov 7, 2023

Codecov Report

Attention: 40 lines in your changes are missing coverage. Please review.

Comparison is base (94b98f7) 77.51% compared to head (f697d85) 77.58%.

Files Patch % Lines
idaes/core/util/model_diagnostics.py 84.17% 18 Missing and 4 partials ⚠️
idaes/core/util/parameter_sweep.py 92.99% 12 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1284      +/-   ##
==========================================
+ Coverage   77.51%   77.58%   +0.07%     
==========================================
  Files         390      391       +1     
  Lines       63884    64288     +404     
  Branches    11756    11815      +59     
==========================================
+ Hits        49517    49878     +361     
- Misses      11795    11829      +34     
- Partials     2572     2581       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

"""
Returns OrderedDict containing the results from the parameter sweep.
"""
return self._results
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a DataFrame for consistency with ParameterSweepSpecification?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure on that - I don't see any reason why it could not be, but I am also not sure that it needs to be either. The reason the ParameterSweepSpecification uses a DataFrame is because Pysmo returns a numpy.array (which I then turned into a DataFrame so that the input names were explicitly associated with the data to avoid future issues).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to combine the samples and results into one dataframe, we can do:

samples.join(pd.DataFrame(runner.results).transpose())

My current approach to generating serialized data is something like:

reslist = samples.to_dict(orient="records")
for i, res in enumerate(reslist):
    # Add important results to the dict containing the sampled inputs
    # runner.results[i]["results"] is a named tuple I use to hold results
    res["solved"] = runner.results[i]["solved"]                                                          
    res["feasible"] = runner.results[i]["results"].feasible                                              
    res["objective"] = runner.results[i]["results"].objective                                            
    res["solve_time"] = runner.results[i]["results"].timer.timers["solve"].total_time
    res["error"] = results[i]["error"]

with open(frame, "w") as f:
    json.dump(reslist, f)

No objection to the current data structure for results.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I think I want to leave things as they are. For one, I think it is best to keep the samples separate from the results (we've tried different things at different time and for now I prefer this).

idaes/core/util/parameter_sweep.py Outdated Show resolved Hide resolved
Comment on lines 467 to 471
args = self.config.build_model_arguments
if args is None:
args = {}

model = self.config.build_model(**args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an option to send sampled parameter values to the build_model function? I could imagine this being useful if we want to set parameter values before initializing the model. (Or maybe this should be done by implementing a custom run_model function that calls model.initialize() before solving?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was part of my intention for the run_model method; build_model would do everything up to setting values, and then run_model would do anything that needed to follow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense

@andrewlee94 andrewlee94 mentioned this pull request Nov 16, 2023
31 tasks
@andrewlee94 andrewlee94 marked this pull request as ready for review February 6, 2024 16:02
@andrewlee94 andrewlee94 requested a review from Robbybp February 6, 2024 16:03
@andrewlee94
Copy link
Member Author

@Robbybp Would you have time to take a quick look at this again? I am not sure there is much more we can do until the WaterTAP parameter sweep tool is separated from WaterTAP, so I think it might be best ot get the general API we want in place and then merge this.

@andrewlee94
Copy link
Member Author

@k1nshuk Would you have some time to take a look at this and see what you think. A lot of this will need to wait until the WaterTAP parameter sweep tool is in its own repo, but I would like to get the IDAES API ready so that we can start using it in our tests (with a basic sequential runner in the background).

Copy link

@k1nshuk k1nshuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@Robbybp
Copy link
Member

Robbybp commented Feb 7, 2024

@andrewlee94 I will try to re-review by the end of the week.

@andrewlee94 andrewlee94 changed the title [WIP] Adding basic parameter sweep tool Adding basic parameter sweep tool Feb 8, 2024
@Robbybp
Copy link
Member

Robbybp commented Feb 15, 2024

FYI, I have not forgotten about reviewing this. I am working through an application using this, and will add my review once I'm done. I don't anticipate major change requests.

Copy link
Member

@Robbybp Robbybp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main comment is that I don't think that build_outputs should be required, then a few other small comments below.

Comment on lines 672 to 675
if self.config.build_outputs is None:
raise ConfigurationError(
"Please provide a method to collect results from sample run."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we require a build_outputs method? It seems like a reasonable default could be to just return run_stats

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we could just default to run_stats - this is probably not what most users want, but they should be providing a build_outputs method anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually thought the results object would be what the user typically wants. And with this default, if they want some other data structure, they can just return it from run_model. Now that I think about it, I'm not sure that build_outputs is necessary. Is there a reason we don't just expect users to process output in a custom run_model method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is partly compatibility with WaterTAP and partly to keep things separate.

idaes/core/util/parameter_sweep.py Outdated Show resolved Hide resolved
idaes/core/util/parameter_sweep.py Outdated Show resolved Hide resolved
idaes/core/util/parameter_sweep.py Outdated Show resolved Hide resolved
"""
Returns OrderedDict containing the results from the parameter sweep.
"""
return self._results
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to combine the samples and results into one dataframe, we can do:

samples.join(pd.DataFrame(runner.results).transpose())

My current approach to generating serialized data is something like:

reslist = samples.to_dict(orient="records")
for i, res in enumerate(reslist):
    # Add important results to the dict containing the sampled inputs
    # runner.results[i]["results"] is a named tuple I use to hold results
    res["solved"] = runner.results[i]["solved"]                                                          
    res["feasible"] = runner.results[i]["results"].feasible                                              
    res["objective"] = runner.results[i]["results"].objective                                            
    res["solve_time"] = runner.results[i]["results"].timer.timers["solve"].total_time
    res["error"] = results[i]["error"]

with open(frame, "w") as f:
    json.dump(reslist, f)

No objection to the current data structure for results.

)


class ConvergenceAnalysis:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't used this functionality, and haven't seen an example of it, so I can't offer much of a review of it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fairly specific use case for robustness checking. It is basically a sweep of predefined samples to collect stats from IPOPT (solver status, iterations, etc.) and comparing them to a baseline to ensure no drift in solver behaviour.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a port of the convergence tester that Carl wrote? Maybe the name should indicate that this functionality is Ipopt-specific?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is an update of Carl's convergence tester. I am not sure if the name needs to be changed; this is primarily an internal tool for testing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the class is not marked as private via _, I'd prefer the name to indicate that the functionality is Ipopt-specific.

Copy link
Contributor

@MarcusHolly MarcusHolly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@Robbybp Robbybp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with this, although I'd prefer the name of ConvergenceAnalysis to indicate that it is something Ipopt-specific. Your choice whether this is worth it.

)


class ConvergenceAnalysis:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the class is not marked as private via _, I'd prefer the name to indicate that the functionality is Ipopt-specific.

@andrewlee94
Copy link
Member Author

@Robbybp Any suggestion for a good name? IPOPTConvergenceAnalysis seems cumbersome to me.

@Robbybp
Copy link
Member

Robbybp commented Feb 21, 2024

@andrewlee94 I would go with IpoptConvergenceAnalyzer, IpoptSolveAnalyzer, or just IpoptAnalyzer

@andrewlee94 andrewlee94 enabled auto-merge (squash) February 22, 2024 14:57
@andrewlee94 andrewlee94 merged commit 777b813 into IDAES:main Feb 22, 2024
39 checks passed
@andrewlee94 andrewlee94 deleted the parameter_sweep branch February 22, 2024 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Issues dealing with core modeling components diagnostics enhancement New feature or request help wanted Extra attention is needed Priority:Normal Normal Priority Issue or PR PySMO testing Issues dealing with testing of code WaterTAP WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants