Add config object to keep config in sync at all times #704
base: main
Conversation
@brynpickering this sounds really nice. I'll give a thorough look at it. One thing, based on your description of the additions (and before I check the code): perhaps we should alter things so that the mode extends the configuration, rather than hiding it? This would avoid extra work on our end down the line, and move the code of these modes towards a 'plug-in' approach. I'll share more thoughts or suggestions in a while.
This is a comment, for now, since the code is not 100% ready.
Generally, I like this proposal.
Some of the positives I see:
- The capacity to inherit validation models could do wonders to streamline our code. In particular, it could enable us to make non-standard modes 'plug-ins' that people can choose to use (see the sketch at the end of this comment).
- With a bit of context, `pydantic` makes the configuration quite easy to follow. The intellisense suggestions it gives are very nice too.
Some concerns, though:
- I see this approach as a duplicate of yaml schemas. We should only use one. Keeping both only makes the code harder to understand, imo.
- I do not think this really solves #626 (Invalid configuration showing for un-activated modes due to schema defaults). The configuration is still 'tangled'. But it does provide a way to solve it.
- More of an open question than a concern: would `pydantic` help in our efforts to make parameters and dimensions part of the input yaml files?
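To make the 'plug-in' idea from the list above concrete, here is a minimal sketch of mode-specific options extending a shared base model via pydantic inheritance; `BuildBase`, `OperateBuild` and their fields are purely illustrative, not Calliope's actual config models.

```python
from pydantic import BaseModel


class BuildBase(BaseModel):
    """Options shared by every build mode."""

    model_config = {"extra": "forbid"}
    backend: str = "pyomo"


class OperateBuild(BuildBase):
    """Extra options that only exist (and are only validated) when operate mode is active."""

    use_cap_results: bool = False
    window: str = "24h"


# A mode 'plug-in' could register its own subclass like this; unknown options are
# rejected, and editors get completion for the mode-specific fields.
config = OperateBuild(backend="gurobi", window="48h")
print(config.model_dump())
```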
src/calliope/config.py
Outdated
@model_validator(mode="before")
@classmethod
def update_solve_mode(cls, data):
    """Solve mode should match build mode."""
    data["solve"]["mode"] = data["build"]["mode"]
    return data
This seems to do nothing?
If you do `model.config.update({"build": {"mode": "operate"}})`, it will update the solve "mode" too. The solve "mode" is really just a tracker of the build mode, and this ensures they are always in sync.
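For readers less familiar with pydantic, here is a small standalone sketch of the mechanism being described: a `mode="before"` model validator that mirrors one field into another on every (re)validation. The class and field names are illustrative, not Calliope's actual config classes.

```python
from pydantic import BaseModel, model_validator


class Config(BaseModel):
    build_mode: str = "plan"
    solve_mode: str = "plan"

    @model_validator(mode="before")
    @classmethod
    def sync_modes(cls, data: dict) -> dict:
        # Runs on the raw input before field validation, so solve_mode always mirrors build_mode.
        if isinstance(data, dict) and "build_mode" in data:
            data["solve_mode"] = data["build_mode"]
        return data


print(Config(build_mode="operate").solve_mode)  # -> operate
```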
I meant it as 'this is not used anywhere in our code'. But it seems like it's an update to pydantic behaviour (fancy!).
Having `mode` in two different places is an issue, though. Even if pydantic ensures they are equal, it introduces unnecessary ambiguity. `Model.solve()` seems to update `solve.mode`, and then immediately use it, and then it's used nowhere else after:
kwargs["mode"] = self.config.build.applied_keyword_overrides.get(
"mode", self.config.build.mode
)
this_solve_config = self.config.update({"solve": kwargs}).solve
self._model_data.attrs["timestamp_solve_start"] = log_time(
LOGGER,
self._timings,
"solve_start",
comment=f"Optimisation model | starting model in {this_solve_config.mode} mode.",
)
shadow_prices = this_solve_config.shadow_prices
self.backend.shadow_prices.track_constraints(shadow_prices)
if this_solve_config.mode == "operate":
results = self._solve_operate(**this_solve_config.model_dump())
else:
results = self.backend._solve(
warmstart=warmstart, **this_solve_config.model_dump()
)
If we just used the value of `build.mode` within `Model.solve()`, all this code is unnecessary.
Perhaps ensuring that `Model._is_built` is automatically set to `False` if the mode is ever updated is a better approach?
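One way that suggested invalidation could look, sketched with a toy class rather than Calliope's actual `Model` (method and attribute names other than `_is_built` are hypothetical):

```python
class ToyModel:
    """Toy stand-in for a model object that must be rebuilt after a mode change."""

    def __init__(self) -> None:
        self._is_built = False
        self._build_mode = "plan"

    def set_build_mode(self, mode: str) -> None:
        if mode != self._build_mode:
            self._build_mode = mode
            self._is_built = False  # any mode change forces a re-build before solving

    def solve(self) -> None:
        if not self._is_built:
            raise RuntimeError("Model must be (re)built before solving.")


m = ToyModel()
m.set_build_mode("operate")
# m.solve() would now raise until the model is rebuilt in the new mode.
```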
- Removed mode redundancy to simplify the configuration
- Simplify config schema extraction
I think moving to pydantic is perfectly fine if there are clear benefits - it seems like two of these are better maintainability and easier handling of optional additional configuration. Can we limit additional changes to the schema (e.g. nesting) though?
@brynpickering @sjpfenninger @FLomb Before continuing this PR, we need to decide how to handle updates to the model configuration via `kwargs` passed to `build` and `solve`. Our current approach is to handle them as "ephemeral", meaning that we essentially forget them afterwards. This introduces several issues:
More arguments can be found here. My proposal is to always treat these updates as permanent changes to the configuration.
That could be difficult to achieve without lots of extra work, but sure.
I disagree with this description of the current method. It is containerising the config for a specific build, so kwargs applied on build/solve are stored separately and then wiped on re-build/re-solve.
agree
This is easy enough to solve with methods that force application of the containerised config in certain circumstances.
Less so than an out-of-sync config with no way of even seeing what existed in the config before calling "reset".
Disagree. Different libraries take different approaches to this, but many pass back a copy of the input with the changes, rather than making changes in-place on update. The current approach returns a copy of the config with the changes applied, this is the containerised, temporary config that is used. The only addition is that we store these changes so that a user can easily see the diff between the original and the containerised config. The "reset" method is fine as an alternative except that I would not just say "and now we save with the wrong config but the user will understand that because they'll get a warning that it's out-of-sync". Instead, even if the user has reset the config, if they save the model with the current set of results then the config associated with those results is the one that is saved. Being able to access results with an out-of-sync config just doesn't work in practice.
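As a rough illustration of the 'return a copy with the changes applied' pattern described above, using a generic pydantic model rather than the actual Calliope config class:

```python
from pydantic import BaseModel


class SolveConfig(BaseModel):
    model_config = {"frozen": True}
    solver: str = "cbc"
    save_logs: bool = False


base = SolveConfig()
# The base object is left untouched; the caller works with a temporary, updated copy.
temporary = base.model_copy(update={"solver": "gurobi"})
print(base.solver, temporary.solver)  # -> cbc gurobi
```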
Thanks for the feedback @brynpickering !
I think we are saying the same thing here, kind of... The problem is that 'in a certain circumstance' implies that the programmer must know which of the two (or more) methods should be called in a given case. This ambiguity is what I am calling bug-prone, and does not exist if you have everything in one place with just one access method.
I was not referring to how the new pydantic schema works, but to how calls like `model.build(backend='pyomo')` behave. They do not follow the behaviour you describe (they return `None`). If someone unfamiliar with
I agree on this. We should disallow results in saved files if this mismatch exists, then?
One last clarification, @brynpickering.
I definitely think this is the way to go when it comes to updates to the config. All the above is more about how many config-relevant objects we should have, and where. Not necessarily about how said object is updated (we agree there), or if we should have a reset method (we could do away with it entirely).
I'm asking this because I think it's not ideal if implementation details keep affecting how the config looks. Can we look at all the changes that would be needed ex-ante and decide then, considering the extra work as a factor? And obviously settle on something that will stay stable for the foreseeable future.
I'm not sure that's the case. I would actually expect a kwarg passed to a function like this to NOT modify an underlying configuration. If the function signature made the configuration override an explicit argument, the intent would be clearer (see the hypothetical sketch below). So is maybe the implicit use of kwargs in a possibly confusing way one of the issues here? Though, is this "containerised/ephemeral" config updating important enough to warrant the complexity of dealing with it? Would it be better (less code to maintain) to only ever have a single configuration and document (or even warn) that kwargs update that configuration?
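A hypothetical pair of signatures to illustrate the explicit-versus-implicit distinction; neither is Calliope's actual API:

```python
from typing import Any


# Implicit: arbitrary kwargs are silently folded into the stored configuration.
def build_implicit(force: bool = False, **kwargs: Any) -> None:
    ...


# Explicit: the caller can see that a configuration update is (or is not) being requested.
def build_explicit(force: bool = False, config_updates: dict[str, Any] | None = None) -> None:
    ...
```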
I will attempt to create a summarized proposal of how I see the alternatives.
Here are diagrams for all proposed alternatives as I see them. I tried to be fair in the pros/cons I see. Overall:
All three cases are fine, IMHO. However, my only request is that if the containerised case is chosen, no other 'extra' functionality should be in the containerised config.

containerised/ephemeral

This is the closest to what @brynpickering initially proposed.
As long as we always call `reset`.

reset

This is the closest to what I initially proposed.
This one is only fine if we are willing to pass the responsibility of resetting to the users.

simple

This is the closest to what @sjpfenninger suggested above.
This puts the most trust on the user and slims our code. The most KISS / YAGNI of all cases.
@irm-codebase My perspective on it (some overlap, some places where I disagree with your interpretation). In each, I give two examples of experiments, demonstrating the approach in action:

1. pyomo backend with gurobi solver (default) -> gurobi backend -> pyomo backend with CBC solver.
2. add constraint to math -> update constraint -> no constraint.

containerised
Example 1
m = calliope.Model(...)
m.build()
m.solve()
m.to_netcdf(...)
m.build(backend="gurobi", force=True)
m.solve()
m.to_netcdf(...)
m.build(force=True) # backend=pyomo is the default that it has reset to
m.solve(solver="cbc")
m.to_netcdf(...)

Example 2
m = calliope.Model(...)
m.build(add_math={"constraints": {"foo": {<whole-constraint>}}})
m.solve()
m.to_netcdf(...)
m.build(add_math={"constraints": {"foo": {<whole-constraint-with-update>}}}, force=True)
m.solve()
m.to_netcdf(...)
m.build(force=True) # math is automatically reset to having no added constraint
m.solve()
m.to_netcdf(...)

reset
Example 1
No actual need for `reset` here.

m = calliope.Model(...)
m.build()
m.solve()
m.to_netcdf(...)
m.build(backend="gurobi", force=True)
m.solve()
m.to_netcdf(...)
m.build(backend="pyomo", force=True)
m.solve(solver="cbc")
m.to_netcdf(...)

Example 2
m = calliope.Model(...)
m.build(add_math={"constraints": {"foo": {<whole-constraint>}}})
m.solve()
m.to_netcdf(...)
m.build(add_math={"constraints": {"foo": {<partial-constraint-with-updates>}}}, force=True)
m.solve()
m.to_netcdf(...)
m.reset()
m.build()
m.solve()
m.to_netcdf(...)

override
Example 1
Can override without needing to re-init.

m = calliope.Model(...)
m.build()
m.solve()
m.to_netcdf(...)
m.build(backend="gurobi", force=True)
m.solve()
m.to_netcdf(...)
m.build(backend="pyomo", force=True)
m.solve(solver="cbc")
m.to_netcdf(...)

Example 2
m = calliope.Model(...)
m.build(add_math={"constraints": {"foo": {<whole-constraint>}})
m.solve()
m.to_netcdf(...)
m.build(add_math={"constraints": {"foo": {<partial-constraint-with-updates>}}}, force=True)
m.solve()
m.to_netcdf(...)
# Can't undo the addition of a constraint to the math, so we have to re-init.
m = calliope.Model(...)
m.build()
m.solve()
m.to_netcdf(...)
Using "simplicity in implementation" and "avoiding hidden magic" as the main two criteria, and looking at Bryn's detailed examples, option 3 seems preferable - unless we can think of a strong reason why initial configuration needs to be recoverable or we want to preserve the current behavior? What we could add is to raise a warning when kwargs are passed to |
I agree with @sjpfenninger in that the simplest approach might be preferable here. The one case where this might cause pain is not being able to remove temporary math. In #726 I made a suggestion to 'mix and match' math files. Hopefully it lessens this issue. @brynpickering which would be your preference? Is the pure override case ok for you?
@brynpickering while debugging failing test cases I've discovered a problem in the 'containerised' case. It happens in the following case:

model = build_model({}, "simple_supply,operate,var_costs,investment_costs")  # build.mode is 'operate'
model.build(mode="plan")
model.solve() # Fail!
model.build(
force=True,
mode="operate",
operate={"use_cap_results": True, "window":request.param[0], "horizon":request.param[1]}
The main issue is the following:

# within Model.solve
solve_config = self.config.update({"solve": kwargs}).solve  # previous build kwargs are lost!
# this will only see {"solve": kwargs}, mode='plan' is lost!
mode = self.config.update(self.config.applied_keyword_overrides).build.mode
# mode is 'operate' :(

This has important implications:
The alternative is making the applied kwargs persistent. So, both the 'reset' and the 'containerised' case are actually quite similar, in the end. It's just a matter of where / how we save the applied kwargs.
- Model 'build' and 'solve' no longer use a 'temporary' configuration
- Simplified the pydantic configuration schema, and made it fully 'frozen' and non-extensible
- Moved data table loading into the Data generation portion
- Fixed most tests
@brynpickering, @sjpfenninger Here is a summary of the changes:
Did not implement:
Future improvements:
table_name (str): name of the data table.
data_table (DataTableDict): Data table definition dictionary.
data_table_dfs (dict[str, pd.DataFrame] | None, optional): | ||
If given, a dictionary mapping table names in `data_table` to in-memory pandas DataFrames. | ||
Defaults to None. | ||
model_definition_path (Path | None, optional): | ||
model_definition_path (Path, optional): |
Should this be `str | Path | None, optional`?
I remember removing some of these due to mypy complaining about incorrect typings, but I am unsure about this one. If running `mypy check` does not result in complaints, either is fine.
@irm-codebase: one thing that has now happened is that it doesn't fix #626 (all mode configs are shown in the config repr)! What is the plan to mitigate this, if not in this PR?
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #704      +/-   ##
==========================================
+ Coverage   96.09%   96.22%   +0.13%
==========================================
  Files          29       30       +1
  Lines        4067     4181     +114
  Branches      584      591       +7
==========================================
+ Hits         3908     4023     +115
  Misses         68       68
+ Partials       91       90       -1
Not in this PR for sure... We'll have to keep that one there for now. As long as these modes are 'within' the config schema.
class Solve(ConfigBaseModel):
    """Base configuration options used when solving a Calliope optimisation problem (`calliope.Model.solve`)."""

    model_config = {"title": "Model Solve Configuration"}
@brynpickering one of the reasons I removed these `model_config=...` was that they override the settings given in `ConfigBaseModel`. If we do want these titles, we have to copy-paste the following in all cases. The titles did not seem worth the extra lines, and that's why I took them out.

model_config = {
    "extra": "forbid",
    "frozen": True,
    "revalidate_instances": "always",
    "use_attribute_docstrings": True,
}
Oddly enough, they don't override the `ConfigBaseModel` config. You can check it out for yourself (or maybe we should add a test to ensure that remains the case)!
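For reference, a small self-contained check of that behaviour, assuming pydantic v2 (illustrative class names): a subclass's `model_config` is merged with the parent's rather than replacing it.

```python
from pydantic import BaseModel


class ConfigBase(BaseModel):
    model_config = {"extra": "forbid", "frozen": True}


class Solve(ConfigBase):
    model_config = {"title": "Model Solve Configuration"}


# The subclass keeps the parent's settings and adds its own title.
assert Solve.model_config["frozen"] is True
assert Solve.model_config["extra"] == "forbid"
assert Solve.model_config["title"] == "Model Solve Configuration"
```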
Huh. Good to know!
I looked into the older version of the code, and it seems like the cause was that one of the subclasses omitted "frozen". I assumed that meant you had to specify it every time.
The basic setup looks sound to me. I think it's good that the docs explicitly explain that overrides are executed after
@brynpickering I've added a few tests to satisfy the codecov check. Mostly testing new error messages (and some old ones that had no checks). Should be enough to satisfy everything needed for this PR.
Fixes #626
Partially fixes #619
Thought I'd upload this already so you can contribute to the attempt, @irm-codebase. Tests haven't been cleaned up so I expect many will fail.
I'm quite liking pydantic @sjpfenninger. I know we questioned using it some time ago, but now that I've spent more time with it I do wonder whether it might make other parts of the code and input validation cleaner...
Summary of changes in this pull request
- Only non-default config options are dumped (`model.config.model_dump(exclude_defaults=True)`)
- `build` and `solve` steps have isolated configs that account for ad-hoc `kwargs`, which are stored in the config class as `applied_keyword_overrides` (might want something snappier)
- `operate_[...]` and `spores_[...]` config options have returned to being sub-dicts (`build.operate.[...]` and `build.spores.[...]`) so the options can be easily isolated and passed around as necessary
- Config changes go through the `update` method, which returns an updated config object, but keeps the base config object unchanged except for content in `applied_keyword_overrides`. So you can't change config options accidentally (e.g. `model.config.init.name = "new_name"` won't work).
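To illustrate the 'can't change config options accidentally' point, a minimal sketch with a generic frozen pydantic model (not the actual Calliope config classes):

```python
from pydantic import BaseModel, ValidationError


class InitConfig(BaseModel):
    model_config = {"frozen": True}
    name: str = "my model"


cfg = InitConfig()
try:
    cfg.name = "new_name"  # direct assignment is rejected on a frozen model
except ValidationError:
    print("config unchanged:", cfg.name)  # -> config unchanged: my model
```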