-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix for quadratic blow-up of batch validation errors #871
Conversation
Signed-off-by: Martijn Govers <[email protected]>
Signed-off-by: Martijn Govers <[email protected]>
Signed-off-by: Martijn Govers <[email protected]>
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this change prevents from appending? The original code was:
batch_errors = input_errors + id_errors
if not id_errors:
batch_errors = input_errors
# ...
The new one is:
batch_errors = input_errors + id_errors
if not id_errors:
# ...
In Python, everything is a reference, except if it's not
that means that if you append to If you then create a reference when adding the current scenario errors to the full list, then in the former case, every scenario will point to |
OK, I got it now, this line duplicates the input error for all [batch] entry |
When you do complexity analysis, it's important not to confuse N. In this case, the complexity is |
Actually no because each scenario updates the same object it references, and also a reference to the original object is used. That means that at scenario k, you also still update the errors referenced for scenario 0, and they all grow identically the same (interestingly, as a side effect, there is no additional memory overhead (linear in the amount of scenarios), but each error is referenced N times, resulting in the quadratic overhead. Even if that were not the case, and each scenario would only contain a sum of the former, you would end up in triangle numbers, which is also quadratic: 1st scenatio references 1 error, 2nd scenario references 2 errors, ..., k references k errors. Total = |
Note that this also was verified experimentally by @BartSchuurmans |
When looking at such problems the total amount of (non duplicate) errors is dominant, you don't (need to) look at the batch size. I am not saying NlogN is any better than quadratic, on the contrary, it might even lead to close to cubic complexity. |
Like I said, anything between nlogn (lower case) and cubic is to be expected. |
How did you arrive at That said: in the issue here, the amount of errors per scenario only contributes to the prefactor, not to the scaling in terms of the amount of scenarios, which is the issue described here. |
That's the only assumption we could make. I see no need to further this discussion as it does not impact the fix you did here. |
batch validation errors were appended to the same list for every scenario
This meant that if there were 2 scenarios, then the same error would be reported 2x2=4 times instead of only 2 times (once for each scenario). For 4 scenarios, that would become 16 times, etc.
Note that this may result in high costs when trying to report an error.
This was introduced in #793 (release https://github.com/PowerGridModel/power-grid-model/releases/tag/v1.9.87) in commit 3b0c6b6
Credits to @BartSchuurmans for finding this