The easysync tests contain some randomness. This introduces non-determinism, so total code coverage can differ between runs.
Should I compile a set of inputs and hard code them so that every time the suite runs the exact same input is used?
Another improvement: should I try to find a way to get performance metrics out of our workflows? I already described the approach in #4341 (comment). (I have the feeling we discussed this in another issue already, but I didn't find it.)

#5233 could have some performance impact; on browsers it probably doesn't matter that much. My plan is to generate some millions of inputs from the existing test suite, maybe with slightly adjusted boundaries:
- run the easysync suite as a backend test (whole suite, not individual tests)
- record metrics for each individual test
- export the results to a GitHub page

To get started, we can run the suite 1000 times.

- configure a cronjob to repeat this, maybe every 4 hours for 1 or 2 days

Now we have a baseline for every test.

- add a workflow that checks out the baseline commit, runs the easysync suite maybe 10 times, and calculates the overall (relative) deviation (i.e. the sum of all test durations compared to the baseline sum)
- check out the commit that should be tested, run the suite multiple times, calculate metrics "normalized" with the deviation from above, and upload the result (see the sketch after this list)

I wouldn't do any fail/success decisions until we know this works reliably; until then, consider all of it informative.
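To make the normalization step above concrete, here is a minimal sketch of how per-test durations could be compared against a baseline. The data shapes and function names are made up for illustration and are not part of any existing workflow:

```js
// Rough sketch of the normalization idea (all names here are hypothetical):
// `baseline` holds per-test durations recorded from the baseline commit,
// `current` holds durations measured for the commit under test.
const sum = (durations) => Object.values(durations).reduce((a, b) => a + b, 0);

// Relative deviation of this runner compared to the recorded baseline,
// measured by re-running the suite on the baseline commit (e.g. 10 times).
const runnerDeviation = (baselineRerun, baseline) => sum(baselineRerun) / sum(baseline);

// Normalize the candidate commit's durations so that results from different
// runners / machine loads become comparable before uploading them.
const normalize = (current, deviation) =>
    Object.fromEntries(
        Object.entries(current).map(([test, ms]) => [test, ms / deviation]));

// Example usage with fake numbers:
const baseline = {'applyToText': 120, 'compose': 80};
const baselineRerun = {'applyToText': 132, 'compose': 88}; // this runner is ~10% slower
const current = {'applyToText': 150, 'compose': 90};
console.log(normalize(current, runnerDeviation(baselineRerun, baseline)));
```

Applying one deviation factor to every test keeps the comparison simple, at the cost of ignoring per-test variance.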
> Should I compile a set of inputs and hard code them so that every time the suite runs the exact same input is used?
Yes. At the very least we should use an RNG with a fixed seed so that the results are reproducible.
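For illustration, a minimal sketch of such a seeded generator, assuming a small mulberry32-style PRNG; the `randomTestPadText` helper is hypothetical and not part of the existing easysync tests:

```js
// Minimal sketch: a tiny deterministic PRNG (mulberry32) with a fixed seed,
// so every run of the suite sees the exact same "random" inputs.
const mulberry32 = (seed) => () => {
  seed = (seed + 0x6D2B79F5) | 0;
  let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
  t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
  return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
};

const rand = mulberry32(42); // fixed seed -> reproducible sequence

// Hypothetical helper that generates a random pad text for a test case.
const randomTestPadText = (maxLen) => {
  const len = Math.floor(rand() * maxLen);
  let text = '';
  for (let i = 0; i < len; i++) {
    // printable ASCII range 32..126
    text += String.fromCharCode(32 + Math.floor(rand() * 95));
  }
  return text;
};
```

Hard-coding a handful of generated inputs on top of this would make runs byte-for-byte identical, but the fixed seed alone already makes coverage deterministic.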
> Another improvement: Should I try to find a way to get performance metrics out of our workflows? I already described the approach in #4341 (comment)
Automated performance regression testing is really difficult to do properly, and requires lots of maintenance. Until we are bitten by performance regressions, I think our time is best spent elsewhere.
> (I have the feeling we discussed this in another issue already, but I didn't find it)
>
> #5233 could have some performance impact; on browsers it probably doesn't matter that much. My plan is to generate some millions of inputs from the existing test suite, maybe with slightly adjusted boundaries:
>
> - run the easysync suite as a backend test (whole suite, not individual tests)
> - record metrics for each individual test
> - export the results to a GitHub page
>
> To get started, we can run the suite 1000 times.
>
> - configure a cronjob to repeat this, maybe every 4 hours for 1 or 2 days
>
> Now we have a baseline for every test.
>
> - add a workflow that checks out the baseline commit, runs the easysync suite maybe 10 times, and calculates the overall (relative) deviation (i.e. the sum of all test durations compared to the baseline sum)
> - check out the commit that should be tested, run the suite multiple times, calculate metrics "normalized" with the deviation from above, and upload the result
>
> I wouldn't do any fail/success decisions until we know this works reliably; until then, consider all of it informative.
I would love to see that done, but it is quite a bit of work. My plan was to just release the changes and see if anyone complains about a drop in performance. 🙂