Add PR benchmarking #750
Please also consider documenting/explaining the results in the markdown table. Is it possible to mark the workflow check as failed if the performance diff exceeds a threshold, e.g. 5%?
Hi @ahrtr, sorry, I forgot to say that this is still in a PoC state; I was just trying to get the desired output. I'll clean this up once we decide which direction we want to follow.
I've cleaned up the code and squashed the commits. It's very similar to the original implementation from #691, but adds some configuration and a timeout so the job won't fail. However, what doesn't convince me is that, under the same conditions, the op/sec seems to be highly variable. For example, in the latest run (https://github.com/etcd-io/bbolt/actions/runs/9122610992), the difference is 4.51% without any changes to the code.
I could scan the results, check them against a threshold, and make the check fail. However, see my previous paragraph: I don't know if this benchmark (or at least op/sec) is accurate (or representative). I noticed locally that the more runs there are, the more accurate the results. However, Go benchmarks in this repository are slow. Running a count of 5 (what Benchstat suggests for 95% confidence) takes about 45 minutes per run, and each PR needs two runs (one for the base and one for the head). Would you say this is good enough for a first approach at benchmarking, @ahrtr? That is: document the markdown table, add a threshold, and get it going?
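To put a number on the run-to-run variability mentioned above, one option is to compute the coefficient of variation (sample standard deviation divided by the mean) of the op/sec samples collected from unchanged code, and compare it with the proposed threshold. The following is only a minimal sketch: the function names and the sample values are hypothetical and are not taken from this PR or from Benchstat.

```go
package main

import (
	"fmt"
	"math"
)

// Mean returns the arithmetic mean of xs; it returns 0 for an empty slice.
func Mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// SampleStdDev returns the sample standard deviation (n-1 denominator).
// It returns 0 when fewer than two samples are available.
func SampleStdDev(xs []float64) float64 {
	if len(xs) < 2 {
		return 0
	}
	m := Mean(xs)
	ss := 0.0
	for _, x := range xs {
		d := x - m
		ss += d * d
	}
	return math.Sqrt(ss / float64(len(xs)-1))
}

// CoefficientOfVariation returns stddev/mean as a percentage.
// It returns 0 when the mean is 0 to avoid dividing by zero.
func CoefficientOfVariation(xs []float64) float64 {
	m := Mean(xs)
	if m == 0 {
		return 0
	}
	return 100 * SampleStdDev(xs) / m
}

func main() {
	// Hypothetical op/sec samples from repeated runs of unchanged code.
	samples := []float64{10000, 10400, 9700, 10250, 9900}
	fmt.Printf("mean=%.1f op/s, cv=%.2f%%\n",
		Mean(samples), CoefficientOfVariation(samples))
}
```

If the coefficient of variation for unchanged code is already close to the threshold, a single base-versus-head comparison will probably fail spuriously from time to time, which is one argument for more iterations.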
Thanks for the work. Overall looks good to me. We can continue to enhance or revert the change if we see any big problem in future. |
@ahrtr, do you have any suggestion on where to document the table? I was about to add it as part of the output, but I'm not sure whether you think it would be better to keep it just in the code.
It works for me. |
@ahrtr, could you PTAL at this. I set it as ready for review. I worked on what we agreed, and the results for this PR are here: https://github.com/etcd-io/bbolt/actions/runs/9229526551?pr=750 |
that's neat, can we get it to post the resulting MD table to a comment as well? |
@tjungblu, I was researching a GitHub action that can comment (and update the original comment on new runs). I think it would be even better if we added a prow command (or a GitHub label) and ran these benchmarks only if the label exists... However, I want to get this PR merged first, then add the next round of improvements :)
Please hold reviewing before I address the build failure |
also quickly cobbled together the YCSB in pingcap/go-ycsb#300 - also really interesting to see the perf differences between boltdb and bbolt :) Might be another alternative to bench cmd, because the workloads are fairly well defined. |
I like that YCSB has a defined set of benchmarks to run. However, the reasons why I would prefer to use our
But if you guys feel that's a better direction, we can pursue that path :) |
I don't mind tbh, in the end we should choose something representative for etcd and k8s as a benchmark profile. |
I think you have also just exposed that we don't benchmark deletes 😅. |
I tend to use our own
I did a PoC with our Running 10 iterations, with no changes in the source code, the results are:
Running 10 iterations against the commit before merging #741
Looks good. Regarding the parameters, please refer to #739 (comment). For example,
@ahrtr, I updated the pull request with these changes and to use our |
This adds benchmarking using cmd/bbolt's bench, inspired by what is used in kube-state-metrics. Co-authored-by: Manuel Rüger <[email protected]> Signed-off-by: Ivan Valdes <[email protected]> wip Signed-off-by: Ivan Valdes <[email protected]>
/lgtm |
LGTM @ivanvc Thank you. It looks good to me.
The code comes from @ivanvc in etcd-io/bbolt#750
This PR introduces benchmarking in a PR, comparing the results vs. the base of that PR. It currently displays the results in the Job summary. If the performance difference is greater than 5%, it will make the job fail.
Supersedes #691.
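The 5% rule described in the summary can be illustrated with a small Go sketch. This is not the workflow's actual implementation: the function names are hypothetical, the input numbers are made up, and treating the difference as an absolute value (so that large swings in either direction count) is an assumption, since the summary does not say whether only regressions fail the job.

```go
package main

import (
	"fmt"
	"math"
)

// PercentDiff returns the relative change from base to head, in percent.
// A negative value means head achieved fewer op/sec than base.
func PercentDiff(base, head float64) (float64, error) {
	if base <= 0 {
		return 0, fmt.Errorf("base must be positive, got %v", base)
	}
	return (head - base) * 100 / base, nil
}

// ExceedsThreshold reports whether the absolute difference is strictly
// greater than the threshold (assumption: both directions count).
func ExceedsThreshold(diffPercent, thresholdPercent float64) bool {
	return math.Abs(diffPercent) > thresholdPercent
}

func main() {
	const threshold = 5.0 // percent, as stated in the PR summary

	// Hypothetical op/sec figures for the base and head of a PR.
	diff, err := PercentDiff(20000, 19098)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("diff=%.2f%% exceeds=%v\n", diff, ExceedsThreshold(diff, threshold))
}
```

In a CI job, a `true` result would translate into a non-zero exit code so that the check is marked as failed.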