Add PR benchmarking #750

ivanvc · 2024-05-07T13:42:39Z

This PR introduces benchmarking in a PR, comparing the results vs. the base of that PR. It currently displays the results in the Job summary. If the performance difference is greater than 5%, it will make the job fail.

Supersedes #691.

.github/workflows/compare-benchmarks.yaml

Makefile

ahrtr · 2024-05-08T16:18:34Z

Please also consider to document/explain the results in the markdown table.

Is it possible to mark the workflow check as failed if the performance diff > a threshold, i.e. 5%?

ivanvc · 2024-05-08T16:58:21Z

Hi @ahrtr, sorry I forgot to say that this was in a PoC state, I was just looking to get the desired output. I'll clean this up when we decide what direction we want to follow.

ivanvc · 2024-05-17T05:18:08Z

I've cleaned up the code and squashed commits. It's very similar to the original implementation from #691 but adds some configuration and timeout so the job won't fail.

However, what doesn't convince me is that for the same conditions, the op/sec seems to be highly variable, for example, the latest run (https://github.com/etcd-io/bbolt/actions/runs/9122610992). Without changes to the code, the difference is 4.51%.

Is it possible to mark the workflow check as failed if the performance diff > a threshold, i.e. 5%?

I could scan the results, check for a threshold, and make the check fail. However, see my previous paragraph. I don't know if this benchmark (or at least op/sec) is accurate (or representative).

I noticed locally that the more runs, the better and the more accurate the results. However, Go benchmarks in this repository are slow. Running a count of 5 (what Benchstat suggests for 95% confidence) takes about 45 minutes per run (meaning twice for checking the base and head for a PR).

Would you say this is good enough for a first approach at benchmarking, @ahrtr? Document the markdown table, add a threshold, and get it going?

ahrtr · 2024-05-17T06:20:02Z

Would you say this is good enough for a first approach at benchmarking, @ahrtr? Document the markdown table, add a threshold, and get it going?

Thanks for the work. Overall looks good to me.

We can continue to enhance or revert the change if we see any big problem in future.

ivanvc · 2024-05-23T17:57:10Z

@ahrtr, do you have any suggestion where to document the table? I was about to add it as part of the output, but I'm not sure if you think it will be better just in the code.

ahrtr · 2024-05-23T18:47:02Z

I was about to add it as part of the output

It works for me.

ivanvc · 2024-05-24T21:29:39Z

@ahrtr, could you PTAL at this. I set it as ready for review. I worked on what we agreed, and the results for this PR are here: https://github.com/etcd-io/bbolt/actions/runs/9229526551?pr=750

Makefile

scripts/compare_benchmarks.sh

.github/workflows/benchmark-pr.yaml

tjungblu · 2024-05-27T07:35:25Z

that's neat, can we get it to post the resulting MD table to a comment as well?

Makefile

scripts/compare_benchmarks.sh

ivanvc · 2024-05-28T23:44:31Z

that's neat, can we get it to post the resulting MD table to a comment as well?

@tjungblu, I was researching a GitHub action that can comment (and update the original comment for new runs). I think it would be even better if we add a prow command (or a GitHub label) and just run these benchmarks if the label exists... However, I want to get first this PR merged, then add the next round of improvements :)

ivanvc · 2024-06-01T15:52:20Z

Please hold reviewing before I address the build failure

tjungblu · 2024-06-06T14:39:18Z

also quickly cobbled together the YCSB in pingcap/go-ycsb#300 - also really interesting to see the perf differences between boltdb and bbolt :)

Might be another alternative to bench cmd, because the workloads are fairly well defined.
We would need to checkout their repo though, update the go.mod to point to a given commit hash/branch and rebuild + run from there. Comparison is more manual with it too...

ivanvc · 2024-06-06T22:22:56Z

I like that YCSB has a defined set of benchmarks to run. However, the reasons why I would prefer to use our bench command are:

With add gobench output option #765, it's trivial to add benchstat and benefit from using it to generate the statistical summary. As pointed out, with go-ycsb, we'll need to build it manually.
The setup to have go-ycsb working with a specific commit sounds more intricate.

But if you guys feel that's a better direction, we can pursue that path :)

tjungblu · 2024-06-07T08:13:46Z

I don't mind tbh, in the end we should choose something representative for etcd and k8s as a benchmark profile.
Just noticed that YCSB doesn't really have a delete component, that's important for us because of the events/leases and fragmentation that we should account for.

ivanvc · 2024-06-07T16:49:34Z

Just noticed that YCSB doesn't really have a delete component.

I think you have also just exposed that we don't benchmark deletes 😅.

ahrtr · 2024-06-08T12:17:42Z

I tend to use our own bench,

We have more flexibility to change & tune the test.
I had a quick review on add bbolt pingcap/go-ycsb#300, each bbolt TXN has only one write(put) operation. Our bench support multiple write operations in each transaction.

ivanvc · 2024-06-15T06:30:58Z

I did a PoC with our bench. Using the first benchmark identified by #720 (params: -profile-mode n -count 100000 -batch-size 25000). It looks more promising than using go test -bench.

Running 10 iterations, with no changes in the source code, the results are:

        │    BASE     │                HEAD                │
        │   sec/op    │   sec/op     vs base               │
Write     726.8n ± 3%   720.1n ± 4%       ~ (p=0.315 n=10)
Read      8.134n ± 3%   8.110n ± 0%       ~ (p=0.280 n=10)
geomean   76.89n        76.42n       -0.61%

Running 10 iterations against the commit before merging #741

        │    BASE     │                HEAD                 │
        │   sec/op    │   sec/op     vs base                │
Write     865.2n ± 5%   713.2n ± 3%  -17.56% (p=0.000 n=10)
Read      8.726n ± 5%   8.682n ± 4%        ~ (p=0.436 n=10)
geomean   86.89n        78.69n        -9.43%

ahrtr · 2024-06-24T13:49:56Z

Looks good. Regarding the parameters, please refer to #739 (comment). For example,

options.BatchSize: 10,000 (10K)
options.Iterations: 2,000,000 (2M)

ivanvc · 2024-06-26T23:45:49Z

Looks good. Regarding the parameters, please refer to #739 (comment). For example,
* options.BatchSize: 10,000 (10K)

* options.Iterations: 2,000,000 (2M)

@ahrtr, I updated the pull request with these changes and to use our cmd/bbolt bench. This is the summary for the job from this PR: https://github.com/etcd-io/bbolt/actions/runs/9687730093?pr=750

scripts/compare_benchmarks.sh

.github/workflows/benchmark-pr.yaml

scripts/compare_benchmarks.sh

This adds benchmarking using cmd/bbolt's bench, inspired on what it's used in kube-state-matrics. Co-authored-by: Manuel Rüger <[email protected]> Signed-off-by: Ivan Valdes <[email protected]> wip Signed-off-by: Ivan Valdes <[email protected]>

ahrtr

Great work!

Thank you! @ivanvc

cc @fuweid @tjungblu to take a look

tjungblu · 2024-06-28T07:18:29Z

/lgtm
as well, thanks @ivanvc - let's see what we can find with it in the other PRs :)

fuweid · 2024-07-02T09:03:39Z

LGTM

@ivanvc Thank you. It looks good to me.

        │    BASE     │                HEAD                │
        │   sec/op    │   sec/op     vs base               │
Write     1.133µ ± 3%   1.129µ ± 3%       ~ (p=0.540 n=10)
Read      16.68n ± 1%   16.73n ± 1%       ~ (p=0.402 n=10)
geomean   137.5n        137.4n       -0.01%

@ivanvc

The code comes from @ivanvc in etcd-io/bbolt#750

The code comes from ivanvc in etcd-io/bbolt#750

k8s-ci-robot added the do-not-merge/work-in-progress label May 7, 2024

ivanvc force-pushed the add-pr-benchmark branch 2 times, most recently from 3484164 to 815a462 Compare May 8, 2024 03:01

ahrtr reviewed May 8, 2024

View reviewed changes

.github/workflows/compare-benchmarks.yaml Outdated Show resolved Hide resolved

Makefile Outdated Show resolved Hide resolved

ahrtr mentioned this pull request May 8, 2024

Add benchmark tooling #691

Closed

ivanvc force-pushed the add-pr-benchmark branch 2 times, most recently from 7f35340 to ad4a6ae Compare May 17, 2024 03:41

ivanvc force-pushed the add-pr-benchmark branch from ad4a6ae to 759a097 Compare May 23, 2024 04:54

ivanvc force-pushed the add-pr-benchmark branch 3 times, most recently from fe29042 to 1250765 Compare May 24, 2024 20:23

ivanvc changed the title ~~[do not merge] Add pr benchmark~~ Add PR benchmarking May 24, 2024

ivanvc marked this pull request as ready for review May 24, 2024 21:28

k8s-ci-robot removed the do-not-merge/work-in-progress label May 24, 2024

ivanvc requested a review from ahrtr May 24, 2024 21:29

ahrtr reviewed May 27, 2024

View reviewed changes

Makefile Outdated Show resolved Hide resolved

scripts/compare_benchmarks.sh Outdated Show resolved Hide resolved

.github/workflows/benchmark-pr.yaml Outdated Show resolved Hide resolved

tjungblu reviewed May 27, 2024

View reviewed changes

Makefile Show resolved Hide resolved

tjungblu reviewed May 27, 2024

View reviewed changes

scripts/compare_benchmarks.sh Outdated Show resolved Hide resolved

ivanvc force-pushed the add-pr-benchmark branch from 1250765 to b53f11a Compare May 31, 2024 23:15

ivanvc force-pushed the add-pr-benchmark branch from b53f11a to b613bc6 Compare June 2, 2024 03:24

ivanvc mentioned this pull request Jun 11, 2024

cmd/bbolt: write bench results to stdout #767

Merged

ahrtr mentioned this pull request Jun 26, 2024

Introduce freelist interface #773

Closed

ivanvc force-pushed the add-pr-benchmark branch from 1250765 to 4081f93 Compare June 26, 2024 22:38

k8s-ci-robot added the size/L label Jun 26, 2024

ivanvc requested a review from ahrtr June 26, 2024 23:44

ahrtr reviewed Jun 27, 2024

View reviewed changes

scripts/compare_benchmarks.sh Show resolved Hide resolved

.github/workflows/benchmark-pr.yaml Outdated Show resolved Hide resolved

.github/workflows/benchmark-pr.yaml Outdated Show resolved Hide resolved

ahrtr reviewed Jun 27, 2024

View reviewed changes

scripts/compare_benchmarks.sh Outdated Show resolved Hide resolved

scripts/compare_benchmarks.sh Outdated Show resolved Hide resolved

Add benchmark tooling

ac4f755

This adds benchmarking using cmd/bbolt's bench, inspired on what it's used in kube-state-matrics. Co-authored-by: Manuel Rüger <[email protected]> Signed-off-by: Ivan Valdes <[email protected]> wip Signed-off-by: Ivan Valdes <[email protected]>

ivanvc requested review from ahrtr and tjungblu June 27, 2024 22:27

ivanvc force-pushed the add-pr-benchmark branch from 4081f93 to ac4f755 Compare June 27, 2024 22:29

ahrtr approved these changes Jun 28, 2024

View reviewed changes

ahrtr merged commit cc78a5a into etcd-io:main Jun 28, 2024
17 checks passed

ivanvc deleted the add-pr-benchmark branch June 28, 2024 15:51

ivanvc mentioned this pull request Jun 28, 2024

Introduce benchmarking for PRs #739

Open

mrueg added a commit to mrueg/kube-state-metrics that referenced this pull request Nov 6, 2024

chore: Integrate benchmark output in GHA

5d56208

The code comes from @ivanvc in etcd-io/bbolt#750

mrueg added a commit to mrueg/kube-state-metrics that referenced this pull request Nov 6, 2024

chore: Integrate benchmark output in GHA

b5cf183

The code comes from ivanvc in etcd-io/bbolt#750

mrueg added a commit to mrueg/kube-state-metrics that referenced this pull request Nov 6, 2024

chore: Integrate benchmark output in GHA

f69db17

The code comes from ivanvc in etcd-io/bbolt#750

mrueg added a commit to mrueg/kube-state-metrics that referenced this pull request Nov 6, 2024

chore: Integrate benchmark output in GHA

49ea969

The code comes from ivanvc in etcd-io/bbolt#750

mrueg added a commit to mrueg/kube-state-metrics that referenced this pull request Nov 6, 2024

chore: Integrate benchmark output in GHA

0303532

The code comes from ivanvc in etcd-io/bbolt#750

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add PR benchmarking #750

Add PR benchmarking #750

ivanvc commented May 7, 2024 •

edited

Loading

ahrtr commented May 8, 2024

ivanvc commented May 8, 2024

ivanvc commented May 17, 2024

ahrtr commented May 17, 2024

ivanvc commented May 23, 2024

ahrtr commented May 23, 2024

ivanvc commented May 24, 2024

tjungblu commented May 27, 2024

ivanvc commented May 28, 2024

ivanvc commented Jun 1, 2024

tjungblu commented Jun 6, 2024

ivanvc commented Jun 6, 2024

tjungblu commented Jun 7, 2024

ivanvc commented Jun 7, 2024

ahrtr commented Jun 8, 2024

ivanvc commented Jun 15, 2024

ahrtr commented Jun 24, 2024

ivanvc commented Jun 26, 2024

ahrtr left a comment

tjungblu commented Jun 28, 2024

fuweid commented Jul 2, 2024

Add PR benchmarking #750

Add PR benchmarking #750

Conversation

ivanvc commented May 7, 2024 • edited Loading

ahrtr commented May 8, 2024

ivanvc commented May 8, 2024

ivanvc commented May 17, 2024

ahrtr commented May 17, 2024

ivanvc commented May 23, 2024

ahrtr commented May 23, 2024

ivanvc commented May 24, 2024

tjungblu commented May 27, 2024

ivanvc commented May 28, 2024

ivanvc commented Jun 1, 2024

tjungblu commented Jun 6, 2024

ivanvc commented Jun 6, 2024

tjungblu commented Jun 7, 2024

ivanvc commented Jun 7, 2024

ahrtr commented Jun 8, 2024

ivanvc commented Jun 15, 2024

ahrtr commented Jun 24, 2024

ivanvc commented Jun 26, 2024

ahrtr left a comment

Choose a reason for hiding this comment

tjungblu commented Jun 28, 2024

fuweid commented Jul 2, 2024

ivanvc commented May 7, 2024 •

edited

Loading