Merkle performance improvements #233

Closed
wants to merge 27 commits into from
Conversation

@eljobe (Member) commented Apr 29, 2024

This change does two things:

  1. It introduces a Merkle::extend method on the merkle tree implementation, which extends the vectors holding the leaves and their parents and then calls set on each of the hashes being appended to layer[0].
  2. It changes the Memory::resize method to use Merkle::extend instead of throwing away the merkle tree and creating a new one.
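The extend-in-place idea can be sketched as follows. This is a minimal toy model (toy hash, simplified types), not the prover's actual Merkle implementation: `extend` appends to layer 0 and then re-hashes only the parents above the appended region, rebuilding a layer in full only when it is newly created.

```rust
// Toy sketch of extend-in-place for a layered Merkle tree.
// hash_pair stands in for a real hash function like Keccak.
fn hash_pair(a: u64, b: u64) -> u64 {
    a.wrapping_mul(1_000_003).wrapping_add(b)
}

struct Merkle {
    // layers[0] holds the leaves; each layer above is roughly half the size.
    layers: Vec<Vec<u64>>,
}

impl Merkle {
    fn new(leaves: Vec<u64>) -> Merkle {
        let mut m = Merkle { layers: vec![leaves] };
        m.rehash_from(0);
        m
    }

    // Append new leaves and recompute only the hashes they affect.
    fn extend(&mut self, new_leaves: &[u64]) {
        let first_new = self.layers[0].len();
        self.layers[0].extend_from_slice(new_leaves);
        self.rehash_from(first_new);
    }

    // Recompute parents from the first dirty index upward. A brand-new
    // layer is computed in full; growing only the edge of the upper
    // layers is exactly the kind of bug this PR's commits describe.
    fn rehash_from(&mut self, mut dirty: usize) {
        let mut layer = 0;
        while self.layers[layer].len() > 1 {
            let parents = (self.layers[layer].len() + 1) / 2;
            let mut parent_dirty = dirty / 2;
            if self.layers.len() == layer + 1 {
                self.layers.push(vec![0; parents]);
                parent_dirty = 0; // new layer: compute everything
            } else {
                self.layers[layer + 1].resize(parents, 0);
            }
            for i in parent_dirty..parents {
                let left = self.layers[layer][2 * i];
                let right = self.layers[layer].get(2 * i + 1).copied().unwrap_or(0);
                self.layers[layer + 1][i] = hash_pair(left, right);
            }
            dirty = parent_dirty;
            layer += 1;
        }
    }

    fn root(&self) -> u64 {
        self.layers.last().and_then(|l| l.first()).copied().unwrap_or(0)
    }
}

fn main() {
    // Extending in place must give the same root as rebuilding from scratch.
    let mut extended = Merkle::new(vec![1, 2, 3, 4]);
    extended.extend(&[5, 6, 7]);
    let rebuilt = Merkle::new(vec![1, 2, 3, 4, 5, 6, 7]);
    assert_eq!(extended.root(), rebuilt.root());
    println!("roots match");
}
```

The payoff is that a resize touching k new leaves costs O(k + log n) hashes instead of O(n) for a full rebuild.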

This makes the always_merkleize strategy much faster than it would be without these changes. Before this change, the benchbin binary ran with metrics like these:

Running benchmark with always merkleize feature on
avg hash time  49.506µs, avg step time     49.583µs, step size       1, num_iters 100, total time  1.587369292s
avg hash time 131.402µs, avg step time      67.44µs, step size    1024, num_iters 100, total time   1.56975325s
avg hash time 529.716µs, avg step time  65.128135ms, step size   32768, num_iters 100, total time  8.108754125s
avg hash time 486.609µs, avg step time 372.639626ms, step size 1048576, num_iters 100, total time 38.849238375s

After this change, the data is more like this:

Running benchmark with always merkleize feature on
avg hash time     55.777µs, avg step time     57.294µs, step size        1, num_iters 100, total time  11.315333ms
avg hash time    126.267µs, avg step time     71.536µs, step size     1024, num_iters 100, total time  19.788125ms
avg hash time    497.955µs, avg step time   3.839355ms, step size    32768, num_iters 100, total time  433.74075ms
avg hash time    461.622µs, avg step time  50.498257ms, step size  1048576, num_iters 100, total time  5.09599725s
avg hash time    826.465µs, avg step time 676.037947ms, step size 16777216, num_iters 100, total time 67.686471417s

NOTE: With the optimization, even step size 16,777,216 (2^24) is able to run 100 iterations in just over a minute.
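For a rough summary, the speedups at the four step sizes present in both runs can be computed directly from the total times reported above:

```rust
fn main() {
    // Before/after total times in seconds, taken from the benchmark
    // output above, at the four step sizes both runs share.
    let step_sizes = [1u64, 1024, 32768, 1048576];
    let before = [1.587369292_f64, 1.56975325, 8.108754125, 38.849238375];
    let after = [0.011315333_f64, 0.019788125, 0.43374075, 5.09599725];
    for ((step, b), a) in step_sizes.iter().zip(&before).zip(&after) {
        // Roughly 140x, 79x, 19x, and 7.6x respectively.
        println!("step size {:>7}: {:>5.1}x faster", step, b / a);
    }
}
```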

references https://linear.app/offchain-labs/issue/NIT-2411/arbitrator-optimizations

rauljordan and others added 17 commits April 21, 2024 04:19
Cherry-Picked from OffchainLabs/nitro@flatmerkleapril16
It can be easy to look at small average step and hash times and miss
that the total time is what we're really trying to reduce.
I definitely had some incorrect assumptions about this data structure
which made it more difficult to learn. So, I'm documenting how it
works and adding some tests.

The simple_merkle test is currently failing because the `set` method
doesn't allow setting an index larger than the largest currently
set leaf's index.

There is some debate as to whether or not this is the correct
behavior. To run the test, use:

```
$> cargo test -- --include-ignored
```
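The restriction under debate can be sketched like this. SimpleMerkle here is a hypothetical stand-in, not the actual prover code; the point is the `#[ignore]`d test, which plain `cargo test` skips and `cargo test -- --include-ignored` runs.

```rust
// Hypothetical sketch of the debated `set` behavior: setting an index
// beyond the currently allocated leaves is rejected.
struct SimpleMerkle {
    leaves: Vec<u64>,
}

impl SimpleMerkle {
    fn new(size: usize) -> Self {
        SimpleMerkle { leaves: vec![0; size] }
    }

    // Panics when idx is past the last leaf, mirroring the restriction
    // described above.
    fn set(&mut self, idx: usize, value: u64) {
        assert!(idx < self.leaves.len(), "set past the last leaf");
        self.leaves[idx] = value;
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Skipped by plain `cargo test`; runs under
    // `cargo test -- --include-ignored`.
    #[test]
    #[ignore]
    fn set_beyond_last_leaf() {
        let mut m = SimpleMerkle::new(4);
        m.set(10, 7); // fails under the restrictive behavior
    }
}

fn main() {
    let mut m = SimpleMerkle::new(4);
    m.set(2, 9); // in range: fine
    assert_eq!(m.leaves[2], 9);
    // Out of range panics under this behavior:
    let result = std::panic::catch_unwind(|| {
        let mut m = SimpleMerkle::new(4);
        m.set(10, 7);
    });
    assert!(result.is_err());
    println!("in-range set ok; out-of-range set rejected");
}
```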
At this point, the new root hash is eagerly calculated after each call to `extend`.
If this happened frequently, it should really improve the performance
of the machine. However, it looks like it doesn't happen at all with
the benchmark inputs.
Previously, it could hit an index out of bounds if the new leaves
caused any parent layer to grow beyond its current size.
Hopefully, this will allow us to compare this branch's implementation
of a merkle tree to the one on merkle-perf-a.
The previous implementation was growing the same layers and
dirty_indices arrays because the clone isn't deep (I guess).
There are a few different things going on in this commit.

1. I've added some counters for when methods get called on the Merkle
tree.

2. I've added integration with gperftools for profiling specific areas
of the code.
This allows me to profile CPU and Heap independently, and to enable
and disable the call counters independently.
This part of the code is obviously slow.
Let's see if we can improve it.
This is why there were all those unexpected "new_advanced" calls on
the memory merkle. The resizes were actually setting self.merkle
back to None.
There was a bug where expanding the lowest layer and calling set on
all of the new elements was not sufficient to grow the upper layers.

This commit also fixes a warning about the package-level profile
override being ineffective.
@cla-bot cla-bot bot added the s label Apr 29, 2024
@eljobe (Member, Author) commented May 2, 2024

I'm closing this PR. The implementation is only fast because it has a bug: the root hash will quite often be incorrect after the merkle tree is extended in Memory::resize.
I'll try again after lunch.

@eljobe eljobe closed this May 2, 2024