
Aida's tx-generator test ended with panic #763

Closed · wsodsong opened this issue Feb 16, 2024 · 5 comments
Labels: bug (Something isn't working)

Comments

@wsodsong
Collaborator

An Aida tx-generator test on Jenkins ended with the panic: unable to store branch node with dirty hash.

In this test we run only the store tx type, with 50,000 tx per block for 100 blocks, using the London fork. Command (develop branch):

build/aida-vm-sdb tx-generator --db-impl carmen --db-variant go-file --carmen-schema 5 --tx-type store --block-length 50000 london london+100

Error message:

 panic: unable to store branch node with dirty hash
 
 goroutine 134 [running]:
 github.com/Fantom-foundation/Carmen/go/state/mpt.BranchNodeEncoderWithChildHashes.Store({}, {0xc0001a3680?, 0x262?, 0x1809be0?}, 0x425b01?)
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/state/mpt/nodes.go:2064 +0x209
 github.com/Fantom-foundation/Carmen/go/backend/stock/file.(*fileStock[...]).Set(_, _, {{{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...}, ...})
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/backend/stock/file/file.go:245 +0xca
 github.com/Fantom-foundation/Carmen/go/backend/stock/synced.(*syncedStock[...]).Set(_, _, {{{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...}, ...})
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/backend/stock/synced/synced.go:39 +0xe2
 github.com/Fantom-foundation/Carmen/go/state/mpt.(*Forest).flushNode(0x0?, 0x42a464?, {0x1e77390?, 0xc0c65a3000?})
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/state/mpt/forest.go:784 +0x1e5
 github.com/Fantom-foundation/Carmen/go/state/mpt.writeBufferSink.Write({0x18ad000?}, 0xc0002aa090?, {{0xc0c65a4040}})
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/state/mpt/forest.go:921 +0x25
 github.com/Fantom-foundation/Carmen/go/state/mpt.(*writeBuffer).emptyBuffer(0xc0001f0200)
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/state/mpt/write_buffer.go:198 +0x354
 github.com/Fantom-foundation/Carmen/go/state/mpt.makeWriteBuffer.func1()
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/state/mpt/write_buffer.go:94 +0xe5
 created by github.com/Fantom-foundation/Carmen/go/state/mpt.makeWriteBuffer in goroutine 1
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/state/mpt/write_buffer.go:88 +0x1ab
@wsodsong wsodsong added the bug Something isn't working label Feb 16, 2024
@HerbertJordan
Collaborator

It looks like the test scenario that failed on Jenkins had been running for 14+ hours processing a single block (37534834) and was killed before completing it.

I expect what happened in this case is that, for the entire 14h of execution, all modifications were made solely against the data cached in the StateDB instance (the component handling transaction contexts, sitting between the EVM and the actual database). When the block was finally completed and committed, all the data in the StateDB cache was pushed into the DB at once, exceeding the internal buffer limits on the working-set size of a single commit.

Right now there is unfortunately an upper limit on the working-set size, defined by the capacity of the LiveDB node cache. If this limit is exceeded, the program crashes with a dirty-hash panic.
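To illustrate the failure mode (these are stand-in types, not Carmen's actual ones): nodes cache their hash, and a node evicted from a full cache before its hash has been recomputed cannot be serialized, which is the invariant behind the dirty-hash panic in the stack trace above.

```go
package main

import "fmt"

// branchNode is a stand-in (not Carmen's actual type) for an MPT branch
// node that caches its hash and tracks whether that hash is stale.
type branchNode struct {
	hash      [32]byte
	hashDirty bool // true while the cached hash no longer matches the node's content
}

// store mimics the invariant enforced by the node encoder: a node may only
// be written out once its hash has been recomputed.
func store(n *branchNode) error {
	if n.hashDirty {
		// In Carmen this condition panics with
		// "unable to store branch node with dirty hash".
		return fmt.Errorf("unable to store branch node with dirty hash")
	}
	return nil // a real implementation would serialize n here
}

func main() {
	n := &branchNode{hashDirty: true} // evicted from a full cache before rehashing
	if err := store(n); err != nil {
		fmt.Println("panic-equivalent:", err)
	}
}
```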

The default working set is enough for at least ~500k modifications within a single block. The store application of Norma, however, causes roughly 26 updates per transaction. Thus, 50,000 tx per block can produce up to 1.3 million updates, exceeding the per-block limit, which likely triggered the issue.
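The back-of-the-envelope estimate above can be reproduced with a short sketch; the per-transaction update count and the ~500k cache limit are approximations taken from this thread, not values read from Carmen's configuration:

```go
package main

import "fmt"

// updatesPerBlock estimates the working-set size a single block commit
// pushes into the LiveDB, given the per-transaction update count.
func updatesPerBlock(txPerBlock, updatesPerTx int) int {
	return txPerBlock * updatesPerTx
}

func main() {
	const cacheLimit = 500_000 // rough capacity of the LiveDB node cache
	const updatesPerTx = 26    // approximate updates per "store" transaction

	for _, txPerBlock := range []int{50_000, 5_000} {
		u := updatesPerBlock(txPerBlock, updatesPerTx)
		fmt.Printf("%d tx/block -> %d updates (limit %d, exceeded: %v)\n",
			txPerBlock, u, cacheLimit, u > cacheLimit)
	}
}
```

At 50,000 tx/block the estimate lands at 1.3 million updates, well above the limit; at the later, reduced 5,000 tx/block it stays comfortably below it.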

Things we should do:

  • @wsodsong can you confirm that the block progress report of aida-vm-sdb is correct when using the tx-generator mode and that it was indeed just processing a single block?
  • reduce parameter configurations to something within the limits of ~500k updates per block
  • investigate the possibility of eliminating this maximum working set size constraint

@wsodsong
Collaborator Author

@HerbertJordan I can confirm that the log is correct. One block has 50,000 transactions in this test. I made a quick calculation from the reported processing rates, and it seems the panic happened right at the end of the first block.

@HerbertJordan
Collaborator

@wsodsong thanks for checking. In that case I would suggest reducing the block size for the application type "store" to something more realistic, resulting in a block time of at most a few seconds.

@wsodsong
Collaborator Author

We have reduced tx per block to 5000. Everything runs fine now. Should I close this ticket?
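Presumably the adjusted run just lowers --block-length while keeping the other flags from the original report; a sketch of the invocation (not quoted from the actual Jenkins job):

```shell
build/aida-vm-sdb tx-generator --db-impl carmen --db-variant go-file --carmen-schema 5 --tx-type store --block-length 5000 london london+100
```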

@HerbertJordan
Collaborator

Yes, let's close this issue. The problem of a limited working-set size is also covered by #686.
