
Aida's tx-generator test ended with panic #763

Closed · wsodsong opened this issue Feb 16, 2024 · 5 comments
Labels: bug (Something isn't working)

Comments

@wsodsong
Collaborator

An Aida tx-generator test on Jenkins ended with the panic: unable to store branch node with dirty hash.

In this test we run only the store tx type, with 50,000 tx per block for 100 blocks, using the London fork. Command (develop branch):

build/aida-vm-sdb tx-generator --db-impl carmen --db-variant go-file --carmen-schema 5 --tx-type store --block-length 50000 london london+100

Error message:

 panic: unable to store branch node with dirty hash
 
 goroutine 134 [running]:
 github.com/Fantom-foundation/Carmen/go/state/mpt.BranchNodeEncoderWithChildHashes.Store({}, {0xc0001a3680?, 0x262?, 0x1809be0?}, 0x425b01?)
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/state/mpt/nodes.go:2064 +0x209
 github.com/Fantom-foundation/Carmen/go/backend/stock/file.(*fileStock[...]).Set(_, _, {{{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...}, ...})
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/backend/stock/file/file.go:245 +0xca
 github.com/Fantom-foundation/Carmen/go/backend/stock/synced.(*syncedStock[...]).Set(_, _, {{{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...}, ...})
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/backend/stock/synced/synced.go:39 +0xe2
 github.com/Fantom-foundation/Carmen/go/state/mpt.(*Forest).flushNode(0x0?, 0x42a464?, {0x1e77390?, 0xc0c65a3000?})
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/state/mpt/forest.go:784 +0x1e5
 github.com/Fantom-foundation/Carmen/go/state/mpt.writeBufferSink.Write({0x18ad000?}, 0xc0002aa090?, {{0xc0c65a4040}})
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/state/mpt/forest.go:921 +0x25
 github.com/Fantom-foundation/Carmen/go/state/mpt.(*writeBuffer).emptyBuffer(0xc0001f0200)
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/state/mpt/write_buffer.go:198 +0x354
 github.com/Fantom-foundation/Carmen/go/state/mpt.makeWriteBuffer.func1()
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/state/mpt/write_buffer.go:94 +0xe5
 created by github.com/Fantom-foundation/Carmen/go/state/mpt.makeWriteBuffer in goroutine 1
 	/home/jenkins/workspace/Aida/ReleaseTesting/FunctionalTests/F06/carmen/go/state/mpt/write_buffer.go:88 +0x1ab
@wsodsong wsodsong added the bug Something isn't working label Feb 16, 2024
@HerbertJordan
Collaborator

It looks like the test scenario that failed on Jenkins had been running for 14+ hours processing a single block (37534834) and was killed before completing it.

I expect what happened in this case is that, for the entire 14h of execution, all modifications were made solely against the data cached in the StateDB instance (the component handling transaction contexts, sitting between the EVM and the actual database). When the block was finally completed and committed, all the data in the StateDB cache was pushed into the DB at once, exceeding the internal buffer limits on the working-set size of a single commit.

Right now there is unfortunately an upper limit on the working-set size, defined by the capacity of the LiveDB node cache. If this limit is exceeded, the program crashes with a dirty-hash panic.
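To illustrate the failure mode (these are stand-in types, not Carmen's actual ones): nodes cache their hash, and a node evicted from a full cache before its hash has been recomputed cannot be serialized, which is the invariant behind the dirty-hash panic in the stack trace above.

```go
package main

import "fmt"

// branchNode is a stand-in (not Carmen's actual type) for an MPT branch
// node that caches its hash and tracks whether that hash is stale.
type branchNode struct {
	hash      [32]byte
	hashDirty bool // true while the cached hash no longer matches the node's content
}

// store mimics the invariant enforced by the node encoder: a node may only
// be written out once its hash has been recomputed.
func store(n *branchNode) error {
	if n.hashDirty {
		// In Carmen this condition panics with
		// "unable to store branch node with dirty hash".
		return fmt.Errorf("unable to store branch node with dirty hash")
	}
	return nil // a real implementation would serialize n here
}

func main() {
	n := &branchNode{hashDirty: true} // evicted from a full cache before rehashing
	if err := store(n); err != nil {
		fmt.Println("panic-equivalent:", err)
	}
}
```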

The default working set is enough for at least ~500k modifications within a single block. The store application of Norma, however, causes roughly 26 updates per transaction. Thus, 50,000 tx per block can produce up to 1.3 million updates, exceeding the per-block limit, which likely triggered the issue.
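The back-of-the-envelope estimate above can be reproduced with a short sketch; the per-transaction update count and the ~500k cache limit are approximations taken from this thread, not values read from Carmen's configuration:

```go
package main

import "fmt"

// updatesPerBlock estimates the working-set size a single block commit
// pushes into the LiveDB, given the per-transaction update count.
func updatesPerBlock(txPerBlock, updatesPerTx int) int {
	return txPerBlock * updatesPerTx
}

func main() {
	const cacheLimit = 500_000 // rough capacity of the LiveDB node cache
	const updatesPerTx = 26    // approximate updates per "store" transaction

	for _, txPerBlock := range []int{50_000, 5_000} {
		u := updatesPerBlock(txPerBlock, updatesPerTx)
		fmt.Printf("%d tx/block -> %d updates (limit %d, exceeded: %v)\n",
			txPerBlock, u, cacheLimit, u > cacheLimit)
	}
}
```

At 50,000 tx/block the estimate lands at 1.3 million updates, well above the limit; at the later, reduced 5,000 tx/block it stays comfortably below it.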

Things we should do:

  • @wsodsong can you confirm that the block progress report of aida-vm-sdb is correct when using the tx-generator mode and that it was indeed just processing a single block?
  • reduce parameter configurations to something within the limits of ~500k updates per block
  • investigate the possibility of eliminating this maximum working set size constraint

@wsodsong
Collaborator Author

@HerbertJordan I can confirm that the log is correct. One block has 50,000 transactions in this test. I made a quick calculation from the reported processing rates, and it seems the panic happened right at the end of the first block.

@HerbertJordan
Collaborator

@wsodsong thanks for checking. In that case I would suggest reducing the block size for the application type "store" to something more realistic, resulting in a block time of at most a few seconds.

@wsodsong
Collaborator Author

We have reduced tx per block to 5000. Everything runs fine now. Should I close this ticket?
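Presumably the adjusted run just lowers --block-length while keeping the other flags from the original report; a sketch of the invocation (not quoted from the actual Jenkins job):

```shell
build/aida-vm-sdb tx-generator --db-impl carmen --db-variant go-file --carmen-schema 5 --tx-type store --block-length 5000 london london+100
```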

@HerbertJordan
Collaborator

Yes, let's close this issue. The problem of a limited working-set size is also covered by #686.
