-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix Etchash cache at Mordor #499
fix Etchash cache at Mordor #499
Conversation
This test reproduces an error encountered on the Mordor testnet and reported here #439 Ethash returns a cache value with epoch length of 30k, where it should return one appropriate for the ECIP1099-era with an epoch length of 60k. This caused header verification to fail because the mix digest was incorrect. This issue only occurred while syncing the Mordor network because the cache was in-memory and because the original-era epoch was matched later by the ECIP1099-era epoch and the LRU future item was anachronistic. Date: 2022-08-30 08:35:32-07:00 Signed-off-by: meows <[email protected]>
The ECIP1099 transition causes the epoch numbers to halve, causing the lru.future value to be ahead of (greater than) the current epoch, making this conditional (quietly) fail. After a while (eg. syncing on Mordor), the lru.future value becomes spuriously available again because the epoch heights eventually match up again, but when they do, the lru.future value references the pre-ECIP1099 era. Date: 2022-08-30 08:41:28-07:00 Signed-off-by: meows <[email protected]>
This handles potential cache collisions due to ECIP1099. epoch/epochLength is sufficiently unique, where epoch alone could be duplicated by post-ECIP1099 epochs. Date: 2022-08-30 08:44:22-07:00 Signed-off-by: meows <[email protected]>
Prior to this patch, the code assumed that ECIP1099 would (must) be activated on an epoch transition block value, ie. a multiple of 30_000. This patch removes this assumption. Date: 2022-08-30 08:46:28-07:00 Signed-off-by: meows <[email protected]>
Prior to this patch, existing cache files could only be identified unidirectionally, by matching a seedhash generated for some epoch. This is problematic for ECIP1099 because epoch numbers are no longer continuous, so there were dangling cache files (since the iterator removing them had to use the contemporary cache epoch, which may have recently been halved). This patch allows cache files to be iterated and identified by the epoch they represent using scanned values from their filename. This avoids spurious cache file hits using only seed hash alone that could have been caused by dangling cache files. Legacy cache files (by matching legacy naming scheme) will be removed. Date: 2022-08-30 08:53:57-07:00 Signed-off-by: meows <[email protected]>
This function generates the dataset, not the cache. Date: 2022-08-30 08:55:11-07:00 Signed-off-by: meows <[email protected]>
Date: 2022-08-30 09:02:40-07:00 Signed-off-by: meows <[email protected]>
Date: 2022-08-30 09:03:00-07:00 Signed-off-by: meows <[email protected]>
This gets called by runtime.SetFinalizer within the cache.generate method. Date: 2022-08-31 06:59:59-07:00 Signed-off-by: meows <[email protected]>
https://github.com/etclabscore/core-geth/pull/499/files#r958753184 Date: 2022-08-31 07:03:29-07:00 Signed-off-by: meows <[email protected]>
With a generous activation (>=), 1099 can be activated but until the next epoch rolls around it won't actually be activated, eg. ecip1099Block=3_000_042 wont activate until 3_030_000... which is unpredictable. Better to conform to and enforce stricter expectations. Date: 2022-08-31 07:06:27-07:00 Signed-off-by: meows <[email protected]>
https://github.com/ethereumclassic/ECIPs/blob/master/_specs/ecip-1099.md#specification tells us that seed hashes wont overlap. This tests shows that indeed, for the range tested, they don't collide. Date: 2022-08-31 07:26:14-07:00 Signed-off-by: meows <[email protected]>
Fresh syncing with Mordor finished just fine. Once you make the small changes in comments, I will approve this PR. |
…sted Date: 2022-08-31 08:52:28-07:00 Signed-off-by: meows <[email protected]>
Review comments are addressed and resolved (links to associated commits in review comments). |
…rates as expected Date: 2022-08-31 08:53:17-07:00 Signed-off-by: meows <[email protected]>
0cb5927
to
836792f
Compare
This fix echoes the fix originally made here etclabscore/core-geth#499 Date: 2022-08-31 09:46:26-07:00 Signed-off-by: meows <[email protected]>
@ziogaschr PTAL 69f5300 This proposes keeping the file naming scheme identical between the cache and full dataset.
|
…s too Date: 2022-08-31 19:55:29-07:00 Signed-off-by: meows <[email protected]>
69f5300
to
310ca59
Compare
During the ECIP-1099 transition we handled the renaming of existing DAGs, from what I recall it has been done for smoother transition times, so as not all miners had to wait for DAG generation for the next block. API-breaking change: I don't think it is, my only worry is, if any GPU mining software, is using/sharing the same DAG files. I don't think so. Once the above checked, I vote on just removing all old files and recreating from scratch. |
Yea, the GPU miners I've used seem to generate their own DAGs (which makes sense because a user could configure I too am OK with regenerating and rming the old DAGs. Let's do it. KISS. |
Well that decision actually kinda breaks the implementation of |
@blackmennewstyle good to know that you use DAG files this way, we couldn't find any use-cases like this one. We will also discuss with @meowsbits, what would mean for us to provide the old naming. |
@blackmennewstyle can you please create a new issue and add a link to this one? |
@ziogaschr The naming we use is exactly the one which has been deleted in this commit: https://github.com/etclabscore/core-geth/pull/499/files#diff-81dd4f9d1a46a485d74affabbf00bcc61fd98a44bcf785a57d59d87ce50ddb93L432 |
@blackmennewstyle yes I understood this. Was just checking with you if you could have a different pattern (our new one) in your code under certain circumstances. |
@ziogaschr I believe we can since the C++ We probably need to find out the functions which are used to handle and generate the DAG file and adjust the naming scheme to the new one. While still allowing the |
Hey @blackmennewstyle. Thanks for your report, I'm getting caught up here.
I don't see any reference to geth's DAG cache files in there yet. And the actual source used by
Can you provide logs or some more further detail on what "kinda breaks" looks like? PS. Bummer that the change broke the dep, but IMO a cache file footprint should not be considered a stable API surface for mining pools... |
The break is not critical since |
So maybe |
No worry, I actually found the mechanism in our It does not seem to be an easy modification to support your new naming, it will require lot of code changes through multiple functions. |
Fixes #439 and #493.
Please read commit messages for more context, but here's an overview:
future item
in the in-memory lru cacheseed hashes
, which are hard to identify when epochs are not continuousThis PR (and branch) uses #496 as a base.
Here's Mordor passing 4_980_000 without issue:
Here's
--vmodule=consensus/*=6
logs during the same time context: