Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EFM] Fix bugs from Benchnet testing #6898

Open
wants to merge 11 commits into
base: feature/efm-recovery
Choose a base branch
from
4 changes: 3 additions & 1 deletion cmd/consensus/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -205,9 +205,11 @@ func main() {
PostInit(func(nodeConfig *cmd.NodeConfig) error {
// TODO(EFM, #6794): This function is introduced to implement a backward-compatible upgrade from v1 to v2.
// Remove this once we complete the network upgrade.
if err := operation.RetryOnConflict(nodeBuilder.DB.Update, operation.MigrateDKGEndStateFromV1()); err != nil {
log := nodeConfig.Logger.With().Str("postinit", "dkg_end_state_migration").Logger()
if err := operation.RetryOnConflict(nodeBuilder.SecretsDB.Update, operation.MigrateDKGEndStateFromV1(log)); err != nil {
return fmt.Errorf("could not migrate DKG end state from v1 to v2: %w", err)
}
log.Debug().Msgf("completed migrating DKG end state from v1 to v2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making it Info and moving it into the method itself and log only if we have actually migrated. Additionally we could have a debug msg which shows that we have entered and exited PostInit itself

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

return nil
}).
Module("machine account config", func(node *cmd.NodeConfig) error {
Expand Down
7 changes: 6 additions & 1 deletion engine/consensus/dkg/reactor_engine.go
Original file line number Diff line number Diff line change
Expand Up @@ -284,8 +284,13 @@ func (e *ReactorEngine) handleEpochCommittedPhaseStarted(currentEpochCounter uin

return
}
if currentState == flow.RandomBeaconKeyCommitted {
// this can happen if a healthy node which succeeded the DKG restarts in the EpochCommit phase
log.Debug().Msg("checking beacon key consistency after EpochCommit: observed committed beacon key for most recent dkg - exiting")
jordanschalm marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only for better traces? Functionality it doesn't change anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is for more accurate logs - there is no functional change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does change something, because it shortcuts the sanity check that all (repeated) calls to committing the same epoch don't have inconsistent information. See my comment below for details.

return
}
if currentState != flow.DKGStateCompleted {
jordanschalm marked this conversation as resolved.
Show resolved Hide resolved
log.Warn().Msgf("checking beacon key consistency: exiting because dkg didn't reach completed state: %s", currentState.String())
log.Warn().Msgf("checking beacon key consistency after EpochCommit: exiting because dkg didn't reach completed state: %s", currentState.String())
return
}
snapshot := e.State.AtBlockID(firstBlock.ID())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to move this right before line 302

nextDKG, err := snapshot.Epochs().Next().DKG()

In my head, querying the snapshot belongs to the code block around line 302 much more than it belongs to the exit condition ... just personal preference - feel free to ignore.

Expand Down
4 changes: 3 additions & 1 deletion storage/badger/operation/dkg.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ import (
"fmt"

"github.com/dgraph-io/badger/v2"
"github.com/rs/zerolog"

"github.com/onflow/flow-go/model/encodable"
"github.com/onflow/flow-go/model/flow"
Expand Down Expand Up @@ -81,7 +82,7 @@ func RetrieveDKGStateForEpoch(epochCounter uint64, currentState *flow.DKGState)
// It reads already stored data by deprecated prefix and writes it to the new prefix with values converted to the new representation.
// TODO(EFM, #6794): This function is introduced to implement a backward-compatible upgrade from v1 to v2.
// Remove this once we complete the network upgrade.
func MigrateDKGEndStateFromV1() func(txn *badger.Txn) error {
func MigrateDKGEndStateFromV1(log zerolog.Logger) func(txn *badger.Txn) error {
return func(txn *badger.Txn) error {
var ops []func(*badger.Txn) error
err := traverse(makePrefix(codeDKGEndState), func() (checkFunc, createFunc, handleFunc) {
Expand Down Expand Up @@ -110,6 +111,7 @@ func MigrateDKGEndStateFromV1() func(txn *badger.Txn) error {
ops = append(ops,
UpsertDKGStateForEpoch(epochCounter, newState),
remove(makePrefix(codeDKGEndState, epochCounter)))
log.Debug().Msgf("removing %d->%d, writing %d->%d", epochCounter, oldState, epochCounter, newState)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Debug().Msgf("removing %d->%d, writing %d->%d", epochCounter, oldState, epochCounter, newState)
log.Info().Msgf("removing %d->%d, writing %d->%d", epochCounter, oldState, epochCounter, newState)

This doesn't happen often so maybe info is better to understand if we have migrated or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


return nil
}
Expand Down
5 changes: 3 additions & 2 deletions storage/badger/operation/dkg_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ import (
"testing"

"github.com/dgraph-io/badger/v2"
"github.com/rs/zerolog"
"github.com/stretchr/testify/assert"

"github.com/onflow/flow-go/model/encodable"
Expand Down Expand Up @@ -130,7 +131,7 @@ func TestMigrateDKGEndStateFromV1(t *testing.T) {
}

// migrate the state
err := db.Update(MigrateDKGEndStateFromV1())
err := db.Update(MigrateDKGEndStateFromV1(zerolog.Nop()))
jordanschalm marked this conversation as resolved.
Show resolved Hide resolved
assert.NoError(t, err)

assertMigrationSuccessful := func() {
Expand Down Expand Up @@ -167,7 +168,7 @@ func TestMigrateDKGEndStateFromV1(t *testing.T) {
assertMigrationSuccessful()

// migrating again should be no-op
err = db.Update(MigrateDKGEndStateFromV1())
err = db.Update(MigrateDKGEndStateFromV1(zerolog.Nop()))
jordanschalm marked this conversation as resolved.
Show resolved Hide resolved
assert.NoError(t, err)
assertMigrationSuccessful()
})
Expand Down
Loading