-
Notifications
You must be signed in to change notification settings - Fork 598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chain: wait for bitcoind rpc to warm up #677
base: master
Are you sure you want to change the base?
Conversation
chain/bitcoind_conn.go
Outdated
} else if err.(*btcjson.RPCError).Code == -28 { | ||
// Error code -28 indicates the bitcoind rpc is warming up. | ||
// We will wait and recheck periodically until rpc is ready. | ||
time.Sleep(time.Second * 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks okay, the only thing I'm worried about is bitcoind hanging, always returning this code. In that case it looks like we'll deadlock here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative would be to crash, which is what happens now. This approach looks good to me, maybe just add a log like Waiting for bitcoind RPC ready signal
that's only logged once if we ever run into this error code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could perhaps be some upper limit of retries before returning an error. A couple of minutes perhaps? But yeah the alternative is crashing, and if ever bitcoind is stuck in this rpc warming up status, something is seriously wrong and requires intervention. I can't say I've ever seen bitcoind get stuck in this point, even if there's something wrong like db corruption it'll eventually crash in my experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sangaman in your experience, what has been the longest bitcoind has been in this warmup state? Waiting something like 5 mins sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my experience, even on a low-powered ARM machine this warm-up stage takes about a minute or two by default, rarely much more than than that. However, it's also possible to change the config to increase the amount of checks that are done before starting the bitcoind server, theoretically this could take hours if you had it do very deep/thorough checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I'm fine with blocking indefinitely as long as a log statement exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, isn't bitcoind in that warmup state for the entire initial blockchain sync (which can take days, or more)?
I think it should wait indefinitely. However the sleep could maybe be increased slowly to a max of 10 minutes or so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I don't think so, the warm up state is just while it's verifying the db and blocks on disk. Once it starts accepting and validating new blocks, it's past the warm up state and will respond to rpc requests normally.
What's needed to unblock this? |
Add a log, then I think it's good. ping @sangaman |
Just saw this ping, I'll add a log statement and review the feedback. Sorry this fell off my radar. |
I updated this PR to use the newly available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🌡️
chain/bitcoind_conn.go
Outdated
if err == nil { | ||
// No error means bitcoind is responsive and we can proceed. | ||
break | ||
} else if err.(*btcjson.RPCError).Code == btcjson.ErrRPCInWarmup { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do a soft type assertion here to prevent a panic if err
is not of the type *btcjson.RPCError
:
} else if rpcErr, ok := err.(*btcjson.RPCError); ok && rpcErr.Code == btcjson.ErrRPCInWarmup {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, makes sense. Done.
cc @Roasbeef |
Should I continue with this PR? I do think it's a useful feature. It looks like I'll need to resolve some conflicts with pruned node logic. |
chain/bitcoind_conn.go
Outdated
var ( | ||
net wire.BitcoinNet | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this can be collapsed into a single line var net wire.BitcoinNet
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, a leftover from when this also declared err
which didn't exactly work any more since it's being used above.
chain/bitcoind_conn.go
Outdated
if err == nil { | ||
// No error means bitcoind is responsive and we can proceed. | ||
break | ||
} else if rpcErr, ok := err.(*btcjson.RPCError); ok && rpcErr.Code == btcjson.ErrRPCInWarmup { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: line exceeds 80 character limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I split this into two lines, for instance by moving everything after the semicolon to its own line, then make lint
doesn't pass and tells me to run gofmt
, but if I do that it moves this else if
statement back to one line. So I'm not sure what to do, it feels like I'm fighting against the style rules defined by the project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the else if
can just become an if
since the clause above breaks out. We can then split the line at &&
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for that bit of hand holding, that works out.
This modifies the `Start` method to handle the `RPC_IN_WARMUP` error code of `-28` by retrying the rpc call to determine the current network every second until it either succeeds or fails with a different code. The current behavior fails and terminates the connection upon receiving this error code. This change allows for connecting to a recently started bitcoind node and starting the client while bitcoind is still warming up. Related issues: lightningnetwork/lnd#1533 & https://github.com/ExchangeUnion/xud-docker/issues/195
This modifies the
Start
method to handle theRPC_IN_WARMUP
error code of-28
by retrying the rpc call to determine the current network every second until it either succeeds or fails with a different code. The current behavior fails and terminates the connection upon receiving this error code. This change allows for connecting to a recently started bitcoind node and starting the client while bitcoind is still warming up.Related issues: lightningnetwork/lnd#1533 & https://github.com/ExchangeUnion/xud-docker/issues/195
I first tried to approach this issue with changes to the lnd codebase, but the relevant code is what's being modified here. Attempting to call the
Start()
method from lnd a second time, for instance, doesn't work because of the up front check that determines whetherStart()
has already been called and immediately returns if so.I've tested this locally by pointing btcwallet in go.mod to my local repository and unlocking lnd moments after starting bitcoind. Instead of shutting down, lnd will keep trying bitcoind until it gets a non-error response and then continue with its initialization process. This would save some grief when bringing a node online and starting bitcoind & lnd at roughly the same time.