channeldb: flatten the htlc attempts bucket #5635
Conversation
0320066 to e2ace14
Interestingly, I had thought about this too. I think the current approach will potentially increase the data size of each DB tx. If we have many attempts inside a payment, will that be an issue?
Previously I thought we could add a state to the payment so that we wouldn't need to fetch the htlc attempts in every query. All the attempt info can be abstracted behind that state: if the payment is in a pending state, we need the attempt info; otherwise, we don't bother fetching them. WDYT?
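A rough sketch of that idea, assuming channeldb's existing types (the payment-state check and the fetchPaymentWithoutHtlcs/fetchHtlcAttempts helpers are hypothetical, not part of this PR):

```go
// Hypothetical: only in-flight payments need the per-attempt data, so a
// payment-level status check could let read paths skip the attempt fetches.
func fetchPaymentMaybeWithHtlcs(bucket kvdb.RBucket) (*MPPayment, error) {
	// fetchPaymentWithoutHtlcs is a hypothetical helper that reads only
	// the creation info, status and (optional) failure reason.
	payment, err := fetchPaymentWithoutHtlcs(bucket)
	if err != nil {
		return nil, err
	}

	// Terminal payments (settled/failed) don't need the attempt details.
	if payment.Status != StatusInFlight {
		return payment, nil
	}

	// Pending payments still need their attempts for further routing
	// decisions.
	htlcsBucket := bucket.NestedReadBucket(paymentHtlcsBucket)
	if htlcsBucket != nil {
		payment.HTLCs, err = fetchHtlcAttempts(htlcsBucket)
		if err != nil {
			return nil, err
		}
	}

	return payment, nil
}
```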
channeldb/payments.go
Outdated
@@ -312,6 +320,7 @@ func fetchPayment(bucket kvdb.RBucket) (*MPPayment, error) {

	var htlcs []HTLCAttempt
	htlcsBucket := bucket.NestedReadBucket(paymentHtlcsBucket)
nit: extra new line
done
// Get the bucket which contains the payment, fail if the key
// does not have a bucket.
bucket := paymentsBucket.NestedReadWriteBucket(k)
if bucket == nil {
Could the paymentsBucket be empty?
It could, but then ForEach will not iterate.
Actually, what I wanted to ask was whether the above bucket could be nil or not... If it could be, it seems we shouldn't return an error below.
Sorry, I misunderstood what you meant. The semantics of NestedRead(Write)Bucket are a bit tricky since it never returns an error. When it returns nil, it means that the bucket doesn't exist. In this case the payments-root-bucket can either be empty (no payments yet) or contain only buckets, which is what we ensure with this check.
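In other words, a sketch of the pattern being discussed (error message illustrative):

```go
// NestedReadWriteBucket never returns an error; a nil result is the only
// signal that the key doesn't point at a nested bucket.
bucket := paymentsBucket.NestedReadWriteBucket(k)
if bucket == nil {
	// The payments root bucket should contain only sub-buckets, so a
	// non-bucket key means the DB is in an unexpected state.
	return fmt.Errorf("key %x is not a bucket", k)
}
```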
channeldb/migration23/migration.go
Outdated
}

// Delete the attempt bucket.
return htlcsBucket.DeleteNestedBucket(aid)
Just to be safe, I think we need to do it in three steps,
- get the old keys
- put the new keys
- delete the old keys iff the second step succeeded
This all happens in one huge write transaction, therefore the order here doesn't play a huge role: if anything goes wrong, the whole transaction is rolled back.
But that is actually the main concern. Because we need every migration to be capable of running atomically, we'll end up migrating all payments in one transaction. For some nodes that might be a huge chunk of data.
Not sure what we can realistically do about it here...
I only have two ideas:
- Just go with it and hope it will be fine for all users. If anyone runs into a problem, we can advise them to delete their failed payments first (we might need [WIP] Batched payment deletion #5368 and a CLI tool for using that RPC as well).
- Do the migration in four steps: one for each key (info/settle/fail) where we just copy the data, and then a last step that deletes the attempt ID buckets. That way the individual transactions would still be atomic (meaning either all payments are copied or none are), but lnd will only start once all 4 steps have run. But if, let's say, step 3 fails, you're in a bad state as well, so this sounds like a bad idea...
Okay, PTAL. I think we should be safe from over-allocating ourselves. The only issue may be that bbolt is unable to handle such a large txn. Also, due to multiple bugs in bbolt cursors we can't safely Delete [1] or Put [2] inside the cursor as it turns out, so I rewrote the migration to iterate in Go instead.
[1] boltdb/bolt#357
[2] boltdb/bolt#268
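A sketch of that pattern (migrateAttempt is an illustrative stand-in for the copy logic, not the actual migration code): collect the keys while iterating, and only mutate the bucket after the ForEach cursor is done.

```go
// Collect the attempt IDs first; Put/Delete during cursor iteration is not
// safe with bbolt, so all mutations happen after ForEach returns.
var attemptIDs [][]byte
err := htlcsBucket.ForEach(func(aid, _ []byte) error {
	attemptIDs = append(attemptIDs, aid)
	return nil
})
if err != nil {
	return err
}

for _, aid := range attemptIDs {
	// Copy the attempt's info/settle/fail values to the flattened keys
	// (illustrative helper), then drop the old per-attempt sub-bucket.
	if err := migrateAttempt(htlcsBucket, aid); err != nil {
		return err
	}
	if err := htlcsBucket.DeleteNestedBucket(aid); err != nil {
		return err
	}
}
```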
> This all happens in one huge write transaction.
Yeah I was thinking we should do it in separate transactions, at least for the deletion.
Another idea is to limit how many records we do at once. Maybe we could update N payments each time, and perform this update several times?
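A rough sketch of that batching idea (entirely hypothetical; the helpers below don't exist, and lnd's migration framework currently runs each migration in a single transaction):

```go
// Hypothetical: migrate at most batchSize payments per DB transaction and
// repeat until no payments are left to migrate.
const batchSize = 1000

for {
	var migrated int

	// updateInOneTx stands in for whatever transaction helper the
	// migration framework would expose for chunked migrations.
	err := updateInOneTx(db, func(tx kvdb.RwTx) error {
		var err error
		migrated, err = migrateNextPayments(tx, batchSize)
		return err
	})
	if err != nil {
		return err
	}

	if migrated == 0 {
		break
	}
}
```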
channeldb/migration23/migration.go
Outdated
// Collect attempt/settle/fail infos.
attemptInfo := attemptBucket.Get(oldAttemptInfoKey)
if attemptInfo != nil {
	key := string(htlcBucketKey(htlcAttemptInfoKey, aid))
Why do we convert it into string?
The migration code is now very different, so we avoid this conversion.
Great PR, very easy to follow the diff!
Added a comment about the DB transaction size, but not sure what we can realistically do about it. Knowing how many round trips this will save us, I think we do want this change to be implemented.
channeldb/migration23/migration.go
Outdated
}

settleInfo := attemptBucket.Get(oldSettleInfoKey)
if settleInfo != nil {
Do we need to care about nil vs. empty slice here, depending on the backend used? Maybe use len() > 0 instead? They should never be empty if they exist, right?
They should never be empty, but I agree len() > 0 is safer. Done.
@@ -127,6 +127,10 @@ you.
  to make it less likely that we retry etcd transactions and make the commit
  queue more scalable.

* [Flatten the payment-htlcs-bucket](https://github.com/lightningnetwork/lnd/pull/5635)
  in order to make it possible to prefetch all htlc attempts of a payment in one
  DB operation.
Depending on the decision we make concerning the potentially large DB transaction, we might need to add some more information here about what the user can do if they run into problems.
ptal
538ff09 to 6fd0cab
Thanks for the reviews @guggero and @yyforyongyu. Changed migration logic to be more memory friendly and added more tests to cover all cases. PTAL.
LGTM 🎉
Tested the migration with 790k payments and it took about 40 seconds on my machine which isn't too bad at all!
htlcsMap[aid].Failure, err = readHtlcFailInfo(v)
if err != nil {
	return err
}
I'm wondering if there should be a default case that returns an error? Since there shouldn't be any other keys in this bucket anymore.
Yes indeed, that's a good point. Done
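A sketch of such a default case (exact key names and error wording are illustrative; the switch runs over the flattened per-attempt keys):

```go
switch {
case bytes.HasPrefix(k, htlcAttemptInfoKey):
	// ... decode the attempt info ...

case bytes.HasPrefix(k, htlcSettleInfoKey):
	// ... decode the settle info ...

case bytes.HasPrefix(k, htlcFailInfoKey):
	// ... decode the failure info ...

default:
	// After the migration only the three prefixes above may appear
	// inside a payment's htlcs bucket, so anything else is an error.
	return fmt.Errorf("unknown htlc attempt key: %x", k)
}
```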
func fetchHtlcAttemptInfo(bucket kvdb.RBucket) (*HTLCAttemptInfo, error) {
	b := bucket.Get(htlcAttemptInfoKey)
	if b == nil {
		return nil, errNoAttemptInfo
We're slightly changing the logic here. Now we won't fail anymore if there isn't an attempt info key for a payment. But I guess that's probably okay?
Yes, you're right. Added back a sanity check to ensure that all attempts have an attempt info.
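For illustration, such a check could compare the number of attempt-info records seen against the number of reconstructed attempts (the attemptInfoCount counter also shows up in a later diff in this conversation; names are otherwise illustrative):

```go
// Every attempt must have exactly one attempt-info record, while the
// settle/fail records are optional, so a count mismatch means an attempt
// is missing its info.
if attemptInfoCount != len(htlcsMap) {
	return nil, errNoAttemptInfo
}
```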
channeldb/migration23/migration.go
Outdated
if err := payments.ForEach(func(hash, v []byte) error {
	// Get the bucket which contains the payment, fail if the key
	// does not have a bucket.
	bucket := payments.NestedReadWriteBucket(hash)
nit: could use a read bucket.
done
// Collect all payment hashes so we can migrate payments one-by-one to
// avoid any bugs bbolt might have when invalidating cursors.
// For 100 million payments, this would need about 3 GiB memory so we
How many attempts are there for those 100 million payments?
As far as I understand there's no hard upper bound, but it's certainly limited by the graph itself, so we can say it's constant per payment? Also, since we migrate per payment, we won't pre-allocate for each attempt.
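For reference on the 3 GiB figure: a payment hash is 32 bytes, so 100 million hashes alone come to roughly 100,000,000 × 32 B ≈ 3.2 GB (≈ 3 GiB), which is where the estimate in the code comment comes from.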
Gotcha. I asked because I was curious about the 3 GiB memory usage.
What happens when the program exits during those 40 seconds? Will the migration continue again after restart?
Since it's one large DB transaction (which is what makes this tricky memory-wise for both bolt and remote DBs), it's only committed at the end. So if the process exits, the DB is rolled back and the full migration retries on the next startup.
6fd0cab to f023631
LGTM👍
More thoughts on this topic. I think the htlc attempts are only needed when the payment is not in a terminal state. Once the payment is done (failed/succeeded), the extra bandwidth used to carry htlc attempts is unnecessary. And since a payment is usually completed in a short time (60-second default timeout, iirc), most of the time those htlc attempt fetches could be avoided. It would be nice if we could somehow specify what to query in that one big DB transaction, kind of like constructing SQL queries beforehand the old-fashioned way. Of course this is future work/discussion.
hash3Str = "99eb3f0a48f954e495d0c14ac63df04af8cefa59dafdbcd3d5046d1f564784d1"
hash3 = hexStr(hash3Str)

// failing1 will fail because all payment hashes should point to sub
Very nice docs!
f023631 to f2cc783
Visit https://dashboard.github.orijtech.com?back=0&pr=5635&remote=true&repo=bhandras%2Flnd to see benchmark details.
LGTM
Just some small nits that can be cleaned up, but can be tacked onto another of the dependent PRs, want to keep this PR train moving along 🚄
// failure information, if any.
htlcFailInfoKey = []byte("htlc-fail-info")
// htlcFailInfoKey is the key used as the prefix of an HTLC attempt
// failure information, if any.The HTLC attempt ID is concatenated at
Missing space after the prior here, also some extra spaces after The.
htlcBucket := bucket.NestedReadBucket(k)
attemptInfoCount := 0
err := bucket.ForEach(func(k, v []byte) error {
	aid := byteOrder.Uint64(k[len(k)-8:])
The 8 seems to be the attempt ID itself; it could be lifted to a constant somewhere above.
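For example (constant name illustrative), since the attempt ID is a uint64 appended to the end of each key:

```go
// htlcAttemptIDLength is the byte length of the attempt ID that is
// appended to every per-attempt key.
const htlcAttemptIDLength = 8

aid := byteOrder.Uint64(k[len(k)-htlcAttemptIDLength:])
```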
Congrats @bhandras and well done! Indeed, to chime in on your report of a 25% improvement, this change showed good improvements across the benchmarks per https://dashboard.github.orijtech.com/benchmark/8dcabf559b0b435c9a05c203224c9fdd /cc @cuonglm @kirbyquerby @willpoint. Thanks @Roasbeef for adopting and using bencher :-) We plan on introducing better UX to improve the workflow and results.
This PR is a split from #5392: in order to be able to fetch the htlc attempts of a payment in one backend transaction, we flatten the payment-htlcs-bucket so that it no longer contains a sub-bucket for each individual attempt; instead we postfix the attempt info keys with the attempt ID. Furthermore, we shorten the key prefixes for convenience.
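Roughly, the schema change looks like this (prefix names are placeholders; the actual shortened prefixes are defined in the PR):

```
Before (one nested bucket per attempt):
  payment-htlcs-bucket/
    <attempt id>/            (nested bucket)
      attempt info
      settle info
      fail info

After (flat bucket, attempt ID appended to each key):
  payment-htlcs-bucket/
    <attempt-info-prefix><attempt id>
    <settle-info-prefix><attempt id>
    <fail-info-prefix><attempt id>
```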
Why is this flattening important? When using a remote DB backend (etcd or Postgres), fetching a payment can be especially heavy since each DB operation results in a round trip to the DB. By flattening the htlc bucket, where each attempt uses a global ID, we may be able to fetch the whole bucket in one go. An implementation of such a strategy can be found here: #5640
The speedup is roughly 25% when using the async-payments-benchmark integration test, and it helps with achieving stable performance in the bottlepay benchmark suite.