
channeldb: flatten the htlc attempts bucket #5635

Merged Sep 8, 2021 (3 commits)

Conversation

@bhandras (Collaborator) commented Aug 17, 2021:

This PR is split out from #5392. In order to fetch the htlc attempts of a payment in one backend transaction, we flatten the payment-htlcs-bucket so that it no longer contains a sub-bucket for each individual attempt; instead, we postfix the attempt info keys with the attempt ID. Furthermore, we shorten the key prefixes for convenience.

Why is this flattening important?

When using a remote DB backend (etcd or Postgres), fetching a payment can be especially heavy since each DB operation results in a round trip to the DB. By flattening the htlcs bucket so that each attempt uses a global ID, we can fetch the whole bucket in one go. An implementation of this strategy can be found in #5640.

The speedup is roughly 25% when using the async-payments-benchmark integration test, and it helps with achieving stable performance using the bottlepay benchmark suite.

@yyforyongyu (Member) left a comment:

Interestingly, I had thought about this too. I think the current approach will potentially increase the data size of each db tx. If we have many attempts inside a payment, will that be an issue?
Previously I thought we could add a state to the payment so that we wouldn't need to fetch the htlc attempts in every query. All the attempt info can be abstracted using a state: if the payment is in a pending state, we need the attempt info; otherwise, we don't bother fetching it. WDYT?

channeldb/payments.go (resolved)
@@ -312,6 +320,7 @@ func fetchPayment(bucket kvdb.RBucket) (*MPPayment, error) {

var htlcs []HTLCAttempt
htlcsBucket := bucket.NestedReadBucket(paymentHtlcsBucket)

Member: nit: extra new line

@bhandras (Author): done

channeldb/payments.go (resolved)
channeldb/payments.go (resolved)
// Get the bucket which contains the payment, fail if the key
// does not have a bucket.
bucket := paymentsBucket.NestedReadWriteBucket(k)
if bucket == nil {
Member: Could the paymentsBucket be empty?

@bhandras (Author): It could, but then ForEach will not iterate.

Member: Actually, what I wanted to ask was whether the above bucket could be nil or not... If it could be, it seems we shouldn't return an error below.

@bhandras (Author): Sorry, I misunderstood what you meant. The semantics of NestedRead(Write)Bucket are a bit tricky since it never returns an error. When it returns nil, it means the bucket doesn't exist. In this case the payments-root-bucket can either be empty (no payments yet) or contain only buckets, which we ensure with this check.

}

// Delete the attempt bucket.
return htlcsBucket.DeleteNestedBucket(aid)
Member: Just to be safe, I think we need to do it in three steps:

  • get the old keys
  • put the new keys
  • delete the old keys iff the second step succeeded

Collaborator: This all happens in one huge write transaction, therefore the order here doesn't play a huge role; if anything goes wrong, the whole transaction is rolled back.
But that is actually the main concern: because we need every migration to be capable of running atomically, we'll end up migrating all payments in one transaction. For some nodes that might be a huge chunk of data.

Not sure what we can realistically do about it here... I only have two ideas:

  1. Just go with it and hope it will be fine for all users. If anyone runs into a problem, we can advise them to delete failed payments first (we might need [WIP] Batched payment deletion #5368 and a CLI tool for using that RPC as well).
  2. Do the migration in four steps: one for each key (info/settle/fail) where we just copy the data, and then a last step that deletes the attempt ID buckets. That way the individual transactions would still be atomic (meaning either all payments are copied or none are), but lnd would only start once all 4 steps have run. But if, say, step 3 fails, you're in a bad state as well, so this sounds like a bad idea...

@bhandras (Author): Okay, PTAL. I think we should be safe from over-allocating. The only issue may be that bbolt is unable to handle such a large txn. Also, due to multiple bugs in bbolt cursors, we can't safely Delete [1] or Put [2] inside the cursor as it turns out, so I rewrote the migration to iterate in Go instead.

[1] boltdb/bolt#357
[2] boltdb/bolt#268

Member:

> This all happens in one huge write transaction.

Yeah, I was thinking we should do it in separate transactions, at least for the deletion.

Another idea is to limit how many records we process at once. Maybe we could update N payments each time, and perform this update several times?

// Collect attempt/settle/fail infos.
attemptInfo := attemptBucket.Get(oldAttemptInfoKey)
if attemptInfo != nil {
key := string(htlcBucketKey(htlcAttemptInfoKey, aid))
Member: Why do we convert it into a string?

@bhandras (Author): The migration code is now very different, so we avoid this conversion.

channeldb/migration23/migration_test.go (resolved)
@guggero (Collaborator) left a comment:

Great PR, very easy to follow the diff!

Added a comment about the DB transaction size, but I'm not sure what we can realistically do about it. Knowing how many round trips this will save us, I think we do want this change implemented.

channeldb/payments.go (resolved)
}

settleInfo := attemptBucket.Get(oldSettleInfoKey)
if settleInfo != nil {
Collaborator: Do we need to care about nil vs. empty slice here, depending on the backend used? Maybe use len() > 0 instead? They should never be empty if they exist, right?

@bhandras (Author): They should never be empty, but I agree len() > 0 is safer. Done.


@@ -127,6 +127,10 @@ you.
to make it less likely that we retry etcd transactions and make the commit
queue more scalable.

* [Flatten the payment-htlcs-bucket](https://github.com/lightningnetwork/lnd/pull/5635)
in order to make it possible to prefetch all htlc attempts of a payment in one
DB operation.
Collaborator: Depending on the decision we make concerning the potentially large DB transaction, we might need to add some more information here about what the user can do if they run into problems.

@bhandras (Author): ptal

@bhandras bhandras force-pushed the flatten-htlc-bucket branch 2 times, most recently from 538ff09 to 6fd0cab Compare August 24, 2021 17:09
@bhandras (Author) left a comment:

Thanks for the reviews @guggero and @yyforyongyu. Changed the migration logic to be more memory-friendly and added more tests to cover all cases. PTAL.


@guggero (Collaborator) left a comment:

LGTM 🎉

Tested the migration with 790k payments and it took about 40 seconds on my machine, which isn't too bad at all!

htlcsMap[aid].Failure, err = readHtlcFailInfo(v)
if err != nil {
return err
}
Collaborator: I'm wondering if there should be a default case that returns an error, since there shouldn't be any other keys in this bucket anymore.

@bhandras (Author): Yes indeed, that's a good point. Done.

func fetchHtlcAttemptInfo(bucket kvdb.RBucket) (*HTLCAttemptInfo, error) {
b := bucket.Get(htlcAttemptInfoKey)
if b == nil {
return nil, errNoAttemptInfo
Collaborator: We're slightly changing the logic here. Now we won't fail anymore if there isn't an attempt info key for a payment. But I guess that's probably okay?

@bhandras (Author): Yes, you're right. Added back a sanity check to ensure that all attempts have an attempt info.

if err := payments.ForEach(func(hash, v []byte) error {
// Get the bucket which contains the payment, fail if the key
// does not have a bucket.
bucket := payments.NestedReadWriteBucket(hash)
Member: nit: could use a read bucket.

@bhandras (Author): done


// Collect all payment hashes so we can migrate payments one-by-one to
// avoid any bugs bbolt might have when invalidating cursors.
// For 100 million payments, this would need about 3 GiB memory so we
Member: How many attempts are there for those 100 million payments?

@bhandras (Author): As far as I understand there's no hard upper bound, but it's certainly limited by the graph itself, so we can say it's constant per payment. Also, since we migrate per payment, we won't pre-allocate for each attempt.

Member: Gotcha. I asked because I was curious about the 3 GiB memory usage.


@yyforyongyu (Member):

> Tested the migration with 790k payments and it took about 40 seconds on my machine which isn't too bad at all!

What happens when the program exits during those 40 seconds? Will the migration continue again after restart?

@guggero (Collaborator) commented Aug 27, 2021:

> What happens when the program exits during those 40 seconds? Will the migration continue again after restart?

Since it's one large DB transaction (which is what makes this tricky memory-wise for both bolt and remote DBs), it's only committed at the end. So if the process exits, the DB is rolled back and the full migration retries on the next startup.

@bhandras bhandras force-pushed the flatten-htlc-bucket branch from 6fd0cab to f023631 Compare August 27, 2021 13:12
@bhandras bhandras requested a review from yyforyongyu August 27, 2021 13:14
@yyforyongyu (Member) left a comment:

LGTM 👍

More thoughts on this topic: I think the htlc attempts are only needed when the payment is not in a terminal state. Once the payment is done (fail/succeed), the extra bandwidth used to carry htlc attempts is unnecessary. And since a payment is usually completed in a short time (60-second default timeout, iirc), most of the time those htlc attempts could be avoided. It would be nice if we could somehow specify what to query in that one big DB transaction, kind of like constructing SQL queries beforehand the old-fashioned way. Of course, this is future work/discussion.



hash3Str = "99eb3f0a48f954e495d0c14ac63df04af8cefa59dafdbcd3d5046d1f564784d1"
hash3 = hexStr(hash3Str)

// failing1 will fail because all payment hashes should point to sub
Member: Very nice docs!

@bhandras bhandras force-pushed the flatten-htlc-bucket branch from f023631 to f2cc783 Compare September 7, 2021 08:47

@Roasbeef (Member) left a comment:

LGTM

Just some small nits that can be cleaned up, but they can be tacked onto one of the dependent PRs; want to keep this PR train moving along 🚄

// failure information, if any.
htlcFailInfoKey = []byte("htlc-fail-info")
// htlcFailInfoKey is the key used as the prefix of an HTLC attempt
// failure information, if any.The HTLC attempt ID is concatenated at
Member: Missing space after the prior one here, also some extra spaces after "The".

htlcBucket := bucket.NestedReadBucket(k)
attemptInfoCount := 0
err := bucket.ForEach(func(k, v []byte) error {
aid := byteOrder.Uint64(k[len(k)-8:])
Member: 8 seems to be the attempt ID itself; it could be lifted to a constant somewhere above.

@Roasbeef Roasbeef merged commit 75f5b40 into lightningnetwork:master Sep 8, 2021
@odeke-em commented Sep 8, 2021:

Congrats @bhandras, and well done! Indeed, to chime in on your report of a 25% increase, this change showed good improvements across the benchmarks per https://dashboard.github.orijtech.com/benchmark/8dcabf559b0b435c9a05c203224c9fdd

[Screenshots: benchmark results, Sep 7, 2021]

/cc @cuonglm @kirbyquerby @willpoint

Thanks @Roasbeef for adopting and using bencher :-) We plan on introducing better UX to improve the workflow and results.
