-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reapply 8644 on 9260 #9313
Reapply 8644 on 9260 #9313
Conversation
Important Review skippedAuto reviews are limited to specific labels. 🏷️ Labels to auto review (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
81e4188
to
9452cf8
Compare
Looks like a postgres itest failed. Looking into it. Doesn't appear to be related to the btcwallet deadlock fixed recently, so that's good! |
It looks like the shutdown came too early and the sweeper in This seems to happen rarely enough that I can't seem to reproduce it locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good! I think we just need some comments and perhaps some direct coverage for the batch
package, then it's good to go!
// sqldb retry and still re-execute the | ||
// failing request individually. | ||
dbErr := sqldb.MapSQLError(err) | ||
if !sqldb.IsSerializationError(dbErr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It'd be nice to cover batch with a simple unit test to make sure the serialization errors are correctly handled and we don't regress later.
Makefile
Outdated
# each can run concurrently. Note that many of the settings here are | ||
# specifically for integration testing and are not fit for running | ||
# production nodes. | ||
docker run --name lnd-postgres -e POSTGRES_PASSWORD=postgres -p 6432:5432 -d postgres:13-alpine -N 1500 -c max_pred_locks_per_transaction=1024 -c max_locks_per_transaction=128 -c jit=off -c work_mem=8MB -c checkpoint_timeout=10min -c enable_seqscan=off |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does settings work_mem=8MB
and jit=off
add to the test case stability? If yes could you please add some comments why these were changed (along with other params)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got those settings from @djkazic's suggestions. I think it's likely they're unnecessary for the itests because the databases they're working with are pretty small. Will try running without them to see how it goes, and add comments for the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, for context those are the settings I'm using in my production postgres backed LND. DB size is ~5GB. jit=off
should help in all scenarios as JIT'ing mostly benefits long-running queries whereas postgres_kvdb tends to spawn many small and fast-executing queries. work_mem=8MB
on the other hand benefits larger DBs as it gives postgres more breathing room for each individual query to do sorts etc.
9452cf8
to
a94aa1c
Compare
Updated to address comments, add release notes, and do a CI run. I'm still running this locally to see if the new DB settings give me any trouble. |
Got a postgres unit test failure. Looking into it... Error is here. Looks similar to errors (chain notification-related) that happened on both postgres and sqlite unit test runs on #9260, which doesn't have parallel DB transactions re-enabled. I believe this is actually related to a flake in chainntnfs where a certain series of events makes the notifier attempt to send one more event than the (unread) channel has room for. The easiest fix is to add 2 to the channel allocation here, which has eliminated similar errors from all of our (Lightspark) CI runs in our private fork. This is in lieu of actually tracking down the flake further and fixing it "properly." I'll push it as soon as this CI run is complete. The updated DB settings seem to be working OK locally on my machine as well. |
Thanks a lot for digging into the unit test flake! That certainly was an easy way to fix and was originally used in #9258. Then I realized there's another place we need the dedup check which was fixed in 4632044. Since this PR is based on #9260 and that PR has this commit in its upstream, I'd expect it to be fixed properly there but it seems not? The other related fix is #9309 which fixed another flake. Moving forward, I think we can focus on getting the postgres-related merged, then open new PRs to fix the unit test flakes. |
I'm off until Monday but will remove the latest commit then. Thanks! |
a94aa1c
to
fb50cec
Compare
I tried to take off the latest commit and force-push but GitHub got stuck on |
The only test that failed looks irrelevant to the DB (protofsm unit-race), woohoo! We did get some coverage reductions, but the workflow isn't tracking coverage with itests that use the postgres backend. Maybe we should turn it on? |
5811026
to
a0cefe4
Compare
Could you do a rebase on |
Yep, I'll rebase this one, and then backport #9242/rebase it on current master. Thanks! |
For Windows the tests run much slower so we create customized timeouts for them.
This commit removes the panic used in checking the shutdown log. Instead, the error is returned and asserted in `shutdownAllNodes` so it's easier to check which node failed in which test. We also catch all the errors returned from `StopDaemon` call to properly access the shutdown behavior.
Keep the SQL, etcd, bitcoin rpcpolling builds and non-ubuntu builds at 8 since they are less stable.
We sometimes see `timeout waiting for UTXOs` error from bitcoind-related itests due to the chain backend not synced to the miner. We now assert it's synced before continue.
The response from `ClosedChannels` may not be up-to-date, so we wrap it inside a wait closure.
df5d749
to
cc89b32
Compare
This reverts commit 67419a7.
To make this itest work reliably with multiple parallel SQL transactions, we need to count both the settle and final HTLC events. Otherwise, sometimes the final events from earlier forwards are counted before the forward events from later forwards, causing a miscount of the settle events. If we expect both the settle and final event for each forward, we don't miscount.
c2fa057
to
175711a
Compare
Rebased and added logging for |
48485c7
to
9d5e14b
Compare
hmm think it's closed once the upstream branch was deleted...since we are merging #9242 so think it's fine to close it. |
Rebase of #9242 on #9260. Now includes btcsuite/btcwallet#967.