Last sweep tx is constantly rebroadcasted on startup #7708
Comments
@guggero this is different, the sweeper will always try to rebroadcast the last sweep transaction each time. It doesn't need to though, as the wallet will pick that up, and any other sweeps are assumed to be re-offered on start up. |
What we need to do is actually start to remove the last sweep txn from the sweeper's DB, and also remove it here if/when we get that error: Lines 343 to 365 in bbbf7d3
As is, it tries to make minimal assumptions about what will actually be re-broadcasted, but we have a rebroadcast assumption at a higher level now. |
We also never delete any of the past sweep txids. This is safe to do as we only need them to detect our own sweeps, but after we move a conflict, then we can remove them. |
Any update on this? We keep getting support cases from our (Breez) users that are unable to spend their funds. |
No, it does not. I'll try to give a short overview. It is important to point out that this is only a problem for neutrino nodes. Normal nodes will remove this double-spend tx immediately, so no funds are blocked. The basic problem is that the bitcoind backend somehow does not send any failure when we broadcast a double spend over BIP157, I think. This is what happens on the bitcoind backend:
It will not report an error because it first stores the transaction as an orphan and therefore does not reply with a reject message to the lnd node. Only later does it decline the transaction, but this is never sent to the client (the lnd node). Did something change here in the bitcoind backend? I have the feeling that all transactions are accepted by the bitcoind backend now with the current design and nothing gets rejected at all. Solving this problem includes 2 fixes: We could just not republish the last transaction as referenced in this issue here: This would not completely fix the issue because part of the transactions are already in the "unmined tx-store", see the explanation in #7800. So we need both parts of the solution. If nobody is already working on this one, I can do that. I still have two questions:
|
Update regarding btcd as a backend for the BIP157 connection: when using btcd instead of bitcoind, the double spend is recognized and the transaction removed, so a short-term fix could be to connect all Breez wallets to btcd backends. @roeierez how do Breez clients decide which BIP157 peer they choose? Is it hard-coded or based on discovery? |
No discovery. We have a configuration that users can change https://doc.breez.technology/Connecting-a-Bitcoin-node-supporting-BIP-157.html |
Yep...they removed the |
Our sweeper logic, which is not state-aware, relied in a way on this rejecting behaviour. |
We should address this issue. For now we cannot completely solve this behaviour (without a big design change of the sweeper) because we are not getting the reject msg back into core (I don't know the reason for their decision, though). But I think we could mitigate some of the bad side effects this behaviour causes for neutrino nodes. As we all know now, this problem blocks UTXOs from the lnd default wallet because invalid transactions are no longer recognized as invalid by our neutrino backend (no reject msg). This is the main problem, but we can limit how long such bad behaviour lasts. We will not be able to completely avoid the false broadcasting and hence the blocking of UTXOs, but we can limit them in time. In the following I will outline 3 cases where this behaviour affects lnd nodes and describe how my proposed change could mitigate them.
Those scenarios need to be fixed all at once because they can influence each other, so we cannot just fix one case. That being said, I think it is worth mitigating this behaviour and in the meantime thinking of a better solution which does not bring us into this situation for neutrino wallets at all. Happy to hear your thoughts and questions :) |
One question I have is: which class of invalid transactions are we missing? If it's an input already being spent, or one that has been double spent, then we should be detecting these as we register for spend notifications for each input we try to sweep. One known missing gap is this btcwallet PR which may help to resolve some of the issues re double spend anchor outputs.
How does this lead to funds being blocked if the contract has already been resolved? Assuming the same set of inputs are used, then contract resolution would then imply that those UTXO no longer exist.
Agree that this is straightforward enough. The design of the sweeper is that all callers will re-offer their inputs, most cases are covered, other than maybe manual
On startup, a canned transaction is always used, or I think you mean that eventually the inputs will be re-offered leading to another fresh sweep attempt?
One thing I'm not following is why exactly people have so many dependent transactions from a sweep. Are the major neutrino-based mobile wallets all spending unconfirmed change constantly? Also for this series of issues, I think we should try to find and resolve a fundamental issue as far down in the stack as possible. As an example, is there missing I wonder if an even simpler temporary mitigation (see comment above about trying to fix this on the most fundamental layer) would just be: don't try to sweep anchor UTXOs for non fee bumping scenarios for neutrino nodes. At the default fee rate, the anchors should themselves not actually be positive yield, so they should just sit there. IIRC, we actually skip some anchor itests for neutrino for this very reason: they don't detect spends of it in the mempool, and end up holding onto the UTXOs longer (we give the neutrino node more UTXOs to work with so the test is workable). |
Thank you for your insights, I will think about how I can implement a fix on the lower level as you mentioned 👍 Will update my comment here as soon as I have a detailed plan how to proceed with further implementation. |
Hi Ziggie, I'm looking into this to provide guidance on the overall strategy as well, but I did find an answer to this question you asked above:
Looking into the code it seems that we usually publish the tx in the I'm still investigating potential options of what to do regarding your 3 point plan, but I figured I'd relay this in the mean time. |
Thank you @ProofOfKeags really appreciate your thoughts on this. I think you are right with your investigation. Seems like this corner case needs to be investigated whether we can remove it. Before we remove it, we need to be 100% sure the |
Alright, I think I am starting to grasp the scope of this. First off, thank you for the rigorous analysis @ziggie1984. The core problem seems to stem from the fact that, with neutrino, we do not have a timely and reliable way to detect whether a transaction is rejected by the network. However, we do have a reliable way to detect if an output we are currently attempting to sweep is spent by another transaction, since BIP158 states that every output script of every input is a member of the filter. Timely detection of rejection is ultimately impossible since bitcoind removed the reject message: the only way we can determine with certainty that the transaction was rejected is if a conflicting transaction is mined. Correspondingly, the only way we have to determine that it was accepted is if the transaction we expect is eventually mined. On average this will be 10 minutes away and can be significantly longer than that. Secondly, as referenced in my previous comment, the reason that we indiscriminately broadcast the last sweep is to avoid creating too much empty space in the wallet tree and overrunning the AGL. As a general rule this means that the main thing we need to ensure is:
If this invariant is guaranteed by the sweeper, we no longer need to use the blunt tool of always republishing in the startup sequence (which, by my analysis is a hack to sidestep the AGL issue). This change would still need to be compatible with us abandoning the sweep process for that input if any conflicting spend is mined. This is something we should be able to ascertain from the neutrino backend since all inputs are in the filter set. By my estimation, this part is actually already guaranteed by the sweeper for remote parties, and can be improved by implementing the fix you provide in point 2. However, as you point out, simply doing what you say in point 2 is not enough to solve the core problem. It merely allows LND to heal itself eventually. Not the worst, but still not a real solution IMO. Now that leaves point 3, which I think is rather thorny and where I would like to lean on @Roasbeef to correct me if any of the following intuitions overlook important issues. From what I can tell, proactively broadcasting anchor spends in non-fee bumping scenarios seems superfluous. Anchors exist for the sole purpose of ensuring timely confirmation of transactions we would like to see confirmed. As far as I can tell, any remote commitment transaction is preferred to the local commitment for 2 reasons. First, revoked remote commitments give us the chance to hit our peer with the justice transaction which is always the most favorable outcome for us. Second, the remote commitment for the current state is preferable because it grants us access to our funds more quickly. With this in mind, since we should be able to compute the exact structure of any remote commitment transaction for the lifetime of the channel, we should be able to know for sure whether any of them are in the mempool so long as we can get our hands on the mempool............BUT, this is where things get tricky because my understanding is that neutrino offers us no ability to get at the mempool. 
The trouble with the approach of always CPFP'ing the anchor of the remote commitment is that it is barely more sensible to do that for the current remote commitment than it is to do it for all remote commitments for the lifetime of the channel. Game theoretically the old ones should never be broadcast and so preemptively ensuring their confirmation is very unlikely to bear fruit. The current remote commitment is certainly possible and the only downside to having our own commitment confirm instead of the remote is the lockup period. If we are OK with letting the suboptimal nature of our commitment confirming in the presence of a remote commitment that is broadcast, then we should be able to ditch the "blind broadcast" logic of the remote commitment anchor CPFP transaction entirely. Given that our peer is incentivized the same way in the mirror image scenario means that even our attempt to CPFP the remote is less likely to bear fruit than it would appear at first. All of this points to the idea that we should drop the behavior of always CPFP'ing the remote commitment of the current state entirely. This is a really long-winded way of agreeing with @Roasbeef's assessment in his last paragraph but with a lot more justification for why. However, I think we can and ought to go further than his suggestion, because making this depend on which backend we use bears a complexity cost to LND that would be annoying to maintain. So unless there is strong disagreement, and I do think that I need an ACK from @Roasbeef on this, I think the approach is as follows:
I believe this solves all 3 problems you outline above. |
Thank you for your ideas 🙏, I will present my game plan for fixing this in the following: Making the sweeper idempotent could fail in situations where we want to bump the fee of an input. An input could also slide into a higher fee group over time because the fee estimation is based on block targets. My current solution for this would be, as roasbeef suggested, that instead of just broadcasting the latest sweep transaction we also register each input of that transaction with our ChainNotifier. The only thing we need to add into lnd is a trigger for the removal of conflicting transactions, which would lead us to remove the last sweep transaction eventually. Wdyt? With this approach we could keep the broadcasting and make sure we do not end up in address inflation. Regarding the ditching of the CPFP logic: I think roasbeef was talking about the sweeping of anchors when the commitment tx is already confirmed. There I agree we should make that configurable and switch it off by default. I will have a prototype by tomorrow so that you can have a look at how I envision it from the code side. The plan is to split this change into 2 PRs:
|
I think this would only be possible by adding more state to the sweeper (basically the entire input set that would be swept along with it). The original design of the sweeper was to be more or less stateless, and continually try to replace/bump in the background until something confirmed, then revise the sweeps and do it all over again. If we want to fully bind all the inputs, e.g. an input can only ever be spent alongside its cohorts, then we'd need to persistently track all the various sweeps (including RBF bumps, etc). The other issue with this goal is that we don't always control all the spend paths of the inputs we're trying to sweep. Examples include: 3rd parties spending anchors, breaches, remote party sweeping with preimage when we go to time out the HTLC, etc. If we fully bound the sweep set, then naively the inputs that were batched along with that HTLC output couldn't be published again (most strict interpretation).
Agree that for cases where live HTLCs are involved (mainly routing nodes), we need to also be able to bump the fee of the remote party's commitment transaction. This is made more difficult by the fact that we don't actually know if theirs is in the mempool or not (sans polling our mempool to see if it's there), which resulted in the sort of "blind bumping" strategy we have today. |
If this was the case we don't need to sweep the anchors of the local commitment because it is necessarily impossible for both the local and remote commitment to confirm. So either we should be able to sidestep the issue of 2 mutually conflicting anchor sweeps by waiting until confirmation of one such commitment. Alternatively, we have to consider the pre-confirmation scenario which is what the main purpose of the anchors are to begin with, although we may classify them as CPFP instead of "sweeping".
As far as I understand the ChainNotifier state doesn't persist across restarts of LND. I believe the sweeping logic must be able to tolerate interruption and restarting of the daemon. Alternatively we need to persist enough information to be able to re-register for those notifications.
I was unclear in my previous communication. I mean dropping this logic as it pertains to automatically sweeping local anchors for both local and remote commitment txs. We have to keep the CPFP logic for explicit fee bumping scenarios, but should not do so automatically I don't think.
Yes, it would require more state tracking than it currently has. If the current sweeper was designed to be stateless was that a desire or a requirement? If it is a requirement, what is forcing the statelessness here?
I see, I had failed to consider the batching mechanic. This is true under my proposal. However, I think the part that was missed is that the sweep idempotence should be "reset" if an input used in that sweep is spent by a different (confirmed) transaction. The idea is to sidestep the scenario where we conflict with ourselves and therefore locking up inputs until we detect that conflict. However, while we are awaiting the confirmation of some transaction that will spend any of these outputs, there's no reason we should choose to broadcast a conflicting transaction, except in the case of explicit fee bumping. |
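The "idempotent until a conflicting spend confirms" idea can be sketched roughly as follows. This is an illustration under assumptions, not the sweeper's design: `OutPoint`, `SweepTracker` and its methods are hypothetical names, txids are plain strings, and nothing here is persisted.

```go
package main

// OutPoint identifies a transaction output (hypothetical simplified form).
type OutPoint struct {
	TxID  string
	Index uint32
}

// SweepTracker remembers which in-flight sweep tx each input is bound to.
type SweepTracker struct {
	pending map[OutPoint]string // input -> txid of the in-flight sweep
}

// NewSweepTracker returns an empty tracker.
func NewSweepTracker() *SweepTracker {
	return &SweepTracker{pending: make(map[OutPoint]string)}
}

// ShouldBroadcast reports whether a sweep with the given txid may be
// published. Rebroadcasting the same tx is always allowed, explicit fee
// bumps are always allowed, and anything else is refused while one of its
// inputs is bound to a different in-flight sweep.
func (t *SweepTracker) ShouldBroadcast(txid string, inputs []OutPoint,
	feeBump bool) bool {

	if feeBump {
		return true
	}
	for _, in := range inputs {
		if prev, ok := t.pending[in]; ok && prev != txid {
			return false
		}
	}
	return true
}

// MarkBroadcast binds every input to the sweep that was just published.
func (t *SweepTracker) MarkBroadcast(txid string, inputs []OutPoint) {
	for _, in := range inputs {
		t.pending[in] = txid
	}
}

// OnConfirmedSpend is called when a spend of input confirms. Whether the
// confirmed spender is our own sweep or a conflicting tx, the sweep the
// input was bound to is finished, so all inputs bound to it are released.
// Those still unspent can then be re-offered in a fresh sweep.
func (t *SweepTracker) OnConfirmedSpend(input OutPoint) {
	ours, ok := t.pending[input]
	if !ok {
		return
	}
	for in, txid := range t.pending {
		if txid == ours {
			delete(t.pending, in)
		}
	}
}
```

The reset happens only on confirmation, which matches the point above that a mined transaction is the only certain signal available to a neutrino node.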
Hmm, I think we need to keep the auto-feature as well. As roasbeef stated, especially for routing nodes we can end up in a situation where we have an outgoing HTLC which is timing out, but the remote peer already force closed before our HTLC timed out and their commitment is still dangling in the mempool (ours as well, because of low fees, since there was no urgent need to bump it). Now blocks pass by and eventually our HTLC hits the timeout. At that point the CPFP needs to happen automatically without user intervention because it's time-critical.
Exactly, I am currently saving all the related input information in an additional subbucket to re-register them after a restart or something similar.
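A rough sketch of what persisting the inputs for re-registration could look like. Everything named here is an assumption for illustration: `Bucket` is a plain map standing in for a DB sub-bucket, `PendingInput` and the key name are made up, and JSON is used only to keep the example short.

```go
package main

import "encoding/json"

// PendingInput holds what is assumed to be enough to re-register a spend
// notification for one input after a restart (hypothetical field set).
type PendingInput struct {
	TxID       string `json:"txid"`
	Index      uint32 `json:"index"`
	PkScript   []byte `json:"pk_script"`
	HeightHint uint32 `json:"height_hint"`
}

// Bucket is a hypothetical minimal key-value view of a DB sub-bucket.
type Bucket map[string][]byte

// lastSweepInputsKey is a made-up key name for this sketch.
const lastSweepInputsKey = "last-sweep-inputs"

// SaveInputs serializes the inputs of the last sweep into the bucket,
// replacing whatever was stored before.
func SaveInputs(b Bucket, inputs []PendingInput) error {
	raw, err := json.Marshal(inputs)
	if err != nil {
		return err
	}
	b[lastSweepInputsKey] = raw
	return nil
}

// ReRegister loads the stored inputs and hands each one to register, which
// stands in for a spend-notification registration call. It returns how
// many inputs were registered before the first error, if any.
func ReRegister(b Bucket, register func(PendingInput) error) (int, error) {
	raw, ok := b[lastSweepInputsKey]
	if !ok {
		return 0, nil
	}

	var inputs []PendingInput
	if err := json.Unmarshal(raw, &inputs); err != nil {
		return 0, err
	}

	for i, in := range inputs {
		if err := register(in); err != nil {
			return i, err
		}
	}
	return len(inputs), nil
}
```

On startup, `ReRegister` would be called once before any new sweep is created, so the notifications lost with the in-memory notifier state are restored.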
Agree we should not broadcast sweeps that we know beforehand will be rejected by our backend. But we need to weigh the added complexity just for neutrino backends against maybe the solution where we broadcast an additional sweep but resolve it when the related output is spent after a couple of minutes/hours? |
I think we can instead remove the republishing last tx logic. At first glance it's hacky, and it just hopes by the time the republish fails, If it does crash, we should instead investigate the reason but not restart it. Plus during restart, sweeper won't create any transactions unless there are pending inputs being sent and the batch time ticker has fired.
Note that the whole tx is serialized and saved to disk. During restart, there would only be one transaction being republished. Overall I think the rebroadcasting logic in So back to this issue, I think we can limit the scope and look into how we can remove the republishing last tx logic. |
Sounds good, I realized while working on a prototype that the scope can easily become too big when fixing everything at once. So I am happy to first remove the last sweep tx logic, because I agree it feels hacky, and trying to even consider registering a notification for each input of the last sweep would increase the hacky factor as well. So I will just remove this logic first. Let me know if somebody has any concerns with this approach ;) |
Potential solution from Joost (when we decide to remove the last sweep tx logic to prevent address inflation):
|
I'd rather leak privacy than risk fund loss. Do it. |
Background
As a follow up to this #7599 (comment)
The last sweep tx is persisted and always rebroadcast on startup. If this tx double spends another tx, it causes the user balance to stay in an unconfirmed state.
Your environment
Steps to reproduce
We have seen this in support cases on closed channels.
Expected behaviour
We expect the double-spent transaction to be removed and not consistently rebroadcast.
Actual behaviour
Sweep tx is always rebroadcast.
In general, since btcwallet has a mechanism to rebroadcast unmined transactions, do you see any reason for the sweeper to persist the last sweep tx and republish it at startup? https://github.com/lightningnetwork/lnd/blob/master/sweep/sweeper.go#L344