Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Improve splice_locked retransmission logic #1223

Open
t-bast opened this issue Jan 27, 2025 · 2 comments
Open

Improve splice_locked retransmission logic #1223

t-bast opened this issue Jan 27, 2025 · 2 comments

Comments

@t-bast
Copy link
Collaborator

t-bast commented Jan 27, 2025

The splice spec currently requires retransmitting splice_locked on reconnection if:

A receiving node:
  - MUST send `splice_locked` for the most recent splice transaction that reached
    acceptable depth, unless `splice_locked` was already sent for this transaction
    and `next_commitment_number` is strictly greater than the `commitment_number + 1`
    that was used when `splice_locked` was previously sent (which indicates that
    the previous `splice_locked` was received).

Those requirements are a bit hard to express and mean that we may be retransmitting splice_locked even though our peer has already received it. It also creates a slightly weird race condition if one side has not received splice_locked and tries using the channel immediately on reconnection, which we detail below. The goal of this issue is to improve the splice_locked retransmission requirements to get rid of those issues.

splice_locked / update_add_htlc reconnect race condition

We start with Alice and Bob having no pending splice, and thus the following initial active commitments:

+------------+
| FundingTx1 |
+------------+

Alice initiates a splice, but disconnects before Bob receives her splice_locked:

   Alice                           Bob
     | splice_init                  |
     |----------------------------->|
     |                   splice_ack |
     |<-----------------------------|
     |       <interactive-tx>       |
     |<---------------------------->|
     |                              |
     |      <splice confirms>       |
     |                              |
     | splice_locked                |
     |----------------------------->|
     |                splice_locked |
     |              X---------------|
     |                              |
     |       <disconnection>        |
     |                              |

At that point, the splice tx is fully locked for Bob, but not for Alice who is waiting for Bob's splice_locked.

Alice's active commitments are:

+------------+        +------------+
| FundingTx1 |------->| FundingTx2 |
+------------+        +------------+

Bob can forget the previous commitment since the splice is locked, and his active commitments are just:

+------------+
| FundingTx2 |
+------------+

On reconnection, Bob will re-send splice_locked, but if Alice wants to use the channel right after receiving channel_reestablish, there is a race condition:

   Alice                           Bob
     |                              |
     | channel_reestablish          |
     |----------------------------->|
     |          channel_reestablish |
     |<-----------------------------|
     | update_add_htlc              |
     |---------------X              |
     | commit_sig                   | batch_size=2, funding_tx=funding_tx_1
     |---------------X              |
     | commit_sig                   | batch_size=2, funding_tx=funding_tx_2
     |---------------X              |
     |                splice_locked |
     |              X---------------|
     | update_add_htlc              |
     |              --------------->|
     | commit_sig                   | IGNORED: unknown funding tx
     |              --------------->|
     | commit_sig                   |
     |              --------------->|
     |                splice_locked |
     |<---------------              |
     |               revoke_and_ack |
     |<-----------------------------|
     |                   commit_sig | batch_size=1, funding_tx=funding_tx_2
     |<-----------------------------|
     | revoke_and_ack               |
     |----------------------------->|

Now that Alice has received Bob's splice_locked, she can forget fundingTx1.
Bob can safely ignore Alice's commit_sig for fundingTx1, so this race condition is harmless.

But for taproot channels it's more complicated: Alice will not be able to create commit_sig because she will be missing nonces!
The main issue is that splice_locked retransmission isn't atomic with channel_reestablish, even though it conceptually should: how do we fix that?

Option 1: add current_splice_locked field to channel_reestablish

A simple option would be to add a current_splice_locked TLV in the channel_reestablish message containing the latest funding_txid for which:

  • we have received splice_locked
  • we have sent or are ready to send splice_locked

In the example above, Alice would have set it to funding_tx_1 while Bob would have set it to funding_tx_2.

When receiving Alice's channel_reestablish:

  • Bob would know that she hasn't received his splice_locked.
  • Bob would thus retransmit it before any other channel message.

When receiving Bob's channel_reestablish:

  • Alice would know that he received her splice_locked.
  • Alice would also know that Bob has sent splice_locked and will retransmit it immediately.
  • She could then immediately prune funding_tx_1 and only send commit_sig for funding_tx_2.

A nice side-effect of this change is that we only retransmit splice_locked when we know that our peer hasn't already received it.
This is simpler than the current requirement to retransmit until the commitment_number has increased (which implicitly tells you that your peer received your splice_locked).

Option 2: include splice_locked inside channel_reestablish

Another option would be to remove the retransmission of splice_locked entirely and instead include the whole splice_locked message inside channel_reestablish, making it atomic.
It works, but it feels hackish: why not do the same with channel_ready retransmission if we're doing it for splice_locked?
Also, including a whole message in another message's TLV field doesn't feel very elegant...but it guarantees that we atomically receive all the data we need ¯_(ツ)_/¯.

A drawback for this is that we will keep unnecessarily including this splice_locked until the commitment_number has increased, even when our peer has received it already.

Option 3: delay messages until splice_locked is received

Without taproot channels, this race condition is harmless, so we could decide to do nothing about it.

With taproot channels, Alice would detect in Bob's channel_reestablish that she is missing nonces for funding_tx_1.
She would then implicitly know that it means that Bob has sent splice_locked and will retransmit it.
She could then pause the channels and avoid sending update_add_htlc while she waits for this splice_locked message.

This sounds brittle, I'm not a big fan of this solution.

@t-bast
Copy link
Collaborator Author

t-bast commented Jan 27, 2025

@ddustin I'd like to discuss this during today's spec meeting if you can have a look at this issue!

t-bast added a commit to t-bast/bolts that referenced this issue Jan 28, 2025
If one side sent `splice_locked` and the other side is ready to send
its own `splice_locked` while they are disconnected, this creates a
race condition on reestablish because `splice_locked` is retransmitted
after `channel_reestablish`, and other channel updates can be inserted
by the other node before receiving `splice_locked`. This will be an
issue for taproot channels, because nonces will be missing.
This race condition is described in more details in lightning#1223.

We fix this race condition by adding a TLV to `channel_reestablish`
that provides information about the latest locked transaction. This
additional information also makes it easier to detect when we need
to retransmit our previous `splice_locked`.
@t-bast
Copy link
Collaborator Author

t-bast commented Jan 28, 2025

I have specified option 1 in 2c1b500, this simplifies the retransmission requirements and resolves this race condition, let me know what you think!

t-bast added a commit to t-bast/bolts that referenced this issue Feb 12, 2025
If one side sent `splice_locked` and the other side is ready to send
its own `splice_locked` while they are disconnected, this creates a
race condition on reestablish because `splice_locked` is retransmitted
after `channel_reestablish`, and other channel updates can be inserted
by the other node before receiving `splice_locked`. This will be an
issue for taproot channels, because nonces will be missing.
This race condition is described in more details in lightning#1223.

We fix this race condition by adding a TLV to `channel_reestablish`
that provides information about the latest locked transaction. This
additional information also makes it easier to detect when we need
to retransmit our previous `splice_locked`.
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant