Improve `splice_locked` retransmission logic #1223

t-bast · 2025-01-27T16:38:26Z

The splice spec currently requires retransmitting splice_locked on reconnection if:

A receiving node:
  - MUST send `splice_locked` for the most recent splice transaction that reached
    acceptable depth, unless `splice_locked` was already sent for this transaction
    and `next_commitment_number` is strictly greater than the `commitment_number + 1`
    that was used when `splice_locked` was previously sent (which indicates that
    the previous `splice_locked` was received).

Those requirements are a bit hard to express and mean that we may be retransmitting splice_locked even though our peer has already received it. It also creates a slightly weird race condition if one side has not received splice_locked and tries using the channel immediately on reconnection, which we detail below. The goal of this issue is to improve the splice_locked retransmission requirements to get rid of those issues.

`splice_locked` / `update_add_htlc` reconnect race condition

We start with Alice and Bob having no pending splice, and thus the following initial active commitments:

+------------+
| FundingTx1 |
+------------+

Alice initiates a splice, but disconnects before Bob receives her splice_locked:

   Alice                           Bob
     | splice_init                  |
     |----------------------------->|
     |                   splice_ack |
     |<-----------------------------|
     |       <interactive-tx>       |
     |<---------------------------->|
     |                              |
     |      <splice confirms>       |
     |                              |
     | splice_locked                |
     |----------------------------->|
     |                splice_locked |
     |              X---------------|
     |                              |
     |       <disconnection>        |
     |                              |

At that point, the splice tx is fully locked for Bob, but not for Alice who is waiting for Bob's splice_locked.

Alice's active commitments are:

+------------+        +------------+
| FundingTx1 |------->| FundingTx2 |
+------------+        +------------+

Bob can forget the previous commitment since the splice is locked, and his active commitments are just:

+------------+
| FundingTx2 |
+------------+

On reconnection, Bob will re-send splice_locked, but if Alice wants to use the channel right after receiving channel_reestablish, there is a race condition:

   Alice                           Bob
     |                              |
     | channel_reestablish          |
     |----------------------------->|
     |          channel_reestablish |
     |<-----------------------------|
     | update_add_htlc              |
     |---------------X              |
     | commit_sig                   | batch_size=2, funding_tx=funding_tx_1
     |---------------X              |
     | commit_sig                   | batch_size=2, funding_tx=funding_tx_2
     |---------------X              |
     |                splice_locked |
     |              X---------------|
     | update_add_htlc              |
     |              --------------->|
     | commit_sig                   | IGNORED: unknown funding tx
     |              --------------->|
     | commit_sig                   |
     |              --------------->|
     |                splice_locked |
     |<---------------              |
     |               revoke_and_ack |
     |<-----------------------------|
     |                   commit_sig | batch_size=1, funding_tx=funding_tx_2
     |<-----------------------------|
     | revoke_and_ack               |
     |----------------------------->|

Now that Alice has received Bob's splice_locked, she can forget fundingTx1.
Bob can safely ignore Alice's commit_sig for fundingTx1, so this race condition is harmless.

But for taproot channels it's more complicated: Alice will not be able to create commit_sig because she will be missing nonces!
The main issue is that splice_locked retransmission isn't atomic with channel_reestablish, even though it conceptually should: how do we fix that?

Option 1: add `current_splice_locked` field to `channel_reestablish`

A simple option would be to add a current_splice_locked TLV in the channel_reestablish message containing the latest funding_txid for which:

we have received splice_locked
we have sent or are ready to send splice_locked

In the example above, Alice would have set it to funding_tx_1 while Bob would have set it to funding_tx_2.

When receiving Alice's channel_reestablish:

Bob would know that she hasn't received his splice_locked.
Bob would thus retransmit it before any other channel message.

When receiving Bob's channel_reestablish:

Alice would know that he received her splice_locked.
Alice would also know that Bob has sent splice_locked and will retransmit it immediately.
She could then immediately prune funding_tx_1 and only send commit_sig for funding_tx_2.

A nice side-effect of this change is that we only retransmit splice_locked when we know that our peer hasn't already received it.
This is simpler than the current requirement to retransmit until the commitment_number has increased (which implicitly tells you that your peer received your splice_locked).

Option 2: include `splice_locked` inside `channel_reestablish`

Another option would be to remove the retransmission of splice_locked entirely and instead include the whole splice_locked message inside channel_reestablish, making it atomic.
It works, but it feels hackish: why not do the same with channel_ready retransmission if we're doing it for splice_locked?
Also, including a whole message in another message's TLV field doesn't feel very elegant...but it guarantees that we atomically receive all the data we need ¯_(ツ)_/¯.

A drawback for this is that we will keep unnecessarily including this splice_locked until the commitment_number has increased, even when our peer has received it already.

Option 3: delay messages until `splice_locked` is received

Without taproot channels, this race condition is harmless, so we could decide to do nothing about it.

With taproot channels, Alice would detect in Bob's channel_reestablish that she is missing nonces for funding_tx_1.
She would then implicitly know that it means that Bob has sent splice_locked and will retransmit it.
She could then pause the channels and avoid sending update_add_htlc while she waits for this splice_locked message.

This sounds brittle, I'm not a big fan of this solution.

The text was updated successfully, but these errors were encountered:

t-bast · 2025-01-27T16:39:16Z

@ddustin I'd like to discuss this during today's spec meeting if you can have a look at this issue!

If one side sent `splice_locked` and the other side is ready to send its own `splice_locked` while they are disconnected, this creates a race condition on reestablish because `splice_locked` is retransmitted after `channel_reestablish`, and other channel updates can be inserted by the other node before receiving `splice_locked`. This will be an issue for taproot channels, because nonces will be missing. This race condition is described in more details in lightning#1223. We fix this race condition by adding a TLV to `channel_reestablish` that provides information about the latest locked transaction. This additional information also makes it easier to detect when we need to retransmit our previous `splice_locked`.

t-bast · 2025-01-28T09:44:06Z

I have specified option 1 in 2c1b500, this simplifies the retransmission requirements and resolves this race condition, let me know what you think!

If one side sent `splice_locked` and the other side is ready to send its own `splice_locked` while they are disconnected, this creates a race condition on reestablish because `splice_locked` is retransmitted after `channel_reestablish`, and other channel updates can be inserted by the other node before receiving `splice_locked`. This will be an issue for taproot channels, because nonces will be missing. This race condition is described in more details in lightning#1223. We fix this race condition by adding a TLV to `channel_reestablish` that provides information about the latest locked transaction. This additional information also makes it easier to detect when we need to retransmit our previous `splice_locked`.

t-bast mentioned this issue Jan 22, 2025

Lightning Specification Meeting 2025/01/27 #1221

Closed

23 tasks

t-bast mentioned this issue Jan 28, 2025

Implement updated retransmission logic for splice_locked ACINQ/eclair#2987

Open

t-bast mentioned this issue Feb 4, 2025

Lightning Specification Meeting 2025/02/10 #1224

Open

21 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve `splice_locked` retransmission logic #1223

Improve `splice_locked` retransmission logic #1223

t-bast commented Jan 27, 2025

t-bast commented Jan 27, 2025

t-bast commented Jan 28, 2025

Improve splice_locked retransmission logic #1223

Improve splice_locked retransmission logic #1223

Comments

t-bast commented Jan 27, 2025

splice_locked / update_add_htlc reconnect race condition

Option 1: add current_splice_locked field to channel_reestablish

Option 2: include splice_locked inside channel_reestablish

Option 3: delay messages until splice_locked is received

t-bast commented Jan 27, 2025

t-bast commented Jan 28, 2025

Improve `splice_locked` retransmission logic #1223

Improve `splice_locked` retransmission logic #1223

`splice_locked` / `update_add_htlc` reconnect race condition

Option 1: add `current_splice_locked` field to `channel_reestablish`

Option 2: include `splice_locked` inside `channel_reestablish`

Option 3: delay messages until `splice_locked` is received