Convert to attributable errors #2256

Status: Draft · wants to merge 1 commit into base `main`

Conversation

joostjager
Copy link
Contributor

@joostjager joostjager commented May 2, 2023

Implements lightning/bolts#1044

This PR serves to:

Not included:

  • Support for legacy failures; this PR always forces attributable errors.
  • Timing data in the intermediate and final failure payload fields.
  • Handling of failure cases where a node corrupts the failure message.
  • Working tests.
  • Cleanup of various debug log/print statements.
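For context, the core idea in lightning/bolts#1044 is that every hop on the failure's return path appends its own payload plus an HMAC covering everything added by hops further downstream, so the sender can verify the tags from the outside in and attribute a corrupted failure message to a specific hop. The sketch below is not LDK code: it uses a toy keyed hash from the standard library as a stand-in for HMAC-SHA256, single-byte payloads, and none of the fixed-size padding or shared-secret derivation from the spec; `tag`, `add_hop`, and `attribute` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::convert::TryInto;
use std::hash::{Hash, Hasher};

// Toy keyed hash standing in for HMAC-SHA256 (illustration only).
fn tag(key: u64, data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    data.hash(&mut h);
    h.finish()
}

// On the return path, each hop appends a one-byte payload and then a tag
// over the whole packet (its payload plus everything added downstream).
fn add_hop(key: u64, payload: u8, packet: &mut Vec<u8>) {
    packet.push(payload);
    let t = tag(key, &packet[..]);
    packet.extend_from_slice(&t.to_be_bytes());
}

// The sender knows each hop's key (keys[0] is the hop nearest the sender,
// whose tag is outermost). It verifies tags from the outside in; the first
// hop whose tag fails to verify is attributed the failure.
fn attribute(keys: &[u64], mut packet: Vec<u8>) -> Option<usize> {
    for (i, key) in keys.iter().enumerate() {
        if packet.len() < 9 {
            return Some(i); // this hop failed to add its data at all
        }
        let split = packet.len() - 8;
        let tag_bytes: [u8; 8] = packet[split..].try_into().unwrap();
        if tag(*key, &packet[..split]) != u64::from_be_bytes(tag_bytes) {
            return Some(i);
        }
        packet.truncate(split - 1); // strip this hop's tag and payload
    }
    None // every tag verified: the failure message is intact
}
```

Because each hop tags whatever it received, a hop that corrupts the inner packet cannot produce valid tags on behalf of the hops downstream of it, which is what makes the failure attributable.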

@joostjager
Copy link
Contributor Author

Single-hop interop testing passes.

Multi-hop interop testing seems to be blocked by gossip-propagation issues between lnd and rust-lightning.

@TheBlueMatt
Copy link
Collaborator

Nice! That's not too bad, thanks for working on this. I'll dig into the crypto in a day or two. I have a number of comments on the code itself and structure, but I assume it's not really worth digging into them until we support both old and new errors? I'm happy to give more code-level feedback now if you prefer.

> Multi hop interop testing seems to be blocked by gossip propagation issues between lnd and rust-lightning.

Oh? Is this some known issue on the lnd end? I'm not aware of any gossip errors.

@joostjager
Copy link
Contributor Author

> Have a number of comments on the code itself and structure, but I assume its not really worth digging into until we support both old and new errors? I'm happy to give more code-level feedback now if you prefer.

This doesn't surprise me. This is my first venture into Rust land. It is a bit of a switch from golang, and I need to get used to how things are done here. The language is not bad though; I can see why people fall in love with it! But yes, we can hold off on code-level comments for now.

> Oh? Is this some known issue on the lnd end? I'm not aware of any gossip errors.

Not sure if it is a known issue. I've had nagging gossip issues before when I tried to run a pathfinding benchmark. For this PR, I applied a patch somewhere in rust-lightning to always force-send the node announcement, and then it worked better. I will try to come up with a reasonable repro scenario.

@TheBlueMatt
Copy link
Collaborator

What are you using to do the testing? I assume ldk-sample of some form? If you change the timer at https://github.com/lightningdevkit/ldk-sample/blob/main/src/main.rs#L791, it will rebroadcast a fresh node announcement more aggressively.
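The effect of that timer can be pictured with a minimal tick loop; this is only a sketch, not the actual ldk-sample code (which uses a tokio interval and LDK's broadcast call, neither shown here), and `drive`/`broadcast` are made-up names. The point is just that a smaller interval rebroadcasts the announcement more often.

```rust
// Hypothetical tick-driven scheduler: fires `broadcast` once every
// `interval_secs` ticks, standing in for ldk-sample's periodic timer.
fn drive(total_secs: u32, interval_secs: u32, mut broadcast: impl FnMut()) {
    for t in 1..=total_secs {
        if t % interval_secs == 0 {
            broadcast(); // in ldk-sample this would re-announce the node
        }
    }
}
```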

@joostjager
Copy link
Contributor Author

Yes, ldk-sample. The timer was firing alright, but somehow the announcement got filtered out in the timer handler.

@TheBlueMatt
Copy link
Collaborator

Would be happy to take a look at logs. At the TRACE level we should basically be writing everything that is going out or coming in on the wire.

@joostjager
Copy link
Contributor Author

Yes, so I did see that the announcement wasn't going out. Will continue tomorrow and get back with more data.

@TheBlueMatt
Copy link
Collaborator

Errr, right, so to avoid dumping tons of gossip crap, that message is only really logged at a high level (`Broadcasting NodeAnnouncement after passing it to our own RoutingMessageHandler.`), with the remaining per-peer logs at the GOSSIP level (disabled by default because it's verbose): `Sending message to all peers except {:?} or the announced node: {:?}` and `Skipping broadcast message to {:?} as its outbound buffer is full`.
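The level split described here can be pictured with a minimal filter. The variant names below follow the discussion above, but the ordering and the `FilterLogger` type are assumptions for illustration, not LDK's actual `Logger` API.

```rust
// Toy set of log levels, ordered from most to least verbose
// (GOSSIP below TRACE, so it is filtered out at the default threshold).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Level {
    Gossip,
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

// Hypothetical logger that drops any record below its minimum level.
struct FilterLogger {
    min: Level,
}

impl FilterLogger {
    // A record passes only if its level is at or above the threshold.
    fn enabled(&self, level: Level) -> bool {
        level >= self.min
    }
}
```

Under this model, running at TRACE shows everything on the wire except the per-peer gossip lines, which only appear once the threshold is lowered to GOSSIP.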

@joostjager
Copy link
Contributor Author

Looks like the timer is at 60s. I definitely waited much longer than that, so that doesn't seem to be the problem. I enabled gossip logging and saved the log files. Continuing in #2259.

@joostjager
Copy link
Contributor Author

With gossip fixed via the hint in #2259, I was able to run through a few of the multi-hop interop scenarios:

  • LND -> LDK -> LDK
  • LDK -> LDK -> LND
  • LND -> LDK (intermediate failure)

Obviously there are more, but I think this is a good enough sanity check for now.

As mentioned above, attention should go to the crypto part of this feature first.
