Graph Abstraction & Gossiper->to->DB stack clean-up with Remote Graph & Gossip 1.75 projects in mind #9494

ellemouton opened this issue Feb 10, 2025 · 0 comments

Background Context

Today's Gossip -> DB stack

[Diagram: today's Gossip -> DB stack]

As seen in the diagram, today we have the following flow for gossip messages:

Gossiper

  1. An lnwire (LN protocol) message arrives at our gossiper, either from one of our peers or from our own node.
  2. The gossiper makes some read calls to the DB via the graph.Builder to ensure basic DoS protection (i.e., should we bother continuing with this announcement at all?).
  3. The gossiper then performs protocol-level checks on the gossip, such as validating the signature (and the other checks described in BOLT 7). NB: today, the gossiper does not do funding transaction validation when it receives a channel_announcement, as this is currently done further down the stack. This should be changed.
  4. Once validated, the gossiper converts the lnwire message to our internal models representation (** this is probably the wrong place for this conversion) and calls the graph.Builder's Add/Update methods.
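
In Go-shaped pseudocode, the per-message flow described above looks roughly like this (all names and signatures here are illustrative only, not the actual lnd API):

```go
// Illustrative sketch of steps 2-4 above for a channel_update; the
// identifiers used here are hypothetical.
func (g *gossiper) handleChanUpdate(upd *lnwire.ChannelUpdate) error {
	// (2) Cheap DoS protection via the Builder: do we know this channel,
	// and is the update newer than what we already have?
	if g.builder.IsStaleEdgePolicy(upd.ShortChannelID, upd.Timestamp) {
		return nil
	}

	// (3) Protocol-level (BOLT 7) validation, e.g. signature checks.
	// NB: funding tx validation is currently NOT done at this layer.
	if err := validateChannelUpdateSig(upd); err != nil {
		return err
	}

	// (4) Convert the lnwire message to our internal models type
	// (arguably the wrong layer for this) and hand it to the Builder.
	policy := convertToEdgePolicy(upd)

	return g.builder.UpdateEdge(policy)
}
```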

graph.Builder

  1. The Builder performs a few checks on the incoming messages before passing them on to the ChannelGraph:
    - node_announcement: again checks freshness (** repeating a check the gossiper already did...)
    - channel_update: checks that we do know about the channel & that the update is fresh (** again repeating what was done in the gossiper)
    - channel_announcement: does funding tx validation (** wrong place!!) along with checks like ensuring we don't already know of this channel (again, already done in the gossiper).
  2. The Builder also handles a couple of other maintenance tasks:
    - it is responsible for pruning closed channels & marking channels as zombies
    - it provides a topology-change subscription, since it knows when a new update is actually persisted and clients need to be notified.

ChannelGraph

  • This is our CRUD layer. It has a direct connection to a backing kvdb.Backend & implements all our persistence logic.

  • It also constructs and maintains the graphCache, an in-memory cache that holds the info required by the router.

  • Many parts of the code base currently have direct access to the graphdb.ChannelGraph.
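
For reference, the current coupling looks roughly like this (a simplified sketch, not the real struct definition):

```go
// Simplified sketch: persistence and the router-facing cache live in the
// same struct today, so the CRUD layer is also responsible for keeping the
// cache in sync on every write.
type ChannelGraph struct {
	db         kvdb.Backend // direct connection to the backing KV store
	graphCache *GraphCache  // in-memory cache consumed by the router
}
```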

Remote Graph Vision

If many LND nodes are owned by the same entity, there is really no need for each of them to sync its own gossip on init. A node can instead persist just its own gossip updates and rely on a remote graph source to populate the rest of its graphCache for pathfinding purposes.
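
One way to picture this is a read-only source interface that the new ChannelGraph can be fed from, served either by the local store or by a remote node (a hypothetical sketch, not a committed API):

```go
// Hypothetical read-only graph source. The graphCache could be populated
// both from the local store (our own updates) and from a remote
// implementation of this interface (e.g. backed by gRPC calls to the
// shared graph node).
type GraphSource interface {
	ForEachNode(ctx context.Context,
		cb func(*models.LightningNode) error) error

	ForEachChannel(ctx context.Context,
		cb func(*models.ChannelEdgeInfo) error) error
}
```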

Here is a diagram showing an example configuration: One Graph Source that 2 clients depend on:

[Diagram: multi-client vision]

[Diagram: zooming in on one client]

There are, however, a few things we need to consider and change in order to get from the current architecture to this ideal one:

  1. In the remote-graph client set-up, the graphCache will be populated both from our local updates and from updates fetched from the remote. So it makes sense to lift this cache out of the CRUD layer. This will also lend itself to the gossip v1.75 changes (see later on). Steps here include:
    - Rename the current ChannelGraph to a more descriptive KVDBStore (or V1Store; see the gossip updates later).
    - Create a new ChannelGraph struct which is responsible for creating the graphCache.
    - The KVDBStore then only defines CRUD logic, which is a cleaner separation anyway.
  2. All read calls should go through the new graphdb.ChannelGraph instead of going directly to the CRUD layer. This is needed so that these calls can correctly query the graph cache/remote graph where needed.
  3. Topology subscriptions/management need to move out of the graph.Builder and into the new graphdb.ChannelGraph. This is where management of the remote source will happen, so if we want our topology-subscription clients to stay up to date with changes from the remote source (and not only with updates from our own node), it makes sense for this to live in the ChannelGraph.
  4. All calls to the ChannelGraph, both reads and writes, should take a context.Context for 2 reasons (see the sketch below):
    1) to prepare for any remote gRPC calls, which need a context
    2) to prepare for a SQL DB backend, which will also take a context.
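
Concretely, the shape of that change is roughly the following (illustrative signatures only, not final):

```go
// Before (today's CRUD layer, no context):
//
//   FetchChannelEdgesByID(chanID uint64) (*models.ChannelEdgeInfo,
//       *models.ChannelEdgePolicy, *models.ChannelEdgePolicy, error)
//
// After: every read/write on the new ChannelGraph threads a context so the
// same call can be served by kvdb, a SQL store, or a remote gRPC source,
// and can honour the caller's cancellation/deadline.
func (c *ChannelGraph) FetchChannelEdgesByID(ctx context.Context,
	chanID uint64) (*models.ChannelEdgeInfo, *models.ChannelEdgePolicy,
	*models.ChannelEdgePolicy, error) {

	return c.store.FetchChannelEdgesByID(ctx, chanID)
}
```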

Gossip v1.75 support vision

[Diagram: gossip v1.75 support vision]

A couple of things to keep in mind for the gossip v1.75 support:

  • We will be supporting 2 disjoint protocols. So there will be 2 distinct DBs, and we should not need to check things about a given node/channel across the 2 protocols (except for a few edge cases).
  • The 2 separate DBs (and hence 2 separate CRUD layers) are another good reason for getting the graphCache out of the current V1Store layer and into the new ChannelGraph layer.
  • Given that we will have 2 DBs, we will need a layer to mux things: notice that the V1Store CRUD will deal with *models.Channel1/Node1/Update1 struct types while the V2Store CRUD will deal with *models.Channel2/Node2/Update2. So our ChannelGraph layer will also provide read interface methods to the rest of the code base via new models.Channel/Node/Update interfaces. This is good to keep in mind from the start, since there will be a period when some read calls to ChannelGraph are just forwarded directly to the current CRUD layer, and it might be confusing why we have that extra layer - the reason is to allow for the future where we want to mux results (see the sketch below).
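
A rough sketch of that mux idea (every name below is a placeholder, not a committed design):

```go
// Version-agnostic view that the rest of the code base consumes. Both the
// V1 (legacy gossip) and V2 (gossip 1.75) record types would satisfy it.
type Channel interface {
	ChanID() uint64
	Capacity() btcutil.Amount
}

// The ChannelGraph layer muxes reads across the two protocol stores.
type ChannelGraph struct {
	v1 V1Store // CRUD over *models.Channel1 / Node1 / Update1
	v2 V2Store // CRUD over *models.Channel2 / Node2 / Update2
}

// FetchChannel returns the channel from whichever store knows about it,
// exposed via the version-agnostic interface.
func (g *ChannelGraph) FetchChannel(ctx context.Context,
	id uint64) (Channel, error) {

	if ch, err := g.v1.FetchChannel(ctx, id); err == nil {
		return ch, nil
	}

	return g.v2.FetchChannel(ctx, id)
}
```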

Given the detailed vision and the context around the other associated projects outlined above, let's narrow down to the initial goals of this ticket, which is focused on Graph Query Abstraction & general clean-up and separation of concerns.

This is what we are aiming for:

Here are the initial high-level steps to completion. Along the way, small things that are worth addressing will probably be added during review.

Steps to completion (not necessarily in order)

  • Move funding transaction verification logic from the graph.Builder to the gossiper (discovery+graph: move funding tx validation to the gossiper #9478)
  • Rename ChannelGraph to KVDBStore, introduce a new ChannelGraph and let it handle the graphCache (i.e., move cache handling out of the CRUD layer).
  • Move the topology management/subscriptions from the Builder to the new ChannelGraph.
  • For each sub-system in LND that currently holds a direct pointer to the DB, have it define an interface instead and let those interface methods be implemented by the new ChannelGraph (see the sketch after this list).
    - [x] autopilot server: graph+autopilot: remove autopilot access to raw graphdb.ChannelGraph #9480
    - [ ] invoices rpc server
    - [ ] netann
    - [ ] rpcserver
  • Let the gossiper deal only with lnwire types; the Builder should then be responsible for converting to our internal models types.
  • For each write and read method exposed via the ChannelGraph, update it to take a context. This will involve ensuring that any calling sub-systems actually have a context to thread through, so quite a few PRs will be dedicated to just threading contexts through.
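
For the interface-per-sub-system step, the pattern is roughly the following (a hypothetical example for a consumer such as netann; the method shown is illustrative):

```go
// Instead of holding a *graphdb.ChannelGraph, the sub-system declares only
// the reads it actually needs, and the new ChannelGraph satisfies it.
type channelGraphReader interface {
	ForEachNodeChannel(ctx context.Context, nodePub route.Vertex,
		cb func(*models.ChannelEdgeInfo) error) error
}

type Manager struct {
	// graph can be backed by the local DB, the cache, or a remote
	// source; the sub-system doesn't care, and tests can stub it out.
	graph channelGraphReader
}
```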

Additional goals added on during review:

  • Address this comment: once a context is passed through the Builder's AddEdge/AddNode/UpdateEdge methods, we can thread those contexts through and ensure that persistence happens before the call returns.

Associated Issues:

Completed PRs:
