Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Fetch peers p2p #559

Merged
merged 34 commits into from
Nov 26, 2024
Merged

Fetch peers p2p #559

merged 34 commits into from
Nov 26, 2024

Conversation

cam-schultz
Copy link
Collaborator

@cam-schultz cam-schultz commented Nov 21, 2024

Why this should be merged

Connects to peers over p2p, rather than via the info API. Post Etna, info API nodes are not expected to track peers for all subnets, so we instead need to fetch peers directly from primary network validators, which do track all peers across all L1s. We do this by bootstrapping the network with connections to primary network validators (which are still fetched from the info API), and discovering peers via peer gossip. Peer gossip only tracks peers that correspond to known validators, so we also need to maintain an up-to-date view of the validators of the L1s we wish to connect to (including the primary network). This change leverages ava-labs/icm-contracts#641 to confirm that we can connect to L1s with disjoint validator sets from the primary network.

Note: This PR uses a modified version of Avalanchego v1.12.0-fuji that relaxes the outbound peer request handling. Once this change is formalized and included in an Avalanchego release, we will circle back and update the dependency. This workaround is safe because it only impacts outbound messages from the signature aggregator peer.

How this works

  • Create a validator manager and initialize it with the current validators of the primary network and any tracked L1s
    • Periodically update the validator manager with the latest validator sets from the P-Chain API
    • For the signature aggregator service, add new tracked subnets on demand
  • Bootstrap the network with primary network validator peers (fetched from the Info API), which will gossip any known validators across all tracked L1s
  • There's no need to manually connect to peers anymore, as this is handled automatically via peer gossip

How this was tested

CI

How is this documented

N/A

@cam-schultz cam-schultz changed the base branch from main to sov-compatible-test-network November 21, 2024 22:51
Base automatically changed from sov-compatible-test-network to main November 22, 2024 20:04
@cam-schultz cam-schultz marked this pull request as ready for review November 22, 2024 20:24
@cam-schultz cam-schultz requested a review from a team as a code owner November 22, 2024 20:24
Copy link

@richardpringle richardpringle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my only real comment of value for you is the logger.Fatal comment. Everything else is more of an opportunity for you to inform me 😄

ip netip.AddrPort
}

func (c *PeerConfig) Validate() error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to have a constructor newPeerConfig that does the validation before building the config?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do validate at construction time in the top-level config constructor: https://github.com/ava-labs/awm-relayer/blob/main/relayer/config/viper.go#L21

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the cleanest pattern, but we have to unmarshal the config and validate it in separate steps regardless, it's just a question of code organization.

peers/app_request_network.go Show resolved Hide resolved
peers/app_request_network.go Outdated Show resolved Hide resolved
peers/app_request_network.go Outdated Show resolved Hide resolved
// Maximum amount of time to spend waiting (in addition to network round trip time per attempt)
// during relayer signature query routine
signatureRequestRetryWaitPeriodMs = 10_000
signatureRequestRetryWaitPeriodMs = 20_000

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's in the description, but why were these values doubled? (Just for my own knowledge)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might not be necessary anymore, but I increased this to account for the increased warm up time when 1) tracking a new subnet or 2) adding/removing validators. With ValidatorManager churn tracking, the latter case is likely rare, but the former will be common on the signature aggregator side. The current implementation eliminates this warm up latency for the new subnet case, so it's probably safe to reduce this back to the original value.

warpMessage = getWarpMessageFromLog(ctx, receipt, subnetBInfo)

reqBody = api.AggregateSignatureRequest{
Message: "0x" + hex.EncodeToString(warpMessage.Bytes()),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a helper for prefixing with a 0x!?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some helpers here and there in various dependencies, but nothing immediately available that I'm able to find. We only do this a couple times and then only in tests, so I'm not too concerned.

go.mod Outdated Show resolved Hide resolved
for _, peer := range peers {
vdrs, err := pClient.GetCurrentValidators(context.Background(), constants.PrimaryNetworkID, []ids.NodeID{peer.ID})
if err != nil {
panic(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably return an error here to be consistent with the rest of this function

peers/app_request_network.go Outdated Show resolved Hide resolved
Comment on lines 161 to 164
// Only track the peer if it is a primary network validator
if len(vdrs) == 0 {
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're assuming that if the peer knows of any p-chain validators, it itself is a primary network validator? Shouldn't subnet-only validators know of some p-chain validators because they have to sync the p-chain?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is just a confirmation that the peer returned by the infoAPI.Peers is indeed a primary network validator

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is just a confirmation that the peer returned by the infoAPI.Peers is indeed a primary network validator

Yes, this is the case here. We're using pClient.GetCurrentValidators to filter the peers returned by info.Peers. We could definitely do this in one rpc call though. Making individual rpc calls for each peer is pretty inefficient.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this to make one call to GetCurrentValidators

peers/app_request_network.go Outdated Show resolved Hide resolved
peers/app_request_network.go Outdated Show resolved Hide resolved
peers/app_request_network.go Outdated Show resolved Hide resolved
Comment on lines 161 to 164
// Only track the peer if it is a primary network validator
if len(vdrs) == 0 {
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is just a confirmation that the peer returned by the infoAPI.Peers is indeed a primary network validator

peers/app_request_network.go Outdated Show resolved Hide resolved
peers/app_request_network.go Outdated Show resolved Hide resolved
peers/app_request_network.go Outdated Show resolved Hide resolved
peers/app_request_network.go Outdated Show resolved Hide resolved
Copy link
Contributor

@iansuvak iansuvak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the only outstanding comment I think is @geoff-vball 's about the comment being misleading.

Copy link
Contributor

@geoff-vball geoff-vball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left 3 comments that are all related to a bug you had in one commit but then subsequently fixed. You're welcome to take the code organization suggestion or just merge as is :)

@cam-schultz cam-schultz merged commit 540a332 into main Nov 26, 2024
8 checks passed
@cam-schultz cam-schultz deleted the fetch-peers-p2p branch November 26, 2024 17:29
@cam-schultz cam-schultz mentioned this pull request Dec 10, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants