Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add network connectivity to health-checks #644

Merged
merged 22 commits into from
Jan 31, 2025
Merged

Conversation

iansuvak
Copy link
Contributor

Why this should be merged

Closes #447
Partially addresses #619

How this works

  • Removes the TrackSubnet() call from previously called ConnectToCanonicalValidators. This was the only thing actually initializing connections and it was asynchronous and the return of that function was primarily concerned with checking the current status of the validator set and connections to them
  • Fixes the bug where ConnectToCanonicalValidators was always returning as connected to all validators without checking the actual network state of completing handshakes
  • Adds healthcheck function to the signatureAggregator

How this was tested

  • E2E testing
  • New unit test for the GetConnectedCanonicalValidators
  • Manually starting up against Fuji and confirming that signature-aggregator doesn't return healthy until it's ready to aggregate signatures from the C-chain.

How is this documented

Comments

@iansuvak iansuvak changed the title Add primary network connectivity as sig-agg healthcheck Add network connectivity to health-checks Jan 27, 2025
@iansuvak iansuvak self-assigned this Jan 27, 2025
Copy link
Collaborator

@cam-schultz cam-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few nits, but overall this LGTM. I like the separation between network and listener health checks.

@@ -139,7 +139,11 @@ func main() {
metricsInstance,
signatureAggregator,
)
healthcheck.HandleHealthCheckRequest()

healthCheckSubnets := cfg.GetTrackedSubnets().List()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Not for this PR] A neat extension would be to add newly requested L1's to healthCheckSubnets, and perhaps evict them after some duration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up! This is a neat idea but I don't like that this would make it possible to force public facing signature-aggregator instances into unhealthy states by requesting signatures from valid subnets with unreachable nodes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Adding an L1 to healthCheckSubnets only after an aggregate signature has successfully been constructed may be a fair compromise. It would be nice to support health checks for high traffic L1's that are not included in the config. Either way, we can defer to a future issue.

iansuvak and others added 3 commits January 29, 2025 10:11
Co-authored-by: cam-schultz <78878559+cam-schultz@users.noreply.github.com>
Signed-off-by: Ian Suvak <ian.suvak@avalabs.org>
Copy link
Contributor

@bernard-avalabs bernard-avalabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just have a clarification question about connected validators.

Copy link
Collaborator

@cam-schultz cam-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just one nit

@iansuvak iansuvak merged commit d64f888 into main Jan 31, 2025
8 checks passed
@iansuvak iansuvak deleted the healthcheck-sig-agg branch January 31, 2025 15:41
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add more meaningful health checks to signature-aggregator
4 participants