-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
refreshes ContactInfo.outset before initializing validator #3135
refreshes ContactInfo.outset before initializing validator #3135
Conversation
Nodes join gossip during bootstrap process with a stub contact-info which in particular has invalid TVU socket address. Once the bootstrap is done they re-join gossip a 2nd time with a fully populated contact-info, but this contact-info has an outset timestamp older than the 1st one because it was initiated earlier. In v2.0 the outset timestamp determines which contact-info overrides the other, so the v2.0 nodes refrain from updating their CRDS table with the fully initialized contact-info. The commit refreshes ContactInfo.outset before initializing the validator so that it overrides the one pushed to the gossip by the bootstrap stage.
8cc1c32
to
9a89a91
Compare
Hey @steviez @gregcusack, we'll need approval from a subject matter expert in addition to the backport reviewers. |
curious what the testing process for this will be if we backport a commit to v1.18 which has been running on mainnet for some time now. |
I hope we get enough soak time on testnet and mainnet before proceeding with v2.0 upgrade. |
We can adjust the upgrade schedule as needed. How much time would you want this to run on each cluster? The original plan was to do some quick upgrade/downgrade cycles on testnet, and then start ramping v2.0 on mainnet-beta. Something like this: Testnet, over the course of ~2-4 days:
Mainnet-beta:
If we publish a new v1.18 release with this change most mainnet-beta operators will ignore it unless we tell them it's a critical patch. As I understand it, this problem only manifests if there's a restart while the cluster is split v1.18 / v2.0. So we could recommend v1.18.26, for mainnet-beta, but only push it as a critical patch if we happen to have a restart during the upgrade. Is that correct? |
Without this patch, if some v1.18 node decides to restart and runs the bootstrap code, then v2.0 nodes will pick up its stub contact-info from bootstrap and stick to it, in which case the node will be effectively left out of cluster because the stub contact-info only has a gossip address. So I would say it is still much safer if nodes upgrade to this code before more stake is running v2.0 branch. |
been thinking if this is going to cause any issues for v1.18. I don't think it will. v1.18 uses With the upgrade to v2, now v1.18 needs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just about identical to #2666 when you control for the addition of the helpers
Devnet testing succeeded: The v2.0 version of this patch (#2681) went into v2.0.7 which was announced for testnet on 2024-08-25, and then used for the testnet restart on 2024-08-26 @anza-xyz/backport-reviewers Any other testing you'd like to see here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as per discord discussion, lgtm
Problem
Nodes join gossip during bootstrap process with a stub contact-info which in particular has invalid TVU socket address.
Once the bootstrap is done they re-join gossip a 2nd time with a fully populated contact-info, but this contact-info has an outset timestamp older than the 1st one because it was initiated earlier.
In v2.0, the outset timestamp determines which contact-info overrides the other, so the v2.0 nodes refrain from updating their CRDS table with the fully initialized contact-info.
Summary of Changes
The commit refreshes
ContactInfo.outset
before initializing validator so that it overrides the one pushed to gossip by the bootstrap stage.