ACP-77: Implement validator state #3388

Merged
merged 227 commits into from
Nov 5, 2024

@StephenButtolph StephenButtolph commented Sep 13, 2024

Why this should be merged

ACP-77 introduces a new type of validator: the SubnetOnlyValidator.

This PR introduces this new validator type to the P-chain state.

How this works

SubnetOnlyValidator

For the new SubnetOnlyValidator type, the validator can either be active (EndAccumulatedFee != 0) or inactive (EndAccumulatedFee == 0).

Active validators are handled similarly to other validator types. They can be sampled during consensus, can contribute to block production, can contribute to ICM (Warp) messages, and are held in memory to perform these operations efficiently. To limit the memory pressure introduced by active validators, the number of active validators is limited.

Unlike active validators, inactive validators are not sampled during consensus, cannot contribute to block production, and cannot contribute to ICM (Warp) messages. This allows inactive validators to be removed from memory and kept only on disk. Each L1 will have an entry in its validator set representing the cumulative amount of inactive stake.

Removing a SubnetOnlyValidator is performed simply by setting its Weight to 0.
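
The three possible states of a SubnetOnlyValidator can be sketched as follows. This is a minimal illustration, not the PR's code: the struct is a trimmed-down stand-in with only the two fields discussed above, the `status` helper is hypothetical, and the assumption that a deleted validator (Weight == 0) never counts as active is mine.

```go
package main

import "fmt"

// SubnetOnlyValidator is a hypothetical, trimmed-down stand-in for the real
// type. Only the two fields discussed in the description are modeled.
type SubnetOnlyValidator struct {
	Weight            uint64
	EndAccumulatedFee uint64
}

// isDeleted reports whether the validator has been removed (Weight == 0).
func (v SubnetOnlyValidator) isDeleted() bool {
	return v.Weight == 0
}

// isActive reports whether the validator is active. Assumption: a deleted
// validator is never considered active, whatever its EndAccumulatedFee.
func (v SubnetOnlyValidator) isActive() bool {
	return v.Weight != 0 && v.EndAccumulatedFee != 0
}

// status is a hypothetical helper that names the state of a validator.
func status(v SubnetOnlyValidator) string {
	switch {
	case v.isDeleted():
		return "deleted"
	case v.isActive():
		return "active"
	default:
		return "inactive"
	}
}

func main() {
	fmt.Println(status(SubnetOnlyValidator{Weight: 10, EndAccumulatedFee: 500}))
	fmt.Println(status(SubnetOnlyValidator{Weight: 10}))
	fmt.Println(status(SubnetOnlyValidator{}))
}
```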

State Modifications

The P-chain state is broken into 2 implementations:

  1. state: the on-disk representation
  2. diff: potential changes to the on-disk representation

The typical flow for performing a change is:

  1. Create a diff on top of either the state or another diff.
  2. Apply the diff to the parent, either the state or another diff.
  3. Atomically commit changes made to state to disk.
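
The create / apply / commit flow above can be sketched with plain maps. Every name here (`view`, `state`, `diff`, `newDiff`, `Apply`, `Commit`) is a simplified assumption for illustration; the real P-chain types carry far more data and commit through an actual database batch.

```go
package main

import "fmt"

// view is the read/write surface shared by the base state and by diffs.
type view interface {
	Get(key string) (uint64, bool)
	Put(key string, weight uint64)
}

// state stands in for the on-disk representation. Writes land in pending
// and only reach disk on Commit.
type state struct {
	pending map[string]uint64
	disk    map[string]uint64
}

func newState() *state {
	return &state{pending: map[string]uint64{}, disk: map[string]uint64{}}
}

func (s *state) Get(key string) (uint64, bool) {
	if w, ok := s.pending[key]; ok {
		return w, true
	}
	w, ok := s.disk[key]
	return w, ok
}

func (s *state) Put(key string, weight uint64) { s.pending[key] = weight }

// Commit flushes all pending writes to disk. A real implementation would
// use an atomic database batch here.
func (s *state) Commit() {
	for k, w := range s.pending {
		s.disk[k] = w
	}
	s.pending = map[string]uint64{}
}

// diff records changes on top of a parent: the state or another diff.
type diff struct {
	parent  view
	changes map[string]uint64
}

func newDiff(parent view) *diff {
	return &diff{parent: parent, changes: map[string]uint64{}}
}

func (d *diff) Get(key string) (uint64, bool) {
	if w, ok := d.changes[key]; ok {
		return w, true
	}
	return d.parent.Get(key)
}

func (d *diff) Put(key string, weight uint64) { d.changes[key] = weight }

// Apply pushes this diff's changes into its parent.
func (d *diff) Apply() {
	for k, w := range d.changes {
		d.parent.Put(k, w)
	}
}

func main() {
	s := newState()
	d := newDiff(s) // 1. create a diff on top of the state
	d.Put("validator-a", 5)
	d.Apply()  // 2. apply the diff to its parent
	s.Commit() // 3. commit the state to disk
	fmt.Println(s.disk["validator-a"])
}
```

Until `Apply` is called, the change is visible only through the diff, which is what lets several competing diffs coexist on top of the same parent.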

Contrary to prior validator implementations within the state package, this PR attempts to provide validation of the requested changes.

Returned errors from PutSubnetOnlyValidator should result in no changes to the state.

PutSubnetOnlyValidator succeeding should result in a valid state representation, regardless of the expected API behavior.

The only exception to this is documented with:

// If an SoV with the same validationID as a previously removed SoV is
// added, the behavior is undefined.

It is difficult to handle this case within the state without introducing non-deterministic error reporting, because validationIDs are deleted from disk when the state is deleted. Handling it would mean the state code must allow every field of an SoV to change on demand, which in turn would require additional changes to the validator manager and other parts of the code that assume a ValidationID implies a static subnetID + nodeID pair.
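
The guarantee that a failed PutSubnetOnlyValidator leaves the state untouched is commonly obtained by validating everything before mutating anything. The sketch below shows that pattern under loud assumptions: the two checks, the error names, and the `store` type are hypothetical and chosen only because the description says a ValidationID implies a static subnetID + nodeID pair. They are not the PR's actual rules.

```go
package main

import "fmt"

import "errors"

// Hypothetical sentinel errors; the real package defines its own.
var (
	errMutated  = errors.New("validationID maps to a different subnetID+nodeID")
	errConflict = errors.New("subnetID+nodeID already has a validator")
)

type sov struct {
	ValidationID string
	SubnetID     string
	NodeID       string
	Weight       uint64
}

type pair struct{ subnetID, nodeID string }

// store is a hypothetical in-memory stand-in for the state.
type store struct {
	byID   map[string]sov
	byPair map[pair]string // subnetID+nodeID -> validationID
}

func newStore() *store {
	return &store{byID: map[string]sov{}, byPair: map[pair]string{}}
}

// put validates first and mutates second, so any returned error implies
// that neither map was modified.
func (s *store) put(v sov) error {
	// Phase 1: validation only, no writes.
	if prior, ok := s.byID[v.ValidationID]; ok &&
		(prior.SubnetID != v.SubnetID || prior.NodeID != v.NodeID) {
		return errMutated
	}
	p := pair{v.SubnetID, v.NodeID}
	if owner, ok := s.byPair[p]; ok && owner != v.ValidationID {
		return errConflict
	}

	// Phase 2: mutation. Nothing below can fail.
	if v.Weight == 0 {
		delete(s.byID, v.ValidationID)
		delete(s.byPair, p)
		return nil
	}
	s.byID[v.ValidationID] = v
	s.byPair[p] = v.ValidationID
	return nil
}

func main() {
	s := newStore()
	fmt.Println(s.put(sov{ValidationID: "v1", SubnetID: "s", NodeID: "n", Weight: 1}))
	fmt.Println(s.put(sov{ValidationID: "v2", SubnetID: "s", NodeID: "n", Weight: 1}))
	fmt.Println(len(s.byID), len(s.byPair))
}
```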

The in-memory validator manager and historical validator set diffs are only updated when changes are atomically committed to disk.

How this was tested

Added unit tests.

Need to be documented in RELEASES.md?

Added P-chain configs:

  • l1-weights-cache-size
  • l1-inactive-validators-cache-size
  • l1-subnet-id-node-id-cache-size

ceyonur and others added 30 commits July 24, 2024 16:42
Co-authored-by: Darioush Jalali <darioush.jalali@avalabs.org>
Signed-off-by: Ceyhun Onur <ceyhunonur54@gmail.com>
Signed-off-by: Ceyhun Onur <ceyhun.onur@avalabs.org>
Co-authored-by: Darioush Jalali <darioush.jalali@avalabs.org>
Signed-off-by: Ceyhun Onur <ceyhunonur54@gmail.com>
@StephenButtolph StephenButtolph changed the base branch from master to add-weight-diff-helpers November 2, 2024 00:00
Base automatically changed from add-weight-diff-helpers to master November 3, 2024 16:31
Contributor

@ceyonur ceyonur left a comment

LGTM!
(I left a question for the caching in #3516 (comment))

maybeSOVOverhead = wrappers.BoolLen + sovOverhead
entryOverhead = ids.IDLen + maybeSOVOverhead
)
if maybeSOV.IsNothing() {
Contributor

@yacovm yacovm Nov 4, 2024

under what circumstance is it nothing? Both here and in general (line 196 in subnet_only_validator.go)

Contributor Author

This cache stores hits as maybe.Some and misses as maybe.Nothing. So, if someone performs a delete or a get which results in not found, maybe.Nothing would be cached.
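
A minimal sketch of that caching scheme follows. The `maybe` type here is a tiny local stand-in written for this example (it is not the project's `maybe` package), and `cachedStore` with its `diskReads` counter is hypothetical.

```go
package main

import "fmt"

// maybe is a local stand-in for an optional value.
type maybe[T any] struct {
	value    T
	hasValue bool
}

func some[T any](v T) maybe[T] { return maybe[T]{value: v, hasValue: true} }
func nothing[T any]() maybe[T] { return maybe[T]{} }

func (m maybe[T]) IsNothing() bool { return !m.hasValue }

// cachedStore caches both hits (some) and misses (nothing), so repeated
// lookups of an absent key do not reach the disk again.
type cachedStore struct {
	cache     map[string]maybe[uint64]
	disk      map[string]uint64
	diskReads int
}

func newCachedStore(disk map[string]uint64) *cachedStore {
	return &cachedStore{cache: map[string]maybe[uint64]{}, disk: disk}
}

func (s *cachedStore) get(key string) (uint64, bool) {
	if m, ok := s.cache[key]; ok {
		return m.value, !m.IsNothing()
	}
	s.diskReads++
	v, ok := s.disk[key]
	if ok {
		s.cache[key] = some(v)
	} else {
		s.cache[key] = nothing[uint64]()
	}
	return v, ok
}

// delete removes the key from disk and caches the miss.
func (s *cachedStore) delete(key string) {
	delete(s.disk, key)
	s.cache[key] = nothing[uint64]()
}

func main() {
	s := newCachedStore(map[string]uint64{"a": 1})
	s.get("missing")
	s.get("missing") // served from the cached miss
	fmt.Println("disk reads:", s.diskReads)
}
```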

return false, fmt.Errorf("%w: %s", ErrMissingParentState, d.parentID)
}

return parentState.HasSubnetOnlyValidator(subnetID, nodeID)
Contributor

This might seem like a weird question, but how deep is the expected recursion here?

Contributor Author

The recursion ends at the lastAcceptedState and starts at the most recently verified block, so typically the recursion is very shallow.
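
The lookup pattern in the snippet above can be sketched as a walk from the newest diff back to the last accepted state. All types here (`chain`, `baseState`, `versions`, `diff`) are simplified assumptions that only mirror the shape of the quoted code.

```go
package main

import (
	"errors"
	"fmt"
)

var errMissingParentState = errors.New("missing parent state")

// chain is implemented by the accepted state and by every diff.
type chain interface {
	has(key string) (bool, error)
}

// baseState stands in for the last accepted state, where recursion ends.
type baseState struct{ validators map[string]bool }

func (s *baseState) has(key string) (bool, error) {
	return s.validators[key], nil
}

// versions maps a block ID to the state produced by that block.
type versions map[string]chain

// diff stores only its own changes: true means added, false means removed.
type diff struct {
	parentID string
	versions versions
	modified map[string]bool
}

func (d *diff) has(key string) (bool, error) {
	if present, ok := d.modified[key]; ok {
		return present, nil // answered locally, no recursion needed
	}
	parent, ok := d.versions[d.parentID]
	if !ok {
		return false, fmt.Errorf("%w: %s", errMissingParentState, d.parentID)
	}
	return parent.has(key) // one hop per verified-but-unaccepted block
}

func main() {
	accepted := &baseState{validators: map[string]bool{"node-a": true}}
	vs := versions{"accepted": accepted}
	vs["block-1"] = &diff{
		parentID: "accepted",
		versions: vs,
		modified: map[string]bool{"node-b": true},
	}
	tip := &diff{parentID: "block-1", versions: vs, modified: map[string]bool{}}

	for _, id := range []string{"node-a", "node-b", "node-c"} {
		ok, err := tip.has(id)
		fmt.Println(id, ok, err)
	}
}
```

The recursion depth equals the number of diffs between the queried one and the accepted state, which matches the answer above that it is typically very shallow.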

return weight, nil
}

parentState, ok := d.stateVersions.GetState(d.parentID)
Contributor

can you explain the high level idea of doing this backwards traversal in search of the subnet, both here and in the next function?

We basically don't have a full snapshot of the validators in each state? Why is that?

Contributor Author

Performing a full copy of the validator states for each diff would be pretty expensive. The number of changes per block is typically small, but the total number of entries being tracked can be quite large.

Comment on lines 874 to 875
// SubnetOnlyValidator with the given validationID. It is guaranteed that any
// returned validator is either active or inactive.
Contributor

(No action needed) Aren't all validators always either active or inactive? Not sure I understand the purpose of the comment

Contributor Author

This was supposed to signify that no deleted SoVs will be returned from this function (isDeleted() will always return false). Specifically, this is to ensure that !isActive() means inactive (and not deleted)

Contributor

Suggested change
  // SubnetOnlyValidator with the given validationID. It is guaranteed that any
- // returned validator is either active or inactive.
+ // returned validator is either active or inactive (not deleted).

// ensures that a subnetID+nodeID pair that was deleted and then re-added in
// a single diff can't get reordered into the addition happening first;
// which would return an error.
for _, sov := range d.sovDiff.modified {
Contributor

Playing devil's advocate here, we're iterating here over a map, and calling baseState.PutSubnetOnlyValidator which may return with different errors. This means that across different nodes, we may fail differently, no?

Is that a concern?

Contributor Author

It is generally unexpected for Apply to ever error.

Apply is called during Accept. If Apply were to return an error, that error would be propagated through Accept into the consensus engine (causing a FATAL).
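
One way to make the outcome independent of Go's randomized map iteration order, as the code comment in this thread describes, is to apply removals in a first pass and additions in a second. The sketch below illustrates that idea with hypothetical types (`sov`, `base`) and a hypothetical duplicate check; it is not the PR's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

var errDuplicate = errors.New("subnetID+nodeID already registered")

// sov is a hypothetical validator record. Pair stands for subnetID+nodeID.
type sov struct {
	ValidationID string
	Pair         string
	Weight       uint64
}

// base maps each subnetID+nodeID pair to the validationID that owns it.
type base struct {
	byPair map[string]string
}

func (b *base) put(v sov) error {
	owner, taken := b.byPair[v.Pair]
	if v.Weight == 0 {
		if taken && owner == v.ValidationID {
			delete(b.byPair, v.Pair)
		}
		return nil
	}
	if taken && owner != v.ValidationID {
		return errDuplicate
	}
	b.byPair[v.Pair] = v.ValidationID
	return nil
}

// apply writes removals first and additions second, so a pair that was
// removed and re-added in the same diff never hits errDuplicate, whatever
// order the map is iterated in.
func apply(b *base, modified map[string]sov) error {
	for _, v := range modified {
		if v.Weight != 0 {
			continue
		}
		if err := b.put(v); err != nil {
			return err
		}
	}
	for _, v := range modified {
		if v.Weight == 0 {
			continue
		}
		if err := b.put(v); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	b := &base{byPair: map[string]string{"subnet/node": "v1"}}
	modified := map[string]sov{
		"v1": {ValidationID: "v1", Pair: "subnet/node", Weight: 0},
		"v2": {ValidationID: "v2", Pair: "subnet/node", Weight: 10},
	}
	fmt.Println(apply(b, modified), b.byPair["subnet/node"])
}
```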

if priorSOV.Weight < sov.Weight {
err = s.validators.AddWeight(sov.SubnetID, nodeID, sov.Weight-priorSOV.Weight)
} else if priorSOV.Weight > sov.Weight {
err = s.validators.RemoveWeight(sov.SubnetID, nodeID, priorSOV.Weight-sov.Weight)
Contributor

@yacovm yacovm Nov 5, 2024

is priorSOV.Weight == sov.Weight legal? what does it mean?

Contributor Author

If the priorSOV.Weight is greater than sov.Weight, then the weight of the validator is being decreased by priorSOV.Weight - sov.Weight units

Contributor

no but i asked what it means when they're equal, I edited shortly after, probably github didn't refresh my comment

Contributor Author

Ah. It is possible for them to be equal. In that case the weight does not need to be modified.
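
The three cases discussed in this thread can be written out as below. This is a sketch with a hypothetical map-backed `weights` type; the real validator manager methods take a subnetID and return errors, which is omitted here. Computing the delta only inside the matching branch keeps the unsigned subtraction from underflowing.

```go
package main

import "fmt"

// weights is a hypothetical stand-in for the in-memory validator manager.
type weights map[string]uint64

func (w weights) AddWeight(nodeID string, delta uint64)    { w[nodeID] += delta }
func (w weights) RemoveWeight(nodeID string, delta uint64) { w[nodeID] -= delta }

// updateWeight moves a validator from prior to next weight.
func updateWeight(w weights, nodeID string, prior, next uint64) {
	switch {
	case prior < next:
		w.AddWeight(nodeID, next-prior)
	case prior > next:
		w.RemoveWeight(nodeID, prior-next)
	default:
		// prior == next: the weight does not need to be modified.
	}
}

func main() {
	w := weights{"node-a": 10}
	updateWeight(w, "node-a", 10, 15)
	updateWeight(w, "node-a", 15, 15)
	updateWeight(w, "node-a", 15, 4)
	fmt.Println(w["node-a"])
}
```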

feeState gas.State
sovExcess gas.Gas
accruedFees uint64
parentActiveSOVs int
Contributor

nit: numParentActiveSOVs is more descriptive IMO

Contributor

Or parentActiveSOVCount because a "count" is more descriptive than a "number" 😉

Contributor Author

I ended up going with parentNumActiveSOVs.

Because it aligns most with the actual source of this: parentState.NumActiveSubnetOnlyValidators()

return parentState.WeightOfSubnetOnlyValidators(subnetID)
}

func (d *diff) GetSubnetOnlyValidator(validationID ids.ID) (SubnetOnlyValidator, error) {
Contributor

A couple high level questions that may be related:

  • What triggers a new diff to be created?
  • Did you consider updating diffs in place, rather than the linked list style approach here?

Contributor Author

Diffs are created for a few reasons. The diffs are designed this way because each diff can represent the changes performed by a block. When a block is verified, a diff is created. When a block is accepted, its diff is applied to the disk state.

Diffs are updated in place during the execution (verification) of a block, but we wouldn't want a child block to modify the state that will be committed when the parent block is accepted (as the child could be rejected in favor of a different child).


// GetSubnetOnlyValidator returns the validator with [validationID] if it
// exists. If the validator does not exist, [err] will equal
// [database.ErrNotFound].
Contributor

It's a bit odd to call out the error type here when that's not enforceable by the interface

Contributor Author

I don't think this is particularly unusual... This is done in the standard lib as well. For an example, see SetDeadline here: https://pkg.go.dev/net#Conn
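
For readers unfamiliar with the convention being debated: a Go interface cannot constrain which error values an implementation returns, so the contract lives in the doc comment and callers check it with `errors.Is`. A minimal sketch, where `ErrNotFound` is a local stand-in for `database.ErrNotFound` and the `validators` type is hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a local stand-in for database.ErrNotFound.
var ErrNotFound = errors.New("not found")

// validators is a hypothetical map-backed store of validator weights.
type validators map[string]uint64

// getWeight returns the weight of the validator with [id] if it exists. If
// the validator does not exist, [err] will equal [ErrNotFound].
func (v validators) getWeight(id string) (uint64, error) {
	w, ok := v[id]
	if !ok {
		return 0, ErrNotFound
	}
	return w, nil
}

func main() {
	vdrs := validators{"node-a": 7}
	if _, err := vdrs.getWeight("node-b"); errors.Is(err, ErrNotFound) {
		fmt.Println("node-b is not a validator")
	}
}
```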

@@ -2230,6 +2571,16 @@ func (s *state) writeValidatorDiffs(height uint64) error {
return nil
}

func getOrDefault[K comparable, V any](m map[K]*V, k K) *V {
Contributor

A comment explaining that the default value is written to the map would be helpful. Or consider renaming to getOrAddDefault

Contributor Author

did both
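
A sketch of what a helper with this signature might look like under the suggested name. Only the signature comes from the diff above; the body and the `weightDiff` type are assumptions made for illustration.

```go
package main

import "fmt"

// getOrAddDefault returns the value stored at k. If the map has no entry
// for k, a pointer to a new zero value is written to the map and returned,
// so later calls with the same key see the same pointer.
func getOrAddDefault[K comparable, V any](m map[K]*V, k K) *V {
	if v, ok := m[k]; ok {
		return v
	}
	v := new(V)
	m[k] = v
	return v
}

// weightDiff is a hypothetical value type used only for this example.
type weightDiff struct {
	Amount uint64
}

func main() {
	diffs := make(map[string]*weightDiff)
	getOrAddDefault(diffs, "node-1").Amount += 5
	getOrAddDefault(diffs, "node-1").Amount += 2
	fmt.Println(len(diffs), diffs["node-1"].Amount) // prints: 1 7
}
```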

for subnetID, weight := range s.sovDiff.modifiedTotalWeight {
var err error
if weight == 0 {
err = s.weightsDB.Delete(subnetID[:])
Contributor

What's the expected behavior of WeightOfSubnetOnlyValidators after the entry is deleted from the DB and the cache entry expires? If it's to error, perhaps we should not write to the cache so that WeightOfSubnetOnlyValidators deterministically errors for any subnets with weight 0.

Contributor Author

WeightOfSubnetOnlyValidators calls database.WithDefault(database.GetUInt64, s.weightsDB, subnetID[:], 0) which will return 0 if the value is not on disk. So, WeightOfSubnetOnlyValidators deterministically returns 0 for any subnets with weight 0.
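
The behavior described in this reply can be sketched with a local helper. The `kv` type, `getUint64`, `withDefault`, and `weightOf` below are stand-ins written for this example; they imitate the shape of the call quoted above but are not the `database` package's API.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found")

// kv is a hypothetical map-backed stand-in for a database.
type kv map[string]uint64

func getUint64(db kv, key string) (uint64, error) {
	v, ok := db[key]
	if !ok {
		return 0, errNotFound
	}
	return v, nil
}

// withDefault calls get and substitutes def when the key is missing. Any
// other error is passed through unchanged.
func withDefault[V any](
	get func(kv, string) (V, error),
	db kv,
	key string,
	def V,
) (V, error) {
	v, err := get(db, key)
	if errors.Is(err, errNotFound) {
		return def, nil
	}
	return v, err
}

// weightOf returns 0 for a subnet with no entry, whether the entry was
// deleted or never written.
func weightOf(db kv, subnetID string) (uint64, error) {
	return withDefault(getUint64, db, subnetID, 0)
}

func main() {
	db := kv{"subnet-a": 7}
	fmt.Println(weightOf(db, "subnet-a"))
	delete(db, "subnet-a")
	fmt.Println(weightOf(db, "subnet-a"))
}
```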

@StephenButtolph StephenButtolph added this pull request to the merge queue Nov 5, 2024
Merged via the queue into master with commit ca5e0d5 Nov 5, 2024
23 checks passed
@StephenButtolph StephenButtolph deleted the implement-acp-77-sov-validators-state branch November 5, 2024 16:39