Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

feat(bandwidth_scheduler) - generate bandwidth requests based on receipts in outgoing buffers v2.0 #12464

Open
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

jancionear
Copy link
Contributor

This is a rewrite of #12375, following the redesign that we discussed offline.

Main changes

  • Use a TrieQueue instead of a single value with all of the receipt groups
  • ReceiptGroup now contains both size and gas
  • Keep information about total gas, size and number of receipts for every outgoing buffer. Useful for resharding.
  • Don't read the first receipt when generating bandwidth requests. This doesn't play well with witness size limits, reading the first receipt could potentially add 4MB of storage proof, and we need to do it for every outgoing buffer.
  • Groups now have an upper size bound to ensure that size of the first group is at most max_receipt_size (as long as the size of a single receipt is at most max_receipt_size...)
  • Use the version of StateStoredReceipt to determine if the metadata should be updated. Don't add fields which keep the version of metadata.

The code is ready, but untested. I need to write a bunch of tests. You can take a look at the code in the meantime, it'll be easier to apply any changes before all the tests are written.

The PR is meant to be reviewed commit-by-commit.

Currently TrieQueue allows only storing
receipts. I would like to store other things
in a TrieQueue, let's make the item type generic.
They will be needed to load the OutgoingMetadatas
in the next commit. I put these changes in a separate
commit for clarity.
TrieQueueIterator returns Result<Item, StorageError>,
so it can't be passed to the function that crates a bandwidth request.
Modify the function to take iterators over results instead
of iterators over values.
Copy link
Contributor

@wacban wacban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a high level check but looks good to me. I'll leave the proper review to @shreyan-gupta .

Comment on lines +118 to +123
/// The V1 of StateStoredReceipt.
/// The data is the same as in V0.
/// Outgoing buffer metadata is updated only for versions V1 and higher.
/// The receipts start being stored as V1 after the protocol change that introduced
/// outgoing buffer metadata. Having a separate variant makes it clear whether the
/// outgoing buffer metadata should be updated when a receipt is stored/removed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I think it's pretty neat!

Comment on lines 60 to 64
/// Stores `ReceiptGroupsQueueData` for the receipt groups queue
/// which corresponds to the buffered receipts to `receiver_shard`.
pub const OUTGOING_RECEIPT_GROUPS_QUEUE_DATA: u8 = 16;
/// A single item of `ReceiptGroupsQueue`. Values are of type `ReceiptGroup`.
pub const OUTGOING_RECEIPT_GROUPS_QUEUE_ITEM: u8 = 17;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming wise I think it would be more clear to call those BUFFERED_RECEIPT_GROUPS_* here - just to make it clear that this is connected to the BUFFERED_RECEIPT* columns.

I like queue data and queue item even though they don't follow the existing convention of _INDICES and just the type - I think it's fine as the queue data stores more than just indices.

Copy link
Contributor Author

@jancionear jancionear Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it to BUFFERED_RECEIPT_GROUPS_QUEUE_DATA/ITEM. To me "outgoing" is more specific than "buffered". Buffering is a generic action that could be done in many different contexts, "outgoing" clearly refers to the outgoing receipts. But I don't care that much, buffered is also fine.
Ideally it would be "BUFFERED_OUTGOING" x.x

Maybe Shreyan also has an opinion here? I'll wait for his review before renaming it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I agree that outgoing would be better, but now that we already have BUFFERED I'd rather keep things consistent. Yeah, it's just a nit, I'm not too attached to either name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind either prefix, BUFFERED_RECEIPT_GROUP_* or OUTGOING_RECEIPT_GROUP_* but I would still like to stick with the *_INDICES prefix as it gives a clear indication of what to expect there.

Looking at _INDICES I can expect there to exist a TrieQueue etc. etc. but looking directly at _DATA and _ITEM doesn't tell me much. I understand we are storing a bit more than just the indices in our case but the primary purpose is still to track indices. This is me coming from figuring out resharding, sitting and staring at all the trie key names and trying to understand what they mean...

Maybe OUTGOING_RECEIPT_GROUP_INDICES and OUTGOING_RECEIPT_GROUP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like INDICES, it lies about what the data really is. There is a comment pointing to ReceiptGroupsQueueData, if someone is interested in what exactly the data is, they can quickly go there and take a look.

I'll change it to BUFFERED_RECEIPT_GROUPS_QUEUE_DATA/ITEM. I think Wac liked it as well, so that's 2 votes for this version.

Comment on lines -62 to -64

// NOTE: NEW_COLUMN = 15 will be the last unique nibble in the trie!
// Consider demultiplexing on 15 and using 2-nibble prefixes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you

Comment on lines 272 to 279
TrieKey::OutgoingReceiptGroupsQueueData { .. } => {
col::OUTGOING_RECEIPT_GROUPS_QUEUE_DATA.len() + std::mem::size_of::<u64>()
}
TrieKey::OutgoingReceiptGroupsQueueItem { index, .. } => {
col::OUTGOING_RECEIPT_GROUPS_QUEUE_ITEM.len()
+ std::mem::size_of::<u64>()
+ std::mem::size_of_val(index)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mini nit: Maybe size_of:: ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like size_of_val more than size_of, it's impossible to make a mistake, it always calculates the actual size of the field. I only used size_of for the shard_id because I wasn't sure how size_of_val is going to behave with the ShardId wrapper.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry the formatting was messed up, I wanted to suggest size_of ShardId but that < > made the ShardId disappear.

pub fn default_config() -> Self {
// TODO(bandwidth_scheduler) - put in runtime config
ReceiptGroupsConfig {
size_lower_bound: ByteSize::kb(90),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the lower bound be the same as the diff between two consecutive requests? Is that 100kB?

Copy link
Contributor Author

@jancionear jancionear Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diff between two consecutive values that can be requested is ~111kB
All groups will be at least as large as lower_bound, and that should be smaller than the diff between two values. I'm not sure what's the ideal value 🤔 I guess the smaller we make the groups the better the precision of the request. I could reduce the group size to something like 50kB.

Copy link
Contributor

@shreyan-gupta shreyan-gupta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall! Thank you so much!

@@ -90,6 +92,10 @@ impl ShardTries {
TrieUpdate::new(self.get_view_trie_for_shard(shard_uid, state_root))
}

pub fn get_shard_uids(&self) -> &[ShardUId] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are we using this function? Is it not possible to use the shard layout to get the shard_uids?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used by the runtime params estimator, in estimator::apply_action_receipt.
The EstimatorContext::testbed function manually creates data for a shard, there's no epoch manager or shard layout.
It would be possible to refactor the estimator to use a shard layout, but the easiest way was to get the list of shards from the list of tries per shard. AFAIU there's only one shard at the moment, so it's not a big issue.

@@ -60,6 +58,8 @@ pub struct OutgoingReceiptBuffer<'parent> {
/// queue items. Based on that, a common push(), pop(), len(), and iter()
/// implementation is provided as trait default implementation.
pub trait TrieQueue {
type Item<'a>: BorshDeserialize + BorshSerialize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a quick todo comment above to remove the 'a lifetime indicator once we remove Cow from receipt?

@@ -189,6 +252,8 @@ impl DelayedReceiptQueue {
}

impl TrieQueue for DelayedReceiptQueue {
type Item<'a> = ReceiptOrStateStoredReceipt<'a>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For DelayedReceipts, we don't really store any metadata right? Could we change the type of Item to just Receipt?

Copy link
Contributor Author

@jancionear jancionear Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delayed receipt queue also uses state stored receipts. (See DelayedReceiptQueueWrapper). We need to update congestion info every time a receipt is added/removed from the delayed queue, and the stored metadata makes sure that the metadata stays consistent across protocol changes.

core/store/src/trie/receipts_column_helper.rs Outdated Show resolved Hide resolved
/// receipts.
pub struct ReceiptIterator<'a> {
/// Read-only iterator over items stored in a TrieQueue.
pub struct TrieQueueIterator<'a, Queue: TrieQueue> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Love the generalization!

/// Corresponds to receipts stored in the outgoing buffer to this shard.
receiver_shard: ShardId,
/// Persistent data, stored in the trie.
data: ReceiptGroupsQueueDataV0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't we storing ReceiptGroupsQueueData directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easier to work with ReceiptGroupsQueueDataV0, as it's a normal struct with fields that can be accessed. Using the enum would require decomposing it every time or adding access methods :/
The code can be reorganised in the future if we end up needing to use DataV1.

/// to determine the size and structure of receipts in the outgoing buffer and make
/// bandwidth requests based on them.
#[derive(Debug)]
pub struct ReceiptGroupsQueue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct is used only within this file. Could we remove pub from all functions and declarations?

Copy link
Contributor Author

@jancionear jancionear Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also used in the receipt sink. When generating bandwidth requests ReceiptSink fetches the ReceiptsGroupsQueue for some shard and generates the bandwidth requests based on the groups in the queue.
In the previous iteration of the PR I had a separate struct that served a public interface for outgoing metadata to some shard, and the queue could be hidden in a private module, but Wac complained that it's redundant so I removed it 😅
#12375 (comment)

// Take out the last group from the queue and inspect it.
match self.pop_back(state_update)? {
Some(mut last_group) => {
if self.groups_config.should_start_new_group(&last_group, receipt_size, receipt_gas)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like an overkill to store groups_config as part of ReceiptGroupsQueue given it's only being used here. I would consider passing it through update_on_receipt_pushed function, specially since it doesn't need to be publically exposed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's convenient to have it there. And I think it makes sense - a receipt group queue operates using some group config. It's nice for ergonomics, you don't need to pass the config every time. I like it, I'd prefer to keep it this way.
The config is not saved to the state if that's what you're worried about.

/// Returns empty metadata for protocol versions which don't support metadata.
/// Make sure to pass shard ids for every shard that has receipts in the outgoing buffer,
/// otherwise the metadata for it will be overwritten with empty metadata.
pub fn load(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, question about all the load functions that we have here...

  • Can we structure the code in such a way that the load functions are only called with BandwidthScheduler feature enabled?
  • In which case, can we get rid of all logic that assumes optional loading? Examples are update_* functions where we do .entry().or_insert_with() and ReceiptGroupsQueue::load() returning Option<Self>?
  • Initialization on enabling BandwidthScheduler feature could be handled by checking if trie key exists and returning default.

It becomes hard to think about all cases where we may or may not be returning/initializing a value, considering we don't have consistent checks for BandwidthScheduler feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code works the way it does because:

  • There are already some outgoing buffers, we need to create matching metadata for them
  • During resharding a new shard could suddenly appear and we need to initialize the metadata for it
  • We can't really scan the trie for existing outgoing buffers because the TrieAccess trait doesn't have an iterator

The easiest way to achieve this was to initialize the metadata when a receipt for it appears.

Can we structure the code in such a way that the load functions are only called with BandwidthScheduler feature enabled?

That's the case right now. OutgoingMetadatas::load doesn't do anything when the feature is not enabled. And nothing is pushed/popped from the metadata before protocol change.

In which case, can we get rid of all logic that assumes optional loading? Examples are update_* functions where we do .entry().or_insert_with() and ReceiptGroupsQueue::load() returning Option?

During resharding there could be a new shard, the or_insert_with is meant to handle this case.

Initialization on enabling BandwidthScheduler feature could be handled by checking if trie key exists and returning default.

That is kind of what's happening, OutgoingMetadatas::load tries to load the metadata, and if the trie doesn't contain the key it creates a new one.

But I'm open to other designs as well, it's a bit convoluted :/

Comment on lines 60 to 64
/// Stores `ReceiptGroupsQueueData` for the receipt groups queue
/// which corresponds to the buffered receipts to `receiver_shard`.
pub const OUTGOING_RECEIPT_GROUPS_QUEUE_DATA: u8 = 16;
/// A single item of `ReceiptGroupsQueue`. Values are of type `ReceiptGroup`.
pub const OUTGOING_RECEIPT_GROUPS_QUEUE_ITEM: u8 = 17;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind either prefix, BUFFERED_RECEIPT_GROUP_* or OUTGOING_RECEIPT_GROUP_* but I would still like to stick with the *_INDICES prefix as it gives a clear indication of what to expect there.

Looking at _INDICES I can expect there to exist a TrieQueue etc. etc. but looking directly at _DATA and _ITEM doesn't tell me much. I understand we are storing a bit more than just the indices in our case but the primary purpose is still to track indices. This is me coming from figuring out resharding, sitting and staring at all the trie key names and trying to understand what they mean...

Maybe OUTGOING_RECEIPT_GROUP_INDICES and OUTGOING_RECEIPT_GROUP?

jancionear and others added 7 commits November 18, 2024 14:09
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
pub gas_lower_bound: Gas,
/// All receipt groups aim to have gas below this threshold.
pub gas_upper_bound: Gas,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some more time thinking about the optimal lower and upper bound sizes for receipt groups, and I think I'll:

  • Remove the lower bound
  • Reduce the upper bound to something like 100kB

This means that a new group will be started if adding a new receipt to the group would cause its size to go above 100kB.

As a result all groups will be below 100kB, except for the groups that contain a single receipt with size above 100kB.

AFAIU having groups like this will produce optimal bandwidth requests - identical to the ones that we would produce if we walked over all of receipts individually.

The diff between two values that can be requested is ~110kB. If all groups were below 100kB we would always produce optimal requests - each consecutive group would either request a value or move to the next value, just like individual receipts.

When there are receipts larger than 100kB we will naturally skip some of the values, but this is identical to what would happen if we operated on individual receipts. The next group smaller than 100kB will either request the same value or move to the next one, just like individual receipts.

A bit handwavy, but I feel that it works. I can make some drawings if needed.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants