feat(bandwidth_scheduler) - generate bandwidth requests based on receipts in outgoing buffers v2.0 #12464

jancionear · 2024-11-15T02:00:15Z

This is a rewrite of #12375, following the redesign that we discussed offline.

Main changes

Use a TrieQueue instead of a single value with all of the receipt groups
ReceiptGroup now contains both size and gas
Keep information about total gas, size and number of receipts for every outgoing buffer. Useful for resharding.
Don't read the first receipt when generating bandwidth requests. This doesn't play well with witness size limits, reading the first receipt could potentially add 4MB of storage proof, and we need to do it for every outgoing buffer.
Groups now have an upper size bound to ensure that size of the first group is at most max_receipt_size (as long as the size of a single receipt is at most max_receipt_size...)
Use the version of StateStoredReceipt to determine if the metadata should be updated. Don't add fields which keep the version of metadata.

The code is ready, but untested. I need to write a bunch of tests. You can take a look at the code in the meantime, it'll be easier to apply any changes before all the tests are written.

The PR is meant to be reviewed commit-by-commit.

Currently TrieQueue allows only storing receipts. I would like to store other things in a TrieQueue, let's make the item type generic.

They will be needed to load the OutgoingMetadatas in the next commit. I put these changes in a separate commit for clarity.

TrieQueueIterator returns Result<Item, StorageError>, so it can't be passed to the function that crates a bandwidth request. Modify the function to take iterators over results instead of iterators over values.

wacban

Just a high level check but looks good to me. I'll leave the proper review to @shreyan-gupta .

wacban · 2024-11-15T11:24:27Z

core/primitives/src/receipt.rs

+/// The V1 of StateStoredReceipt.
+/// The data is the same as in V0.
+/// Outgoing buffer metadata is updated only for versions V1 and higher.
+/// The receipts start being stored as V1 after the protocol change that introduced
+/// outgoing buffer metadata. Having a separate variant makes it clear whether the
+/// outgoing buffer metadata should be updated when a receipt is stored/removed.


Nice, I think it's pretty neat!

wacban · 2024-11-15T11:27:34Z

core/primitives/src/trie_key.rs

+    /// Stores `ReceiptGroupsQueueData` for the receipt groups queue
+    /// which corresponds to the buffered receipts to `receiver_shard`.
+    pub const OUTGOING_RECEIPT_GROUPS_QUEUE_DATA: u8 = 16;
+    /// A single item of `ReceiptGroupsQueue`. Values are of type `ReceiptGroup`.
+    pub const OUTGOING_RECEIPT_GROUPS_QUEUE_ITEM: u8 = 17;


Naming wise I think it would be more clear to call those BUFFERED_RECEIPT_GROUPS_* here - just to make it clear that this is connected to the BUFFERED_RECEIPT* columns.

I like queue data and queue item even though they don't follow the existing convention of _INDICES and just the type - I think it's fine as the queue data stores more than just indices.

I can change it to BUFFERED_RECEIPT_GROUPS_QUEUE_DATA/ITEM. To me "outgoing" is more specific than "buffered". Buffering is a generic action that could be done in many different contexts, "outgoing" clearly refers to the outgoing receipts. But I don't care that much, buffered is also fine.
Ideally it would be "BUFFERED_OUTGOING" x.x

Maybe Shreyan also has an opinion here? I'll wait for his review before renaming it.

In general I agree that outgoing would be better, but now that we already have BUFFERED I'd rather keep things consistent. Yeah, it's just a nit, I'm not too attached to either name.

I don't mind either prefix, BUFFERED_RECEIPT_GROUP_* or OUTGOING_RECEIPT_GROUP_* but I would still like to stick with the *_INDICES prefix as it gives a clear indication of what to expect there.

Looking at _INDICES I can expect there to exist a TrieQueue etc. etc. but looking directly at _DATA and _ITEM doesn't tell me much. I understand we are storing a bit more than just the indices in our case but the primary purpose is still to track indices. This is me coming from figuring out resharding, sitting and staring at all the trie key names and trying to understand what they mean...

Maybe OUTGOING_RECEIPT_GROUP_INDICES and OUTGOING_RECEIPT_GROUP?

I don't really like INDICES, it lies about what the data really is. There is a comment pointing to ReceiptGroupsQueueData, if someone is interested in what exactly the data is, they can quickly go there and take a look.

I'll change it to BUFFERED_RECEIPT_GROUPS_QUEUE_DATA/ITEM. I think Wac liked it as well, so that's 2 votes for this version.

wacban · 2024-11-15T11:27:47Z

core/primitives/src/trie_key.rs

-
-    // NOTE: NEW_COLUMN = 15 will be the last unique nibble in the trie!
-    // Consider demultiplexing on 15 and using 2-nibble prefixes.


wacban · 2024-11-15T11:29:19Z

core/primitives/src/trie_key.rs

+            TrieKey::OutgoingReceiptGroupsQueueData { .. } => {
+                col::OUTGOING_RECEIPT_GROUPS_QUEUE_DATA.len() + std::mem::size_of::<u64>()
+            }
+            TrieKey::OutgoingReceiptGroupsQueueItem { index, .. } => {
+                col::OUTGOING_RECEIPT_GROUPS_QUEUE_ITEM.len()
+                    + std::mem::size_of::<u64>()
+                    + std::mem::size_of_val(index)
+            }


mini nit: Maybe size_of:: ?

I like size_of_val more than size_of, it's impossible to make a mistake, it always calculates the actual size of the field. I only used size_of for the shard_id because I wasn't sure how size_of_val is going to behave with the ShardId wrapper.

Ah sorry the formatting was messed up, I wanted to suggest size_of ShardId but that < > made the ShardId disappear.

wacban · 2024-11-15T12:11:44Z

core/store/src/trie/outgoing_metadata.rs

+    pub fn default_config() -> Self {
+        // TODO(bandwidth_scheduler) - put in runtime config
+        ReceiptGroupsConfig {
+            size_lower_bound: ByteSize::kb(90),


Should the lower bound be the same as the diff between two consecutive requests? Is that 100kB?

Diff between two consecutive values that can be requested is ~111kB
All groups will be at least as large as lower_bound, and that should be smaller than the diff between two values. I'm not sure what's the ideal value 🤔 I guess the smaller we make the groups the better the precision of the request. I could reduce the group size to something like 50kB.

shreyan-gupta

Looks great overall! Thank you so much!

shreyan-gupta · 2024-11-18T09:36:17Z

core/store/src/trie/shard_tries.rs

@@ -90,6 +92,10 @@ impl ShardTries {
        TrieUpdate::new(self.get_view_trie_for_shard(shard_uid, state_root))
    }

+    pub fn get_shard_uids(&self) -> &[ShardUId] {


Where are we using this function? Is it not possible to use the shard layout to get the shard_uids?

It's used by the runtime params estimator, in estimator::apply_action_receipt.
The EstimatorContext::testbed function manually creates data for a shard, there's no epoch manager or shard layout.
It would be possible to refactor the estimator to use a shard layout, but the easiest way was to get the list of shards from the list of tries per shard. AFAIU there's only one shard at the moment, so it's not a big issue.

shreyan-gupta · 2024-11-18T09:39:32Z

core/store/src/trie/receipts_column_helper.rs

@@ -60,6 +58,8 @@ pub struct OutgoingReceiptBuffer<'parent> {
 /// queue items. Based on that, a common push(), pop(), len(), and iter()
 /// implementation is provided as trait default implementation.
 pub trait TrieQueue {
+    type Item<'a>: BorshDeserialize + BorshSerialize;


Could we add a quick todo comment above to remove the 'a lifetime indicator once we remove Cow from receipt?

shreyan-gupta · 2024-11-18T09:42:06Z

core/store/src/trie/receipts_column_helper.rs

@@ -189,6 +252,8 @@ impl DelayedReceiptQueue {
 }

 impl TrieQueue for DelayedReceiptQueue {
+    type Item<'a> = ReceiptOrStateStoredReceipt<'a>;


For DelayedReceipts, we don't really store any metadata right? Could we change the type of Item to just Receipt?

The delayed receipt queue also uses state stored receipts. (See DelayedReceiptQueueWrapper). We need to update congestion info every time a receipt is added/removed from the delayed queue, and the stored metadata makes sure that the metadata stays consistent across protocol changes.

core/store/src/trie/receipts_column_helper.rs

shreyan-gupta · 2024-11-18T09:44:50Z

core/store/src/trie/receipts_column_helper.rs

-/// receipts.
-pub struct ReceiptIterator<'a> {
+/// Read-only iterator over items stored in a TrieQueue.
+pub struct TrieQueueIterator<'a, Queue: TrieQueue> {


Nice! Love the generalization!

shreyan-gupta · 2024-11-18T10:54:30Z

core/store/src/trie/outgoing_metadata.rs

+    /// Corresponds to receipts stored in the outgoing buffer to this shard.
+    receiver_shard: ShardId,
+    /// Persistent data, stored in the trie.
+    data: ReceiptGroupsQueueDataV0,


Why aren't we storing ReceiptGroupsQueueData directly?

It's easier to work with ReceiptGroupsQueueDataV0, as it's a normal struct with fields that can be accessed. Using the enum would require decomposing it every time or adding access methods :/
The code can be reorganised in the future if we end up needing to use DataV1.

shreyan-gupta · 2024-11-18T11:04:08Z

core/store/src/trie/outgoing_metadata.rs

+/// to determine the size and structure of receipts in the outgoing buffer and make
+/// bandwidth requests based on them.
+#[derive(Debug)]
+pub struct ReceiptGroupsQueue {


This struct is used only within this file. Could we remove pub from all functions and declarations?

It's also used in the receipt sink. When generating bandwidth requests ReceiptSink fetches the ReceiptsGroupsQueue for some shard and generates the bandwidth requests based on the groups in the queue.
In the previous iteration of the PR I had a separate struct that served a public interface for outgoing metadata to some shard, and the queue could be hidden in a private module, but Wac complained that it's redundant so I removed it 😅
#12375 (comment)

shreyan-gupta · 2024-11-18T11:06:09Z

core/store/src/trie/outgoing_metadata.rs

+        // Take out the last group from the queue and inspect it.
+        match self.pop_back(state_update)? {
+            Some(mut last_group) => {
+                if self.groups_config.should_start_new_group(&last_group, receipt_size, receipt_gas)


Seems like an overkill to store groups_config as part of ReceiptGroupsQueue given it's only being used here. I would consider passing it through update_on_receipt_pushed function, specially since it doesn't need to be publically exposed.

It's convenient to have it there. And I think it makes sense - a receipt group queue operates using some group config. It's nice for ergonomics, you don't need to pass the config every time. I like it, I'd prefer to keep it this way.
The config is not saved to the state if that's what you're worried about.

shreyan-gupta · 2024-11-18T11:08:00Z

core/store/src/trie/outgoing_metadata.rs

+    /// Returns empty metadata for protocol versions which don't support metadata.
+    /// Make sure to pass shard ids for every shard that has receipts in the outgoing buffer,
+    /// otherwise the metadata for it will be overwritten with empty metadata.
+    pub fn load(


So, question about all the load functions that we have here...

Can we structure the code in such a way that the load functions are only called with BandwidthScheduler feature enabled?

In which case, can we get rid of all logic that assumes optional loading? Examples are update_* functions where we do .entry().or_insert_with() and ReceiptGroupsQueue::load() returning Option<Self>?

Initialization on enabling BandwidthScheduler feature could be handled by checking if trie key exists and returning default.

It becomes hard to think about all cases where we may or may not be returning/initializing a value, considering we don't have consistent checks for BandwidthScheduler feature.

The code works the way it does because:

There are already some outgoing buffers, we need to create matching metadata for them

During resharding a new shard could suddenly appear and we need to initialize the metadata for it

We can't really scan the trie for existing outgoing buffers because the TrieAccess trait doesn't have an iterator

The easiest way to achieve this was to initialize the metadata when a receipt for it appears.

Can we structure the code in such a way that the load functions are only called with BandwidthScheduler feature enabled?

That's the case right now. OutgoingMetadatas::load doesn't do anything when the feature is not enabled. And nothing is pushed/popped from the metadata before protocol change.

In which case, can we get rid of all logic that assumes optional loading? Examples are update_* functions where we do .entry().or_insert_with() and ReceiptGroupsQueue::load() returning Option?

During resharding there could be a new shard, the or_insert_with is meant to handle this case.

Initialization on enabling BandwidthScheduler feature could be handled by checking if trie key exists and returning default.

That is kind of what's happening, OutgoingMetadatas::load tries to load the metadata, and if the trie doesn't contain the key it creates a new one.

But I'm open to other designs as well, it's a bit convoluted :/

shreyan-gupta · 2024-11-18T11:18:41Z

core/primitives/src/trie_key.rs

+    /// Stores `ReceiptGroupsQueueData` for the receipt groups queue
+    /// which corresponds to the buffered receipts to `receiver_shard`.
+    pub const OUTGOING_RECEIPT_GROUPS_QUEUE_DATA: u8 = 16;
+    /// A single item of `ReceiptGroupsQueue`. Values are of type `ReceiptGroup`.
+    pub const OUTGOING_RECEIPT_GROUPS_QUEUE_ITEM: u8 = 17;


I don't mind either prefix, BUFFERED_RECEIPT_GROUP_* or OUTGOING_RECEIPT_GROUP_* but I would still like to stick with the *_INDICES prefix as it gives a clear indication of what to expect there.

Looking at _INDICES I can expect there to exist a TrieQueue etc. etc. but looking directly at _DATA and _ITEM doesn't tell me much. I understand we are storing a bit more than just the indices in our case but the primary purpose is still to track indices. This is me coming from figuring out resharding, sitting and staring at all the trie key names and trying to understand what they mean...

Maybe OUTGOING_RECEIPT_GROUP_INDICES and OUTGOING_RECEIPT_GROUP?

Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>

jancionear · 2024-11-18T17:33:01Z

core/store/src/trie/outgoing_metadata.rs

+    pub gas_lower_bound: Gas,
+    /// All receipt groups aim to have gas below this threshold.
+    pub gas_upper_bound: Gas,
+}


I spent some more time thinking about the optimal lower and upper bound sizes for receipt groups, and I think I'll:

Remove the lower bound

Reduce the upper bound to something like 100kB

This means that a new group will be started if adding a new receipt to the group would cause its size to go above 100kB.

As a result all groups will be below 100kB, except for the groups that contain a single receipt with size above 100kB.

AFAIU having groups like this will produce optimal bandwidth requests - identical to the ones that we would produce if we walked over all of receipts individually.

The diff between two values that can be requested is ~110kB. If all groups were below 100kB we would always produce optimal requests - each consecutive group would either request a value or move to the next value, just like individual receipts.

When there are receipts larger than 100kB we will naturally skip some of the values, but this is identical to what would happen if we operated on individual receipts. The next group smaller than 100kB will either request the same value or move to the next one, just like individual receipts.

A bit handwavy, but I feel that it works. I can make some drawings if needed.

jancionear added 15 commits November 14, 2024 21:52

Add StateStoredReceiptV1

cc672a2

Make TrieQueue generic over item type

67b48a8

Currently TrieQueue allows only storing receipts. I would like to store other things in a TrieQueue, let's make the item type generic.

Rename TrieQueue::push to TrieQueue::push_back

0ce833f

Rename TrieQueue::pop to pop_front

2e35da0

Add TrieQueue::pop_back

8f1118b

Add TrieQueue::modify_first

b2ab786

Add OutgoingMetadatas and ReceiptGroupQueue

c96a26a

Add Trie columns to store receipt groups data

6f152c5

Remove the comment about column=15 being a problem

41909b3

Pass shard_uids to estimator::apply_action_receipt

101f7fa

They will be needed to load the OutgoingMetadatas in the next commit. I put these changes in a separate commit for clarity.

Maintain outgoing buffer metadatas in ReceiptSink

8531d80

Refactor ReceiptSink

686b146

Take iterator over results when creating a bandwidth request

c48b403

TrieQueueIterator returns Result<Item, StorageError>, so it can't be passed to the function that crates a bandwidth request. Modify the function to take iterators over results instead of iterators over values.

Generate bandwidth requests

6975fac

Update protocol schema checksums

f55eba7

jancionear requested review from wacban and shreyan-gupta November 15, 2024 02:00

jancionear requested a review from a team as a code owner November 15, 2024 02:00

jancionear mentioned this pull request Nov 15, 2024

feat(bandwidth_scheduler) - generate bandwidth requests based on receipts in outgoing buffers #12375

Closed

Fix protocol schema again

3fd8ccf

wacban reviewed Nov 15, 2024

View reviewed changes

jancionear mentioned this pull request Nov 15, 2024

Nuke cows in Receipt #12470

Draft

shreyan-gupta reviewed Nov 18, 2024

View reviewed changes

jancionear and others added 7 commits November 18, 2024 14:09

Simplify Item definition in ReceiptIterator

2ae2ed0

Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>

TrieQueue::Item instead of Receipt in error message

4da8f44

Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>

Fix Parameters typo

eaef8e6

Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>

Fix Needed typo

d8526d3

Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>

Fix needed typo 2

5a6b66a

Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>

Remove lifetime from ReceiptSink

638b7ac

Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>

Fix initialized typo

54de07b

Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>

jancionear added 2 commits November 18, 2024 14:27

substract without the s

5688e31

Add Cow TODO

1e60e18

jancionear commented Nov 18, 2024

View reviewed changes

jancionear added 3 commits November 18, 2024 17:44

remove lower_bound, set upper_bound to 100kB

249a693

Rename trie columns

628f19e

Fix protocol schema check after renaming trie column

0ed79be

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(bandwidth_scheduler) - generate bandwidth requests based on receipts in outgoing buffers v2.0 #12464

feat(bandwidth_scheduler) - generate bandwidth requests based on receipts in outgoing buffers v2.0 #12464

jancionear commented Nov 15, 2024

wacban left a comment

wacban Nov 15, 2024

wacban Nov 15, 2024

jancionear Nov 15, 2024 •

edited

Loading

wacban Nov 15, 2024

shreyan-gupta Nov 18, 2024

jancionear Nov 18, 2024

wacban Nov 15, 2024

wacban Nov 15, 2024

jancionear Nov 15, 2024

wacban Nov 15, 2024

wacban Nov 15, 2024

jancionear Nov 15, 2024 •

edited

Loading

shreyan-gupta left a comment

shreyan-gupta Nov 18, 2024

jancionear Nov 18, 2024

shreyan-gupta Nov 18, 2024

shreyan-gupta Nov 18, 2024

jancionear Nov 18, 2024 •

edited

Loading

shreyan-gupta Nov 18, 2024

shreyan-gupta Nov 18, 2024

jancionear Nov 18, 2024

shreyan-gupta Nov 18, 2024

jancionear Nov 18, 2024 •

edited

Loading

shreyan-gupta Nov 18, 2024

jancionear Nov 18, 2024

shreyan-gupta Nov 18, 2024

jancionear Nov 18, 2024

shreyan-gupta Nov 18, 2024

jancionear Nov 18, 2024


		// NOTE: NEW_COLUMN = 15 will be the last unique nibble in the trie!
		// Consider demultiplexing on 15 and using 2-nibble prefixes.

feat(bandwidth_scheduler) - generate bandwidth requests based on receipts in outgoing buffers v2.0 #12464

Are you sure you want to change the base?

feat(bandwidth_scheduler) - generate bandwidth requests based on receipts in outgoing buffers v2.0 #12464

Conversation

jancionear commented Nov 15, 2024

Main changes

wacban left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jancionear Nov 15, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jancionear Nov 15, 2024 • edited Loading

Choose a reason for hiding this comment

shreyan-gupta left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jancionear Nov 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jancionear Nov 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jancionear Nov 15, 2024 •

edited

Loading

jancionear Nov 15, 2024 •

edited

Loading

jancionear Nov 18, 2024 •

edited

Loading

jancionear Nov 18, 2024 •

edited

Loading