-
Notifications
You must be signed in to change notification settings - Fork 629
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
feat(bandwidth_scheduler) - generate bandwidth requests based on receipts in outgoing buffers v2.0 #12464
base: master
Are you sure you want to change the base?
Conversation
Currently TrieQueue allows only storing receipts. I would like to store other things in a TrieQueue, let's make the item type generic.
They will be needed to load the OutgoingMetadatas in the next commit. I put these changes in a separate commit for clarity.
TrieQueueIterator returns Result<Item, StorageError>, so it can't be passed to the function that crates a bandwidth request. Modify the function to take iterators over results instead of iterators over values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a high level check but looks good to me. I'll leave the proper review to @shreyan-gupta .
/// The V1 of StateStoredReceipt. | ||
/// The data is the same as in V0. | ||
/// Outgoing buffer metadata is updated only for versions V1 and higher. | ||
/// The receipts start being stored as V1 after the protocol change that introduced | ||
/// outgoing buffer metadata. Having a separate variant makes it clear whether the | ||
/// outgoing buffer metadata should be updated when a receipt is stored/removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I think it's pretty neat!
core/primitives/src/trie_key.rs
Outdated
/// Stores `ReceiptGroupsQueueData` for the receipt groups queue | ||
/// which corresponds to the buffered receipts to `receiver_shard`. | ||
pub const OUTGOING_RECEIPT_GROUPS_QUEUE_DATA: u8 = 16; | ||
/// A single item of `ReceiptGroupsQueue`. Values are of type `ReceiptGroup`. | ||
pub const OUTGOING_RECEIPT_GROUPS_QUEUE_ITEM: u8 = 17; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming wise I think it would be more clear to call those BUFFERED_RECEIPT_GROUPS_* here - just to make it clear that this is connected to the BUFFERED_RECEIPT* columns.
I like queue data and queue item even though they don't follow the existing convention of _INDICES
and just the type - I think it's fine as the queue data stores more than just indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change it to BUFFERED_RECEIPT_GROUPS_QUEUE_DATA/ITEM
. To me "outgoing" is more specific than "buffered". Buffering is a generic action that could be done in many different contexts, "outgoing" clearly refers to the outgoing receipts. But I don't care that much, buffered is also fine.
Ideally it would be "BUFFERED_OUTGOING" x.x
Maybe Shreyan also has an opinion here? I'll wait for his review before renaming it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I agree that outgoing would be better, but now that we already have BUFFERED I'd rather keep things consistent. Yeah, it's just a nit, I'm not too attached to either name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind either prefix, BUFFERED_RECEIPT_GROUP_*
or OUTGOING_RECEIPT_GROUP_*
but I would still like to stick with the *_INDICES
prefix as it gives a clear indication of what to expect there.
Looking at _INDICES
I can expect there to exist a TrieQueue etc. etc. but looking directly at _DATA
and _ITEM
doesn't tell me much. I understand we are storing a bit more than just the indices in our case but the primary purpose is still to track indices. This is me coming from figuring out resharding, sitting and staring at all the trie key names and trying to understand what they mean...
Maybe OUTGOING_RECEIPT_GROUP_INDICES
and OUTGOING_RECEIPT_GROUP
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like INDICES
, it lies about what the data really is. There is a comment pointing to ReceiptGroupsQueueData
, if someone is interested in what exactly the data is, they can quickly go there and take a look.
I'll change it to BUFFERED_RECEIPT_GROUPS_QUEUE_DATA/ITEM
. I think Wac liked it as well, so that's 2 votes for this version.
|
||
// NOTE: NEW_COLUMN = 15 will be the last unique nibble in the trie! | ||
// Consider demultiplexing on 15 and using 2-nibble prefixes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you
core/primitives/src/trie_key.rs
Outdated
TrieKey::OutgoingReceiptGroupsQueueData { .. } => { | ||
col::OUTGOING_RECEIPT_GROUPS_QUEUE_DATA.len() + std::mem::size_of::<u64>() | ||
} | ||
TrieKey::OutgoingReceiptGroupsQueueItem { index, .. } => { | ||
col::OUTGOING_RECEIPT_GROUPS_QUEUE_ITEM.len() | ||
+ std::mem::size_of::<u64>() | ||
+ std::mem::size_of_val(index) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mini nit: Maybe size_of:: ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like size_of_val
more than size_of
, it's impossible to make a mistake, it always calculates the actual size of the field. I only used size_of
for the shard_id
because I wasn't sure how size_of_val
is going to behave with the ShardId
wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry the formatting was messed up, I wanted to suggest size_of ShardId
but that < > made the ShardId disappear.
pub fn default_config() -> Self { | ||
// TODO(bandwidth_scheduler) - put in runtime config | ||
ReceiptGroupsConfig { | ||
size_lower_bound: ByteSize::kb(90), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the lower bound be the same as the diff between two consecutive requests? Is that 100kB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Diff between two consecutive values that can be requested is ~111kB
All groups will be at least as large as lower_bound
, and that should be smaller than the diff between two values. I'm not sure what's the ideal value 🤔 I guess the smaller we make the groups the better the precision of the request. I could reduce the group size to something like 50kB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall! Thank you so much!
@@ -90,6 +92,10 @@ impl ShardTries { | |||
TrieUpdate::new(self.get_view_trie_for_shard(shard_uid, state_root)) | |||
} | |||
|
|||
pub fn get_shard_uids(&self) -> &[ShardUId] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are we using this function? Is it not possible to use the shard layout to get the shard_uids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used by the runtime params estimator, in estimator::apply_action_receipt
.
The EstimatorContext::testbed
function manually creates data for a shard, there's no epoch manager or shard layout.
It would be possible to refactor the estimator to use a shard layout, but the easiest way was to get the list of shards from the list of tries per shard. AFAIU there's only one shard at the moment, so it's not a big issue.
@@ -60,6 +58,8 @@ pub struct OutgoingReceiptBuffer<'parent> { | |||
/// queue items. Based on that, a common push(), pop(), len(), and iter() | |||
/// implementation is provided as trait default implementation. | |||
pub trait TrieQueue { | |||
type Item<'a>: BorshDeserialize + BorshSerialize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a quick todo comment above to remove the 'a
lifetime indicator once we remove Cow
from receipt?
@@ -189,6 +252,8 @@ impl DelayedReceiptQueue { | |||
} | |||
|
|||
impl TrieQueue for DelayedReceiptQueue { | |||
type Item<'a> = ReceiptOrStateStoredReceipt<'a>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For DelayedReceipts, we don't really store any metadata right? Could we change the type of Item to just Receipt
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The delayed receipt queue also uses state stored receipts. (See DelayedReceiptQueueWrapper
). We need to update congestion info every time a receipt is added/removed from the delayed queue, and the stored metadata makes sure that the metadata stays consistent across protocol changes.
/// receipts. | ||
pub struct ReceiptIterator<'a> { | ||
/// Read-only iterator over items stored in a TrieQueue. | ||
pub struct TrieQueueIterator<'a, Queue: TrieQueue> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Love the generalization!
/// Corresponds to receipts stored in the outgoing buffer to this shard. | ||
receiver_shard: ShardId, | ||
/// Persistent data, stored in the trie. | ||
data: ReceiptGroupsQueueDataV0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't we storing ReceiptGroupsQueueData
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's easier to work with ReceiptGroupsQueueDataV0
, as it's a normal struct with fields that can be accessed. Using the enum would require decomposing it every time or adding access methods :/
The code can be reorganised in the future if we end up needing to use DataV1
.
/// to determine the size and structure of receipts in the outgoing buffer and make | ||
/// bandwidth requests based on them. | ||
#[derive(Debug)] | ||
pub struct ReceiptGroupsQueue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This struct is used only within this file. Could we remove pub
from all functions and declarations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also used in the receipt sink. When generating bandwidth requests ReceiptSink
fetches the ReceiptsGroupsQueue
for some shard and generates the bandwidth requests based on the groups in the queue.
In the previous iteration of the PR I had a separate struct that served a public interface for outgoing metadata to some shard, and the queue could be hidden in a private module, but Wac complained that it's redundant so I removed it 😅
#12375 (comment)
// Take out the last group from the queue and inspect it. | ||
match self.pop_back(state_update)? { | ||
Some(mut last_group) => { | ||
if self.groups_config.should_start_new_group(&last_group, receipt_size, receipt_gas) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like an overkill to store groups_config
as part of ReceiptGroupsQueue
given it's only being used here. I would consider passing it through update_on_receipt_pushed
function, specially since it doesn't need to be publically exposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's convenient to have it there. And I think it makes sense - a receipt group queue operates using some group config. It's nice for ergonomics, you don't need to pass the config every time. I like it, I'd prefer to keep it this way.
The config is not saved to the state if that's what you're worried about.
/// Returns empty metadata for protocol versions which don't support metadata. | ||
/// Make sure to pass shard ids for every shard that has receipts in the outgoing buffer, | ||
/// otherwise the metadata for it will be overwritten with empty metadata. | ||
pub fn load( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, question about all the load
functions that we have here...
- Can we structure the code in such a way that the load functions are only called with
BandwidthScheduler
feature enabled? - In which case, can we get rid of all logic that assumes optional loading? Examples are
update_*
functions where we do.entry().or_insert_with()
andReceiptGroupsQueue::load()
returningOption<Self>
? - Initialization on enabling
BandwidthScheduler
feature could be handled by checking if trie key exists and returning default.
It becomes hard to think about all cases where we may or may not be returning/initializing a value, considering we don't have consistent checks for BandwidthScheduler
feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code works the way it does because:
- There are already some outgoing buffers, we need to create matching metadata for them
- During resharding a new shard could suddenly appear and we need to initialize the metadata for it
- We can't really scan the trie for existing outgoing buffers because the
TrieAccess
trait doesn't have an iterator
The easiest way to achieve this was to initialize the metadata when a receipt for it appears.
Can we structure the code in such a way that the load functions are only called with BandwidthScheduler feature enabled?
That's the case right now. OutgoingMetadatas::load
doesn't do anything when the feature is not enabled. And nothing is pushed/popped from the metadata before protocol change.
In which case, can we get rid of all logic that assumes optional loading? Examples are update_* functions where we do .entry().or_insert_with() and ReceiptGroupsQueue::load() returning Option?
During resharding there could be a new shard, the or_insert_with
is meant to handle this case.
Initialization on enabling BandwidthScheduler feature could be handled by checking if trie key exists and returning default.
That is kind of what's happening, OutgoingMetadatas::load
tries to load the metadata, and if the trie doesn't contain the key it creates a new one.
But I'm open to other designs as well, it's a bit convoluted :/
core/primitives/src/trie_key.rs
Outdated
/// Stores `ReceiptGroupsQueueData` for the receipt groups queue | ||
/// which corresponds to the buffered receipts to `receiver_shard`. | ||
pub const OUTGOING_RECEIPT_GROUPS_QUEUE_DATA: u8 = 16; | ||
/// A single item of `ReceiptGroupsQueue`. Values are of type `ReceiptGroup`. | ||
pub const OUTGOING_RECEIPT_GROUPS_QUEUE_ITEM: u8 = 17; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind either prefix, BUFFERED_RECEIPT_GROUP_*
or OUTGOING_RECEIPT_GROUP_*
but I would still like to stick with the *_INDICES
prefix as it gives a clear indication of what to expect there.
Looking at _INDICES
I can expect there to exist a TrieQueue etc. etc. but looking directly at _DATA
and _ITEM
doesn't tell me much. I understand we are storing a bit more than just the indices in our case but the primary purpose is still to track indices. This is me coming from figuring out resharding, sitting and staring at all the trie key names and trying to understand what they mean...
Maybe OUTGOING_RECEIPT_GROUP_INDICES
and OUTGOING_RECEIPT_GROUP
?
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
Co-authored-by: Shreyan Gupta <shreyan.gupta96@gmail.com>
pub gas_lower_bound: Gas, | ||
/// All receipt groups aim to have gas below this threshold. | ||
pub gas_upper_bound: Gas, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some more time thinking about the optimal lower and upper bound sizes for receipt groups, and I think I'll:
- Remove the lower bound
- Reduce the upper bound to something like 100kB
This means that a new group will be started if adding a new receipt to the group would cause its size to go above 100kB.
As a result all groups will be below 100kB, except for the groups that contain a single receipt with size above 100kB.
AFAIU having groups like this will produce optimal bandwidth requests - identical to the ones that we would produce if we walked over all of receipts individually.
The diff between two values that can be requested is ~110kB. If all groups were below 100kB we would always produce optimal requests - each consecutive group would either request a value or move to the next value, just like individual receipts.
When there are receipts larger than 100kB we will naturally skip some of the values, but this is identical to what would happen if we operated on individual receipts. The next group smaller than 100kB will either request the same value or move to the next one, just like individual receipts.
A bit handwavy, but I feel that it works. I can make some drawings if needed.
This is a rewrite of #12375, following the redesign that we discussed offline.
Main changes
TrieQueue
instead of a single value with all of the receipt groupsReceiptGroup
now contains both size and gasmax_receipt_size
(as long as the size of a single receipt is at mostmax_receipt_size
...)StateStoredReceipt
to determine if the metadata should be updated. Don't add fields which keep the version of metadata.The code is ready, but untested. I need to write a bunch of tests. You can take a look at the code in the meantime, it'll be easier to apply any changes before all the tests are written.
The PR is meant to be reviewed commit-by-commit.