-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
fix(manager): gossip any pending blocks not gossiped before (in case of restart or crash) #1045
fix(manager): gossip any pending blocks not gossiped before (in case of restart or crash) #1045
Conversation
block/produce.go
Outdated
@@ -120,6 +120,12 @@ func (m *Manager) produceApplyGossip(ctx context.Context, allowEmpty bool, nextP | |||
return nil, nil, fmt.Errorf("gossip block: %w", err) | |||
} | |||
|
|||
m.State.LastGossipedHeight = block.Header.Height |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why updating it here and not in the gossipBlock
function? that way seems much less error-prone to have one place (cause now for example in the node recovery function for gossiping were not updating it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved inside function
block/manager.go
Outdated
@@ -328,3 +330,22 @@ func (m *Manager) setDA(daconfig string, dalcKV store.KV, logger log.Logger) err | |||
m.Retriever = retriever | |||
return nil | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another edge case I see here is the following: I was a sequencer previously, and than I wasn't (rotation. but continued being a full node), and than I am again. In this case, my store LastGossipedHeight
wasn't updated when I was a full node (or at least I don't see it). and than I need to get stuck on gossiping potentially a big amount.
I think we need some sort of a sane skew after which we avoid gossiping (i.e if I'm 1M blocks away from the latest state, than probably it doesn't make sense).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added an update for LastGossipedHeight inside UpdateTargetHeight, this will cause full-nodes also update LastGossipedHeight when retrieving form p2p or DA. this way, when rotating the value will be updated i think
block/state.go
Outdated
@@ -32,6 +32,10 @@ func (m *Manager) LoadStateOnInit(store store.Store, genesis *tmtypes.GenesisDoc | |||
} | |||
|
|||
m.State = s | |||
|
|||
if m.State.LastGossipedHeight == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth adding a comment here explainig the reasoning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment added
block/manager.go
Outdated
@@ -285,6 +287,15 @@ func (m *Manager) UpdateTargetHeight(h uint64) { | |||
break | |||
} | |||
} | |||
|
|||
// updating LastGossipedHeight to avoid gossiping long queue of blocks when sequencer rotation | |||
if m.State.LastGossipedHeight < h { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 issues I have with that:
- the function is called update target height, but it actually also updates last gossiped height
- the param is called last gossiped height for a full node, but it's tracking actually a potential sequencer only metric even if you're not a sequencer which may never be relevant for you. so maybe we need to make it a bit more make sense cause it feels hacky now (as if one would see that he would think it's the last height the full node got from gossiping but it's not)
- we call the store save state on each iteration vs just updating it once. not sure why we need to iterate vs just saving h? it's just overriding in a loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in order to avoid these issues, i refactored the pr and i implemented a new and simpler approach.
the new approach relies the block is added to blocksync after is produced but not on gossip (this is not removed but it will be used only by full-nodes). this way if the block is not gossiped because a panic, it will be available through blocksync, and full-nodes will be able to sync via p2p anyway
Co-authored-by: Omri <omritoptix@gmail.com>
fb1bb8c
to
33a5d63
Compare
block/block.go
Outdated
@@ -48,6 +50,16 @@ func (m *Manager) applyBlock(block *types.Block, commit *types.Commit, blockMeta | |||
return fmt.Errorf("save block: %w", err) | |||
} | |||
|
|||
gossipedBlock := p2p.BlockData{Block: *block, Commit: *commit} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for cleanliness I'd suggest moving this code part to a saveP2PBlock
method or some other name which makes sense an add a godoc to this function.
also we why use context.TODO()
? s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to new func
@@ -519,6 +519,11 @@ func (c *Client) blockGossipReceived(ctx context.Context, block []byte) { | |||
c.logger.Error("Publishing event.", "err", err) | |||
} | |||
if c.conf.BlockSyncEnabled { | |||
_, err := c.store.LoadBlockCid(gossipedBlock.Block.Header.Height) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this for optimization purposes?
if so I see we're missing the AddBlockRecieved
method in that case - is that fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is just to avoid storing the block twice if already stored.
PR Standards
Opening a pull request should be able to meet the following requirements
--
PR naming convention: https://hackmd.io/@nZpxHZ0CT7O5ngTp0TP9mg/HJP_jrm7A
In case the node crashes there maybe blocks that are not gossiped, but applied. In this case, blocks will never be gossiped (and neither added to blocksync store) and therefore not retrievable from P2P.
This pr fixes it by adding blocks to blocksync right after are produced and stored to db, this way blocks will be aways available in p2p, even if not gossiped.
Close #1044
<-- Briefly describe the content of this pull request -->
For Author:
godoc
commentsFor Reviewer:
After reviewer approval: