pg,abci: unblock notice subscribers on DB shutdown #1021

Merged
merged 1 commit into from
Sep 27, 2024

jchappelow
Member

This resolves a possible hang in FinalizeBlock if the DB shuts down or dies while waiting at wg.Wait() for the notice stream to complete. When the pg.DB instance shuts down in response to the replication connection terminating, we also close the subscriber channels to unblock the receivers. To ensure the receivers handle this as an error rather than assuming all notices were received successfully, an empty string is first sent on the channel. Since all notice messages have a special prefix (pgtx:), the empty string is easily recognized as an exception.

@jchappelow jchappelow marked this pull request as draft September 25, 2024 16:04
@jchappelow jchappelow marked this pull request as ready for review September 25, 2024 16:09
Comment on lines +595 to +630
// will still be deterministic so nbd to not halt here
a.log.Errorf("failed to parse notice (%.20s...): %v", log, err)
continue // since txid is invalid and won't match any result.TxHash
Member Author

@jchappelow jchappelow Sep 25, 2024

Pls double check me on this continue, @brennanjl. I tend to agree it's not necessary to halt, but I don't think we should attempt to use the txid (which is empty or otherwise invalid) as a map key below. We should just continue and receive the next notice in the next iteration of the loop.

Collaborator

Yeah this seems correct.
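
To illustrate the point about the `continue`, here is a rough sketch of a receive loop that skips unparseable notices instead of using a bad txid as a map key. The notice layout (`<hex txid> <message>`) and the names `parseNotice` and `groupByTx` are assumptions made for this example, not the project's actual format or API.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"log"
	"strings"
)

// parseNotice is hypothetical: it assumes a "<hex txid> <message>" layout.
func parseNotice(s string) (txid, msg string, err error) {
	id, rest, ok := strings.Cut(s, " ")
	if !ok {
		return "", "", fmt.Errorf("missing separator")
	}
	if _, decErr := hex.DecodeString(id); decErr != nil || id == "" {
		return "", "", fmt.Errorf("invalid txid %q", id)
	}
	return id, rest, nil
}

// groupByTx collects messages per txid, skipping notices that fail to parse.
func groupByTx(notices []string) map[string][]string {
	logs := make(map[string][]string)
	for _, n := range notices {
		txid, msg, err := parseNotice(n)
		if err != nil {
			log.Printf("failed to parse notice (%.20s...): %v", n, err)
			continue // txid is invalid, so never use it as a map key
		}
		logs[txid] = append(logs[txid], msg)
	}
	return logs
}

func main() {
	out := groupByTx([]string{"ab12 hello", "garbage", "ab12 world", "zz oops"})
	fmt.Println(out) // prints "map[ab12:[hello world]]"
}
```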

wg.Wait()
// wait for all logs to be received, or a premature shutdown
select {
case <-ctx.Done():
Collaborator

Do we need to worry about this not being supported in comet v0.38? I don't think so because we close the notice subscribers when the database shuts down, but just want to double check that you see it the same

Member Author

No, it's ok. With 0.38 we only have one way to break out of here, and that's the notice signaling that the DB died. With 1.0 we can also get a signal from cometbft, but that's just going to be a bonus.

brennanjl
brennanjl previously approved these changes Sep 27, 2024
Collaborator

@brennanjl brennanjl left a comment

lgtm, but there seems to be a merge conflict

@jchappelow
Member Author

gofmt'd

@jchappelow
Member Author

Resolved conflicts.

@brennanjl brennanjl merged commit c4038d9 into kwilteam:main Sep 27, 2024
2 checks passed