
DB Manager #263

Merged
merged 64 commits into main from db-manager
May 14, 2024
Conversation

Collaborator

@cam-schultz cam-schultz commented Apr 18, 2024

Why this should be merged

Fixes #234 and unblocks #31

The relaying logic of an Application Relayer is largely parallelizable. Specifically, parsing Warp messages, aggregating signatures, and constructing and issuing transactions require no synchronization. The one operation that does is database access. Note that the RelayerDatabase interface assumes that writes are thread safe, but it makes no assumptions about the consistency of concurrent writes. In its current form, this is an issue because concurrent Application Relayer worker threads may complete blocks out of order.

How this works

To support consistent writes, this PR introduces a database manager layer that prepares and commits writes when it is sure that there are no more pending jobs for a particular block height. It accomplishes this with the following changes:

  • Subscribers now subscribe to new blocks rather than relevant Warp logs. After a new block is received, the number of relevant Warp logs for each application relayer is counted, and the logs are then dispatched for processing.
  • Adds a keyManager type that is initialized with the number of expected jobs, and commits a height to the database once all expected jobs have completed. This decouples database write consistency from the order in which messages are processed. (By comparison, the existing implementation writes the current processing message's height to the database; this requires that messages are processed serially in block order)
  • Adds a DatabaseManager type that periodically writes committed block heights to the associated key in the database. The write interval is configurable.
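The per-height bookkeeping described above can be sketched roughly as follows. This is a minimal, hypothetical standalone version (the real keyManager also coordinates over channels with the DatabaseManager); the names are modeled on the discussion, not copied from the diff:

```go
package main

import "fmt"

// messageCounter tracks how many of a block's relevant Warp messages
// have finished processing.
type messageCounter struct {
	totalMessages     int
	processedMessages int
}

// keyManager commits a height only once every expected job for that
// height has completed, decoupling database write consistency from the
// order in which messages are processed.
type keyManager struct {
	queued             map[uint64]*messageCounter
	maxCommittedHeight uint64
}

// prepareHeight registers the number of jobs expected at a height.
// A height with zero expected jobs is committed immediately.
func (km *keyManager) prepareHeight(height uint64, totalMessages int) {
	if totalMessages == 0 {
		km.commitHeight(height)
		return
	}
	km.queued[height] = &messageCounter{totalMessages: totalMessages}
}

// finishJob marks one job at a height as done; the height is committed
// once its last expected job completes.
func (km *keyManager) finishJob(height uint64) {
	c, ok := km.queued[height]
	if !ok {
		return
	}
	c.processedMessages++
	if c.processedMessages == c.totalMessages {
		km.commitHeight(height)
		delete(km.queued, height)
	}
}

func (km *keyManager) commitHeight(height uint64) {
	if height > km.maxCommittedHeight {
		km.maxCommittedHeight = height
	}
}

func main() {
	km := &keyManager{queued: make(map[uint64]*messageCounter)}
	km.prepareHeight(100, 2)
	km.finishJob(100) // 1 of 2 jobs done; not yet committed
	fmt.Println(km.maxCommittedHeight) // 0
	km.finishJob(100) // all jobs done; height is committed
	fmt.Println(km.maxCommittedHeight) // 100
}
```

A separate DatabaseManager goroutine would then periodically persist maxCommittedHeight, so individual workers never touch the database directly.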

How this was tested

CI

How is this documented

Updated README

Comment on lines -442 to -443
return err
}

// Increment the request ID for the next message relay request
lstnr.currentRequestID++
return nil
Collaborator Author
This was an existing bug.

Contributor

@geoff-vball geoff-vball left a comment

Still going through, but will submit these comments for you to start thinking about

Comment on lines 123 to 133
for height := range km.finished {
counter, ok := km.queuedHeightsAndMessages[height]
if !ok {
continue
}
counter.processedMessages++
if counter.processedMessages == counter.totalMessages {
km.commitHeight(height)
delete(km.queuedHeightsAndMessages, height)
}
}
Contributor

So I'm pretty sure we have a race condition in here. Let's say the current highest processed height is 99 and we are currently processing blocks 100 and 101. If 101 finishes processing before block 100, we will update the highest processed block to 101, while not having processed 100 yet. We have to check queuedHeightsAndMessages to make sure there is no lower height being processed.

We could do something like... Let's say our highest processed block is N. If the height we're committing is N+1, set the value to that. Otherwise, throw that block height in a min heap. Then when we finally process block N+1, we look in the min heap for N+2, and keep removing values from the heap as long as we have the next block in sequence.
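A minimal sketch of that min-heap approach (hypothetical names; not the code that was eventually merged):

```go
package main

import (
	"container/heap"
	"fmt"
)

// heightHeap is a min-heap of block heights (container/heap boilerplate).
type heightHeap []uint64

func (h heightHeap) Len() int            { return len(h) }
func (h heightHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h heightHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *heightHeap) Push(x interface{}) { *h = append(*h, x.(uint64)) }
func (h *heightHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// committer only ever advances maxCommitted through contiguous heights;
// out-of-order heights wait in the min-heap until the gap is filled.
type committer struct {
	maxCommitted uint64
	pending      heightHeap
}

func (c *committer) commit(height uint64) {
	if height != c.maxCommitted+1 {
		heap.Push(&c.pending, height) // not next in sequence: buffer it
		return
	}
	c.maxCommitted = height
	// Drain buffered heights that are now next in sequence.
	for c.pending.Len() > 0 && c.pending[0] == c.maxCommitted+1 {
		c.maxCommitted = heap.Pop(&c.pending).(uint64)
	}
}

func main() {
	c := &committer{maxCommitted: 99}
	c.commit(101) // buffered: 100 hasn't finished yet
	fmt.Println(c.maxCommitted) // 99
	c.commit(100) // commits 100, then drains 101 from the heap
	fmt.Println(c.maxCommitted) // 101
}
```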

Contributor

@bernard-avalabs bernard-avalabs Apr 24, 2024

If I'm understanding the code correctly, it does look like the commitHeight function needs to do some buffering to avoid gaps in the committed block height.

A circular buffer could be used in place of a min heap if the difference between the highest processed block and the current committed height is bounded.
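The bounded-window alternative could look something like this sketch (hypothetical names; it assumes the gap between the highest processed height and the committed height never exceeds the buffer size):

```go
package main

import "fmt"

// ringTracker records processed heights in a fixed-size circular buffer
// of booleans. It works in place of a min-heap when the spread between
// the highest processed block and the committed height is bounded.
type ringTracker struct {
	done      []bool
	committed uint64 // highest contiguously committed height
}

func newRingTracker(size int, start uint64) *ringTracker {
	return &ringTracker{done: make([]bool, size), committed: start}
}

func (r *ringTracker) commit(height uint64) {
	n := uint64(len(r.done))
	if height <= r.committed || height > r.committed+n {
		return // outside the window; a real implementation would report an error
	}
	r.done[height%n] = true
	// Advance while the next height in sequence has been processed.
	for r.done[(r.committed+1)%n] {
		r.committed++
		r.done[r.committed%n] = false
	}
}

func main() {
	r := newRingTracker(8, 99)
	r.commit(101) // buffered in the ring; 100 is still outstanding
	fmt.Println(r.committed) // 99
	r.commit(100) // fills the gap; committed advances through 101
	fmt.Println(r.committed) // 101
}
```

The trade-off versus the heap is O(1) memory and operations at the cost of a hard cap on how far ahead processing may run.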

Collaborator Author

That's a great catch. I've implemented the min-heap based approach you suggested.

On a related note, I'm debating how we should handle the case in which a block height never calls commitHeight. In that case, the min-heap would grow unbounded, and no new data would be written to the DB. One approach would be to kill the entire application if the min-heap grows too large, and attempt to re-process the missing block on restart. If the issue persists, the operator would have to intervene by either setting process-missed-blocks to false, or manually updating the DB. One thing I like about this approach is that all blocks up to the height stored in the DB are guaranteed to be processed; only with manual intervention can this be made untrue.

Contributor

Hmmm... Yeah I wonder what the best approach would be here to handle the case of a missing block. I don't really want to kill the whole application for one missing block for one application.

The approach that is coming to mind is setting some threshold, say 100 blocks, that if the min heap grows bigger than, we just pop the min value from the heap and set that as our most recently processed block, and throw an error for any missed blocks. We could also write the missed blocks to a table somewhere for potential reprocessing later.
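That threshold policy might be sketched like this (hypothetical; a sorted slice stands in for the min-heap, and what to do with the missed heights is left to the caller):

```go
package main

import (
	"fmt"
	"sort"
)

// flushStuck implements the fallback suggested above: if more than
// maxPending out-of-order heights are buffered, stop waiting for the
// gap, jump the committed height to the lowest buffered height, and
// report every height in between that was never processed.
func flushStuck(committed uint64, pending []uint64, maxPending int) (uint64, []uint64) {
	if len(pending) <= maxPending {
		return committed, nil // under the threshold: keep waiting
	}
	sort.Slice(pending, func(i, j int) bool { return pending[i] < pending[j] })
	newCommitted := pending[0]
	inBuffer := make(map[uint64]bool, len(pending))
	for _, h := range pending {
		inBuffer[h] = true
	}
	// Every height between the old and new committed heights that is
	// not sitting in the buffer was missed entirely.
	var missed []uint64
	for h := committed + 1; h < newCommitted; h++ {
		if !inBuffer[h] {
			missed = append(missed, h)
		}
	}
	return newCommitted, missed
}

func main() {
	// Committed at 99; 100 never arrived, but 101, 102, 104 are buffered
	// and the threshold (2) has been exceeded.
	committed, missed := flushStuck(99, []uint64{101, 102, 104}, 2)
	fmt.Println(committed, missed) // 101 [100]
}
```

The missed heights could then be logged as errors or written to a table for reprocessing, as suggested.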

Comment on lines 137 to 142
func (km *keyManager) commitHeight(height uint64) {
if height > km.maxCommittedHeight {
km.logger.Debug("committing height", zap.Uint64("height", height))
km.maxCommittedHeight = height
}
}
Contributor

commitHeight sounds to me like it should be writing to the database. Maybe heightProcessed?

Contributor

@geoff-vball geoff-vball left a comment

A lot of the comments assume we're going to be processing logs async, but I know they're currently still synchronous per source chain.

if err != nil {
return err
}
relayer.dbManager.PrepareHeight(relayer.relayerID, height, 0)
Contributor

I don't think we should be using PrepareHeight here. Maybe a separate helper that force sets the height to latest.

Collaborator Author

keyManager.prepareHeight immediately commits the height if the number of messages to process is 0. I don't think we should directly expose commitHeight to dbManager callers. I can add a separate method with a clearer name that makes this call internally, though.

Contributor

So with the min-heap implemented, prepareHeight with 0 messages should only write the height if the current height is N and we commit N+1. We need a separate helper to force-set the height to latest here.

Collaborator Author

commitHeight automatically commits if km.maxCommittedHeight == 0, which is the case on initialization, and should be the case when this function is called, since it is called before the main processing loop in ProcessLogs is initiated.

I agree that this is not at all clear though. I'll add a more specific method that can be called here, and update commitHeight to accept a "force" option.
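The proposed "force" option might look like this (a hypothetical standalone sketch of the method, not the actual diff):

```go
package main

import "fmt"

// keyManager is reduced here to the one field commitHeight touches.
type keyManager struct {
	maxCommittedHeight uint64
}

// commitHeight normally only advances the committed height. With force
// set, it overwrites unconditionally, which lets initialization seed a
// starting height before the main processing loop begins.
func (km *keyManager) commitHeight(height uint64, force bool) {
	if force || height > km.maxCommittedHeight {
		km.maxCommittedHeight = height
	}
}

func main() {
	km := &keyManager{maxCommittedHeight: 50}
	km.commitHeight(40, false)
	fmt.Println(km.maxCommittedHeight) // 50: lower heights are ignored
	km.commitHeight(40, true)
	fmt.Println(km.maxCommittedHeight) // 40: force overrides
}
```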

// This function should only be called once.
func (dm *DatabaseManager) Run() {
for range time.Tick(dm.interval) {
for id, km := range dm.keyManagers {


can you add a comment about what happens if this inner for loop takes longer than the tick interval?


Actually, from a quick read-up it seems like the ticks continue in the background, and if the underlying for loop has not completed, that tick iteration gets dropped and the loop waits for the next one. Is that a problem at all for short durations?

Collaborator Author

The tick interval is on the order of seconds (or tens of seconds), so the only expected scenario in which this could occur is if there are network issues reaching the remote database. Even so, this isn't really a problem, since a missed write to the database is perfectly safe.

)
err = dm.db.Put(id.ID, LatestProcessedBlockKey, []byte(strconv.FormatUint(km.maxCommittedHeight, 10)))
if err != nil {
dm.logger.Error("Failed to write latest processed block height", zap.Error(err))


do the key managers have some identifier that can be logged here too?


how are you thinking of detecting if one of many key managers is in a stuck state and the loop can't progress through its iterations?

Collaborator Author

do the key managers have some identifier that can be logged here too?

Added this to the log.

how are you thinking of detecting if one of many key managers is in a stuck state and the loop can't progress through its iterations?

I'm not sure what you mean. In the context of this function, the key managers are read-only so the database manager would always be able to progress to the next iteration.

Contributor

@bernard-avalabs bernard-avalabs left a comment

I've left a few concurrency-related comments.

geoff-vball
geoff-vball previously approved these changes May 1, 2024
bernard-avalabs
bernard-avalabs previously approved these changes May 7, 2024
Contributor

@bernard-avalabs bernard-avalabs left a comment

LGTM

Co-authored-by: Michael Kaplan <55204436+michaelkaplan13@users.noreply.github.com>
Signed-off-by: cam-schultz <78878559+cam-schultz@users.noreply.github.com>
@cam-schultz cam-schultz dismissed stale reviews from bernard-avalabs and geoff-vball via 3b3cf0e May 13, 2024 15:07
cam-schultz and others added 4 commits May 13, 2024 11:08
Co-authored-by: Michael Kaplan <55204436+michaelkaplan13@users.noreply.github.com>
Signed-off-by: cam-schultz <78878559+cam-schultz@users.noreply.github.com>
geoff-vball
geoff-vball previously approved these changes May 13, 2024
Contributor

@bernard-avalabs bernard-avalabs left a comment

LGTM

@cam-schultz cam-schultz merged commit dd44ea9 into main May 14, 2024
7 checks passed
@cam-schultz cam-schultz deleted the db-manager branch May 14, 2024 17:52
Successfully merging this pull request may close these issues.

Add periodic database write routine
5 participants