Context and scope
Currently, we write the checkpointed block to the database after every processed Warp message. This approach results in unbounded write frequency under highly concurrent load, especially when considering future changes such as #225, which may result in many relayer instances sharing a single database.
Discussion and alternatives
Overview
In reality, writing on every processed Warp message is overkill, since the catch-up mechanism is intelligent enough not to attempt to re-broadcast already delivered Warp messages (at least for Teleporter; future message protocol integrations should also take this approach - see #233). Instead, we can get away with caching the checkpoint in memory and writing to the database periodically. This period should be settable from the configuration, so that deployments can scale the interval with the number of relayer instances to prevent a write queue.
Implementation
With #240, database access is handled by "application relayers" instead of the Relayer instances that listen to a particular chain. We can extend the application relayer logic to include a database write coordinator, since we only need to ensure synchronization at the key level, and each key is uniquely associated with a single application relayer. It is assumed that the database implementation is able to handle concurrent writes to different keys.
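As a rough sketch of what such a coordinator could look like, the snippet below caches the latest processed height in memory for a single key and flushes it on a configurable interval. The type names, the minimal Database interface, and the key/value encoding are illustrative assumptions, not the actual relayer API.

```go
package checkpoint

import (
	"strconv"
	"sync"
	"time"
)

// Database is the minimal key-value interface assumed here; the real relayer
// database may expose a different API.
type Database interface {
	Put(key string, value []byte) error
}

// Coordinator caches the latest processed block height for a single
// application relayer key and flushes it to the database on a fixed interval.
type Coordinator struct {
	db       Database
	key      string // uniquely associated with one application relayer
	interval time.Duration

	lock   sync.Mutex
	height uint64
	dirty  bool
}

func NewCoordinator(db Database, key string, interval time.Duration) *Coordinator {
	c := &Coordinator{db: db, key: key, interval: interval}
	go c.flushLoop()
	return c
}

// StageHeight records a newly processed block height in memory only.
func (c *Coordinator) StageHeight(height uint64) {
	c.lock.Lock()
	defer c.lock.Unlock()
	if height > c.height {
		c.height = height
		c.dirty = true
	}
}

// flushLoop periodically persists the cached height, bounding write frequency
// regardless of how many Warp messages are processed in between.
func (c *Coordinator) flushLoop() {
	ticker := time.NewTicker(c.interval)
	defer ticker.Stop()
	for range ticker.C {
		c.lock.Lock()
		height, dirty := c.height, c.dirty
		c.dirty = false
		c.lock.Unlock()
		if dirty {
			_ = c.db.Put(c.key, []byte(strconv.FormatUint(height, 10)))
		}
	}
}
```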
Open Questions
In order to implement this in a thread-safe way (see #31), we need to reliably track when the relayer process is done processing a block across all threads. This is not an issue when processing messages in serial, since we can guarantee ordering. However, with messages processed concurrently, this ordering guarantee no longer holds.
We do, however, know that messages coming from the subscription are ordered, so we need a way to mark messages as pending or complete. We should only write a block height to the database when all messages from that block are marked complete.
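A minimal sketch of that per-block bookkeeping might look like the following, assuming each Warp message can be identified within its block (for example by the unsigned message ID or the log index); the messageID type and blockMessages struct are hypothetical names.

```go
package checkpoint

import "sync"

// messageID identifies a single Warp message within a block; in practice this
// could be the unsigned message ID or the log index.
type messageID string

// blockMessages tracks which messages from one block are still pending.
type blockMessages struct {
	lock    sync.Mutex
	pending map[messageID]struct{}
}

// newBlockMessages marks every message in the block as pending.
func newBlockMessages(ids []messageID) *blockMessages {
	b := &blockMessages{pending: make(map[messageID]struct{}, len(ids))}
	for _, id := range ids {
		b.pending[id] = struct{}{}
	}
	return b
}

// Complete marks a message as processed and reports whether the whole block is
// done, i.e. whether its height is now eligible to be written to the database.
func (b *blockMessages) Complete(id messageID) (blockDone bool) {
	b.lock.Lock()
	defer b.lock.Unlock()
	delete(b.pending, id)
	return len(b.pending) == 0
}
```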
Relax Ordering Requirement
An alternative approach would be to relax the ordering requirement by having the subscriber count the number of Warp messages in a block, and having the database coordinator routine track the number of in-progress messages. By accounting for all in-progress messages and their blocks, the coordinator will be able to determine the highest safe block height to checkpoint, without requiring that messages be processed in block order.
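One way this could work is sketched below: the subscriber registers the expected message count per block, worker routines report completions, and the coordinator computes the highest safe checkpoint without any ordering requirement. All names are illustrative, and block heights are assumed to be greater than zero.

```go
package checkpoint

import "sync"

// coordinator tracks, per block height, how many Warp messages are still in
// progress, and computes the highest block height that is safe to checkpoint.
// Block heights are assumed to be greater than zero.
type coordinator struct {
	lock        sync.Mutex
	outstanding map[uint64]int // block height -> messages not yet completed
	latestSeen  uint64         // highest block height registered so far
}

func newCoordinator() *coordinator {
	return &coordinator{outstanding: make(map[uint64]int)}
}

// RegisterBlock is called by the subscriber once per block, before any of its
// messages are dispatched. Blocks with zero relevant messages still advance
// latestSeen, so heights can be checkpointed even when no Warp messages occur.
func (c *coordinator) RegisterBlock(height uint64, numMessages int) {
	c.lock.Lock()
	defer c.lock.Unlock()
	if numMessages > 0 {
		c.outstanding[height] += numMessages
	}
	if height > c.latestSeen {
		c.latestSeen = height
	}
}

// CompleteMessage is called when a single message from the block finishes
// processing (delivered, or skipped as already delivered).
func (c *coordinator) CompleteMessage(height uint64) {
	c.lock.Lock()
	defer c.lock.Unlock()
	if c.outstanding[height]--; c.outstanding[height] <= 0 {
		delete(c.outstanding, height)
	}
}

// SafeCheckpoint returns the highest height with no in-progress messages at or
// below it: one less than the lowest block that still has outstanding messages,
// or the latest registered block if nothing is outstanding.
func (c *coordinator) SafeCheckpoint() uint64 {
	c.lock.Lock()
	defer c.lock.Unlock()
	safe := c.latestSeen
	for height := range c.outstanding {
		if height-1 < safe {
			safe = height - 1
		}
	}
	return safe
}
```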
To do this, the subscriber will need to be updated to subscribe to blocks rather than individual logs, so that it is able to count the messages in a block before initiating message processing. We'll want to update the eth_subscribe call to report newHeads and then apply a filter, rather than subscribing to filtered logs.
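A rough sketch of that subscription flow using go-ethereum's ethclient is below; the websocket endpoint and the Warp precompile address are placeholders, and the hand-off to the coordinator is left as a comment.

```go
package main

import (
	"context"
	"log"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	client, err := ethclient.Dial("ws://localhost:9650/ext/bc/C/ws") // example endpoint
	if err != nil {
		log.Fatal(err)
	}

	headers := make(chan *types.Header)
	sub, err := client.SubscribeNewHead(context.Background(), headers)
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Unsubscribe()

	warpAddr := common.HexToAddress("0x0200000000000000000000000000000000000005") // placeholder precompile address

	for {
		select {
		case err := <-sub.Err():
			log.Fatal(err)
		case header := <-headers:
			blockHash := header.Hash()
			// Fetch only this block's logs and filter to the Warp precompile,
			// so the number of messages in the block is known up front.
			logs, err := client.FilterLogs(context.Background(), ethereum.FilterQuery{
				BlockHash: &blockHash,
				Addresses: []common.Address{warpAddr},
			})
			if err != nil {
				log.Println("filter logs:", err)
				continue
			}
			log.Printf("block %d contains %d Warp log(s)", header.Number.Uint64(), len(logs))
			// Register the block and its message count with the coordinator
			// before dispatching each log for message processing (omitted).
		}
	}
}
```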
For a similar reason of needing to know the range of in-progress messages, the catch-up mechanism will also need to be adjusted to be aware of the full block range it will process ahead of initiating message processing. Otherwise, the coordinator may erroneously mark a live incoming message as safe to checkpoint while there are still historical messages that have yet to be processed. To make it concrete, suppose the current head is block 1000, and the catch-up mechanism is currently at block 500. A live message is processed at block 1001. If the coordinator does not know the catch-up mechanism's ending block height, then it may erroneously checkpoint block 1001, causing blocks 500-1000 to potentially be missed in the event of an application exit.
Finally, one advantage of processing by block rather than by message is that the subscriber is aware of all incoming blocks regardless of whether they contain relevant Warp messages, giving us the ability to regularly checkpoint block heights independent of Warp messages actually being processed.