Context and scope
Currently, we write the checkpointed block to the database after every processed Warp message. This approach results in unbounded write frequency under highly concurrent load, especially when considering future changes such as #225, which may result in many relayer instances sharing a single database.
Discussion and alternatives
Overview
In reality, writing on every processed Warp message is overkill, since the catch-up mechanism is intelligent enough not to attempt to re-broadcast already delivered Warp messages (at least for Teleporter; future message protocol integrations should also take this approach - see #233). Instead, we can get away with caching the checkpoint in memory and writing to the database periodically. This period should be settable from the configuration, so that deployments can scale the interval with the number of relayer instances to prevent a write queue.
Implementation
With #240, database access is handled by "application relayers" instead of the Relayer instances that listen to a particular chain. We can extend the application relayer logic to include a database write coordinator, since we only need to ensure synchronization at the key level, and each key is uniquely associated with a single application relayer. It is assumed that the database implementation is able to handle concurrent writes to different keys.
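As a rough sketch of what such a coordinator could look like, the snippet below caches the latest processed height in memory for a single key and flushes it on a configurable interval. The type names, the minimal Database interface, and the key/value encoding are illustrative assumptions, not the actual relayer API.

```go
package checkpoint

import (
	"strconv"
	"sync"
	"time"
)

// Database is the minimal key-value interface assumed here; the real relayer
// database may expose a different API.
type Database interface {
	Put(key string, value []byte) error
}

// Coordinator caches the latest processed block height for a single
// application relayer key and flushes it to the database on a fixed interval.
type Coordinator struct {
	db       Database
	key      string // uniquely associated with one application relayer
	interval time.Duration

	lock   sync.Mutex
	height uint64
	dirty  bool
}

func NewCoordinator(db Database, key string, interval time.Duration) *Coordinator {
	c := &Coordinator{db: db, key: key, interval: interval}
	go c.flushLoop()
	return c
}

// StageHeight records a newly processed block height in memory only.
func (c *Coordinator) StageHeight(height uint64) {
	c.lock.Lock()
	defer c.lock.Unlock()
	if height > c.height {
		c.height = height
		c.dirty = true
	}
}

// flushLoop periodically persists the cached height, bounding write frequency
// regardless of how many Warp messages are processed in between.
func (c *Coordinator) flushLoop() {
	ticker := time.NewTicker(c.interval)
	defer ticker.Stop()
	for range ticker.C {
		c.lock.Lock()
		height, dirty := c.height, c.dirty
		c.dirty = false
		c.lock.Unlock()
		if dirty {
			_ = c.db.Put(c.key, []byte(strconv.FormatUint(height, 10)))
		}
	}
}
```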
Open Questions
In order to implement this in a thread-safe way (see #31), we need to reliably track when the relayer process is done processing a block across all threads. This is not an issue when processing messages in serial, since we can guarantee ordering. However, with messages processed concurrently, this ordering guarantee no longer holds.
We do, however, know that messages coming from the subscription are ordered, so we need a way to mark messages as pending or complete. We should only write a block height to the database when all messages from that block are marked complete.
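A minimal sketch of that per-block bookkeeping might look like the following, assuming each Warp message can be identified within its block (for example by the unsigned message ID or the log index); the messageID type and blockMessages struct are hypothetical names.

```go
package checkpoint

import "sync"

// messageID identifies a single Warp message within a block; in practice this
// could be the unsigned message ID or the log index.
type messageID string

// blockMessages tracks which messages from one block are still pending.
type blockMessages struct {
	lock    sync.Mutex
	pending map[messageID]struct{}
}

// newBlockMessages marks every message in the block as pending.
func newBlockMessages(ids []messageID) *blockMessages {
	b := &blockMessages{pending: make(map[messageID]struct{}, len(ids))}
	for _, id := range ids {
		b.pending[id] = struct{}{}
	}
	return b
}

// Complete marks a message as processed and reports whether the whole block is
// done, i.e. whether its height is now eligible to be written to the database.
func (b *blockMessages) Complete(id messageID) (blockDone bool) {
	b.lock.Lock()
	defer b.lock.Unlock()
	delete(b.pending, id)
	return len(b.pending) == 0
}
```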
Relax Ordering Requirement
An alternative approach would be to relax the ordering requirement by having the subscriber count the number of Warp messages in a block, and having the database coordinator routine track the number of in-progress messages. By accounting for all in-progress messages and their blocks, the coordinator will be able to determine the highest safe block height to checkpoint, without requiring that messages be processed in block order.
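One way this could work is sketched below: the subscriber registers the expected message count per block, worker routines report completions, and the coordinator computes the highest safe checkpoint without any ordering requirement. All names are illustrative, and block heights are assumed to be greater than zero.

```go
package checkpoint

import "sync"

// coordinator tracks, per block height, how many Warp messages are still in
// progress, and computes the highest block height that is safe to checkpoint.
// Block heights are assumed to be greater than zero.
type coordinator struct {
	lock        sync.Mutex
	outstanding map[uint64]int // block height -> messages not yet completed
	latestSeen  uint64         // highest block height registered so far
}

func newCoordinator() *coordinator {
	return &coordinator{outstanding: make(map[uint64]int)}
}

// RegisterBlock is called by the subscriber once per block, before any of its
// messages are dispatched. Blocks with zero relevant messages still advance
// latestSeen, so heights can be checkpointed even when no Warp messages occur.
func (c *coordinator) RegisterBlock(height uint64, numMessages int) {
	c.lock.Lock()
	defer c.lock.Unlock()
	if numMessages > 0 {
		c.outstanding[height] += numMessages
	}
	if height > c.latestSeen {
		c.latestSeen = height
	}
}

// CompleteMessage is called when a single message from the block finishes
// processing (delivered, or skipped as already delivered).
func (c *coordinator) CompleteMessage(height uint64) {
	c.lock.Lock()
	defer c.lock.Unlock()
	if c.outstanding[height]--; c.outstanding[height] <= 0 {
		delete(c.outstanding, height)
	}
}

// SafeCheckpoint returns the highest height with no in-progress messages at or
// below it: one less than the lowest block that still has outstanding messages,
// or the latest registered block if nothing is outstanding.
func (c *coordinator) SafeCheckpoint() uint64 {
	c.lock.Lock()
	defer c.lock.Unlock()
	safe := c.latestSeen
	for height := range c.outstanding {
		if height-1 < safe {
			safe = height - 1
		}
	}
	return safe
}
```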
To do this, the subscriber will need to be updated to subscribe to blocks rather than individual logs, so that it is able to count the messages in a block before initiating message processing. We'll want to update the eth_subscribe call to report newHeads and then apply a filter, rather than subscribing to filtered logs.
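A rough sketch of that subscription flow using go-ethereum's ethclient is below; the websocket endpoint and the Warp precompile address are placeholders, and the hand-off to the coordinator is left as a comment.

```go
package main

import (
	"context"
	"log"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	client, err := ethclient.Dial("ws://localhost:9650/ext/bc/C/ws") // example endpoint
	if err != nil {
		log.Fatal(err)
	}

	headers := make(chan *types.Header)
	sub, err := client.SubscribeNewHead(context.Background(), headers)
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Unsubscribe()

	warpAddr := common.HexToAddress("0x0200000000000000000000000000000000000005") // placeholder precompile address

	for {
		select {
		case err := <-sub.Err():
			log.Fatal(err)
		case header := <-headers:
			blockHash := header.Hash()
			// Fetch only this block's logs and filter to the Warp precompile,
			// so the number of messages in the block is known up front.
			logs, err := client.FilterLogs(context.Background(), ethereum.FilterQuery{
				BlockHash: &blockHash,
				Addresses: []common.Address{warpAddr},
			})
			if err != nil {
				log.Println("filter logs:", err)
				continue
			}
			log.Printf("block %d contains %d Warp log(s)", header.Number.Uint64(), len(logs))
			// Register the block and its message count with the coordinator
			// before dispatching each log for message processing (omitted).
		}
	}
}
```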
For a similar reason of needing to know the range of in-progress messages, the catch-up mechanism will also need to be adjusted to be aware of the full block range it will process ahead of initiating message processing. Otherwise, the coordinator may erroneously mark a live incoming message as safe to checkpoint while there are still historical messages that have yet to be processed. To make it concrete, suppose the current head is block 1000, and the catch-up mechanism is currently at block 500. A live message is processed at block 1001. If the coordinator does not know the catch-up mechanism's ending block height, then it may erroneously checkpoint block 1001, causing blocks 500-1000 to potentially be missed in the event of an application exit.
Finally, one advantage of processing by block rather than by message is that the subscriber is aware of all incoming blocks regardless of whether they contain relevant Warp messages, giving us the ability to regularly checkpoint block heights independent of Warp messages actually being processed.