Checksum cache #17
Merged
stevensJourney
previously approved these changes
Jun 13, 2024
LGTM
This is relevant when there are more concurrent requests than the cache size.
stevensJourney
approved these changes
Jun 13, 2024
Supersedes #12.
Checksum cache
Checksum calculation is currently the operation that adds the most load to the MongoDB storage database.
This uses an LRU cache for bucket checksums, keyed by (bucket, checkpoint). The checksum for a given key never changes within a sync rules instance, so we keep one cache per sync rules instance.
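As a rough illustration of the idea (not the code in this PR), a minimal LRU cache keyed by (bucket, checkpoint) could look like the sketch below. The class name `ChecksumLruCache`, its methods, and the use of plain numbers for checkpoints and checksums are assumptions made for the example.

```typescript
// Hypothetical sketch: names and types are illustrative, not the service's API.
class ChecksumLruCache {
  // A Map iterates in insertion order, so the first key is the least recently used.
  private readonly entries = new Map<string, number>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  private static key(bucket: string, checkpoint: number): string {
    return `${checkpoint}/${bucket}`;
  }

  get(bucket: string, checkpoint: number): number | undefined {
    const key = ChecksumLruCache.key(bucket, checkpoint);
    const value = this.entries.get(key);
    if (value === undefined) {
      return undefined;
    }
    // Re-insert the entry to mark it as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(bucket: string, checkpoint: number, checksum: number): void {
    const key = ChecksumLruCache.key(bucket, checkpoint);
    this.entries.delete(key);
    this.entries.set(key, checksum);
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) {
        this.entries.delete(oldest);
      }
    }
  }
}
```

One such cache would be created per sync rules instance, so entries never need to be invalidated while that instance is active.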
When no cached checksum is available for a specific checkpoint, we look for an earlier checkpoint that does have a cached checksum, and fetch only a partial checksum from that point onward. For large buckets, this should be much faster than computing the entire checksum from scratch.
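The fallback to an earlier checkpoint can be sketched as follows. This assumes that checksums are additive with 32-bit wraparound, so the checksum at a later checkpoint equals the checksum at an earlier checkpoint plus the checksum of the operations in between; that property, the `fetchPartial` callback, and the synchronous signatures are all assumptions for the example rather than details taken from this PR.

```typescript
// Hypothetical callback: returns the checksum of operations in
// (afterCheckpoint, upToCheckpoint] for a bucket, e.g. via a database query.
type PartialFetcher = (bucket: string, afterCheckpoint: number, upToCheckpoint: number) => number;

// Assumption: checksums combine by 32-bit wrapping addition.
function addChecksums(a: number, b: number): number {
  return (a + b) | 0;
}

function checksumAt(
  cached: Map<number, number>, // checkpoint -> checksum, for a single bucket
  bucket: string,
  checkpoint: number,
  fetchPartial: PartialFetcher
): number {
  const exact = cached.get(checkpoint);
  if (exact !== undefined) {
    return exact;
  }

  // Find the latest cached checkpoint that is earlier than the requested one.
  let base: number | undefined;
  cached.forEach((_checksum, cp) => {
    if (cp < checkpoint && (base === undefined || cp > base)) {
      base = cp;
    }
  });

  // Without a cached base, compute from scratch (checkpoint 0, checksum 0).
  const baseCheckpoint = base ?? 0;
  const baseChecksum = base === undefined ? 0 : cached.get(base) ?? 0;

  const partial = fetchPartial(bucket, baseCheckpoint, checkpoint);
  const result = addChecksums(baseChecksum, partial);
  cached.set(checkpoint, result);
  return result;
}
```

With this shape, a request for a new checkpoint only scans the operations added since the most recent cached checkpoint, which is where the saving for large buckets would come from.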
The cache should help for incremental updates within an active connection, as well as for re-using checksums over multiple connections.
Filtered data requests
When fetching data for a new checkpoint, we now only include buckets for which the checksum changed. Combined with the checksum cache, this should further reduce the load on the database by a small amount.
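A minimal sketch of that filtering step, assuming the checksums for the previous and the new checkpoint are available as maps from bucket name to checksum (the `changedBuckets` helper and its types are hypothetical):

```typescript
// Hypothetical helper: returns the buckets whose checksum differs between
// the previous checkpoint and the new one, including newly added buckets.
function changedBuckets(
  previous: Map<string, number>,
  current: Map<string, number>
): string[] {
  const changed: string[] = [];
  current.forEach((checksum, bucket) => {
    if (previous.get(bucket) !== checksum) {
      changed.push(bucket);
    }
  });
  return changed;
}
```

Only the returned buckets would then be included in the data request for the new checkpoint; unchanged buckets are skipped.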
This has no effect on initial sync requests or the first request for a new connection yet. That is an optimization that could be included later.
Empty buckets
This now also returns zero checksums for empty buckets to the client, instead of omitting the bucket. This does not affect synced data on the client, but helps with diagnosing sync issues.
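For illustration, filling in zero checksums for empty buckets might look like the sketch below. The `BucketChecksum` shape, including the `count` field, and the `withEmptyBuckets` helper are assumptions for the example, not the actual protocol types.

```typescript
// Hypothetical shape of a checksum entry sent to the client.
interface BucketChecksum {
  bucket: string;
  checksum: number;
  count: number;
}

// Returns one entry per requested bucket; buckets without any computed
// checksum (no data) get a zero checksum instead of being omitted.
function withEmptyBuckets(
  requested: string[],
  computed: Map<string, BucketChecksum>
): BucketChecksum[] {
  return requested.map(
    (bucket) => computed.get(bucket) ?? { bucket, checksum: 0, count: 0 }
  );
}
```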
See powersync-ja/powersync-js#209 for the diagnostics app change to display these buckets.
Performance Impact
This specifically helps for cases where a large set of data is synced to clients (e.g. 100k+ rows), but the incremental updates are small. It reduces both the latency of getting these updates to the client, and the load on the MongoDB storage database.
Test case 1
Test case:
Before:
After:
This is the "best case scenario" for the cache, since all users share a single large bucket.
Test case 2
Test case: Same as the above, but additionally add 400 small unique buckets per user.
Before:
Results (after):
Test case 3
Test case: Same as the above, but reduce cache size to 3000 (less than the 4000 combined buckets for all connected users).
Before:
Results (after):
This is the "worst case scenario" for the cache: many small buckets, unique to each user. The cache does not help much for these buckets, and in some cases they could evict the cached checksum for the global bucket.
We don't expect to hit this much in practice, since the default cache size (100k cached checksums) is enough to handle 100 concurrent users each at the limit of 1000 buckets (100 x 1000 = 100k).