-
Notifications
You must be signed in to change notification settings - Fork 20.8k
core/filtermaps: two dimensional log filter data structure #30370
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
c924f0d
to
e6b037f
Compare
9a05680
to
9ad34e5
Compare
a8aa689
to
c04968b
Compare
c592bbf
to
28cdf15
Compare
Do you have some numbers about the performance of the filtermaps? (size, lookup speed, generation speed, etc) |
I measured indexing and unindexing time for the entire chain history and I also saved the log where the index size was 2.350.000 blocks which is the currently proposed default setting:
Database size growth is hard to measure exactly because of compaction (or the lack of it), doing a full indexing after a full unindexing my db size grew 57Gb but it would probably be bigger when done on a freshly synced database. A starting point to do some estimations is that each map consists of 4096 rows which are 64 bytes long on average, stored under consecutive keys so probably a low db overhead per entry. So the entire history log should be about 58.6 Gb plus db overhead while the recommended 2.350.000 blocks (one year) history should be about 12.7 Gb plus db overhead. Also note that this PR removes the old bloombits db which is about 5-6 Gb. The log search performance depends on what we are searching for, I chose a more difficult but pretty common scenario where some of the search values appear very frequently while the overall pattern happens 40 times throughout the chain history. It's a WETH transaction, the filter pattern is for one address and 3 topics.
I did the test for 1M blocks, 10M blocks and the entire chain history, both with and without indexing:
|
7b61867
to
30fd63f
Compare
34a2d4d
to
3b93728
Compare
2fbd945
to
78b5918
Compare
) This PR is #1 of a 3-part series that implements the new log index intended to replace core/bloombits. Replaces #30370 This part implements the new data structure, the log index generator and the search logic. This PR has most of the complexity but it does not affect any existing code yet so maybe it is easier to review separately. FilterMaps data structure explanation: https://gist.github.com/zsfelfoldi/a60795f9da7ae6422f28c7a34e02a07e Log index generator code overview: https://gist.github.com/zsfelfoldi/97105dff0b1a4f5ed557924a24b9b9e7 Search pattern matcher code overview: https://gist.github.com/zsfelfoldi/5981735641c956afb18065e84f8aff34 Note that the possibility of a tree hashing scheme and remote proof protocol are mentioned in the documents above but they are not exactly specified yet. These specs are WIP and will be finalized after the local log indexer/filter code is finalized and merged. --------- Co-authored-by: Felix Lange <fjl@twurst.com>
…ereum#31079) This PR is #1 of a 3-part series that implements the new log index intended to replace core/bloombits. Replaces ethereum#30370 This part implements the new data structure, the log index generator and the search logic. This PR has most of the complexity but it does not affect any existing code yet so maybe it is easier to review separately. FilterMaps data structure explanation: https://gist.github.com/zsfelfoldi/a60795f9da7ae6422f28c7a34e02a07e Log index generator code overview: https://gist.github.com/zsfelfoldi/97105dff0b1a4f5ed557924a24b9b9e7 Search pattern matcher code overview: https://gist.github.com/zsfelfoldi/5981735641c956afb18065e84f8aff34 Note that the possibility of a tree hashing scheme and remote proof protocol are mentioned in the documents above but they are not exactly specified yet. These specs are WIP and will be finalized after the local log indexer/filter code is finalized and merged. --------- Co-authored-by: Felix Lange <fjl@twurst.com>
This PR is #2 of a 3-part series that implements the new log index intended to replace core/bloombits. Based on #31079 Replaces #30370 This part replaces the old bloombits based log search logic in `eth/filters` to use the new `core/filtermaps` logic. FilterMaps data structure explanation: https://gist.github.com/zsfelfoldi/a60795f9da7ae6422f28c7a34e02a07e Log index generator code overview: https://gist.github.com/zsfelfoldi/97105dff0b1a4f5ed557924a24b9b9e7 Search pattern matcher code overview: https://gist.github.com/zsfelfoldi/5981735641c956afb18065e84f8aff34 Note that the possibility of a tree hashing scheme and remote proof protocol are mentioned in the documents above but they are not exactly specified yet. These specs are WIP and will be finalized after the local log indexer/filter code is finalized and merged. --------- Co-authored-by: Felix Lange <fjl@twurst.com>
This PR is #3 of a 3-part series that implements the new log index intended to replace core/bloombits. Based on #31079 and #31080 Replaces #30370 This part removes the old bloombits package and the chain indexer that was only used by bloombits. Deletes the old bloombits database. FilterMaps data structure explanation: https://gist.github.com/zsfelfoldi/a60795f9da7ae6422f28c7a34e02a07e Log index generator code overview: https://gist.github.com/zsfelfoldi/97105dff0b1a4f5ed557924a24b9b9e7 Search pattern matcher code overview: https://gist.github.com/zsfelfoldi/5981735641c956afb18065e84f8aff34 Note that the possibility of a tree hashing scheme and remote proof protocol are mentioned in the documents above but they are not exactly specified yet. These specs are WIP and will be finalized after the local log indexer/filter code is finalized and merged. --------- Co-authored-by: Felix Lange <fjl@twurst.com>
…ereum#31079) This PR is ethereum#1 of a 3-part series that implements the new log index intended to replace core/bloombits. Replaces ethereum#30370 This part implements the new data structure, the log index generator and the search logic. This PR has most of the complexity but it does not affect any existing code yet so maybe it is easier to review separately. FilterMaps data structure explanation: https://gist.github.com/zsfelfoldi/a60795f9da7ae6422f28c7a34e02a07e Log index generator code overview: https://gist.github.com/zsfelfoldi/97105dff0b1a4f5ed557924a24b9b9e7 Search pattern matcher code overview: https://gist.github.com/zsfelfoldi/5981735641c956afb18065e84f8aff34 Note that the possibility of a tree hashing scheme and remote proof protocol are mentioned in the documents above but they are not exactly specified yet. These specs are WIP and will be finalized after the local log indexer/filter code is finalized and merged. --------- Co-authored-by: Felix Lange <fjl@twurst.com>
This PR is ethereum#2 of a 3-part series that implements the new log index intended to replace core/bloombits. Based on ethereum#31079 Replaces ethereum#30370 This part replaces the old bloombits based log search logic in `eth/filters` to use the new `core/filtermaps` logic. FilterMaps data structure explanation: https://gist.github.com/zsfelfoldi/a60795f9da7ae6422f28c7a34e02a07e Log index generator code overview: https://gist.github.com/zsfelfoldi/97105dff0b1a4f5ed557924a24b9b9e7 Search pattern matcher code overview: https://gist.github.com/zsfelfoldi/5981735641c956afb18065e84f8aff34 Note that the possibility of a tree hashing scheme and remote proof protocol are mentioned in the documents above but they are not exactly specified yet. These specs are WIP and will be finalized after the local log indexer/filter code is finalized and merged. --------- Co-authored-by: Felix Lange <fjl@twurst.com>
…m#31081) This PR is ethereum#3 of a 3-part series that implements the new log index intended to replace core/bloombits. Based on ethereum#31079 and ethereum#31080 Replaces ethereum#30370 This part removes the old bloombits package and the chain indexer that was only used by bloombits. Deletes the old bloombits database. FilterMaps data structure explanation: https://gist.github.com/zsfelfoldi/a60795f9da7ae6422f28c7a34e02a07e Log index generator code overview: https://gist.github.com/zsfelfoldi/97105dff0b1a4f5ed557924a24b9b9e7 Search pattern matcher code overview: https://gist.github.com/zsfelfoldi/5981735641c956afb18065e84f8aff34 Note that the possibility of a tree hashing scheme and remote proof protocol are mentioned in the documents above but they are not exactly specified yet. These specs are WIP and will be finalized after the local log indexer/filter code is finalized and merged. --------- Co-authored-by: Felix Lange <fjl@twurst.com>
This PR implements a new log filter data structure that is intended to replace
core/bloombits
.It can also be considered as a pilot project for my EIP-7745 proposal:
https://github.com/zsfelfoldi/EIPs/blob/new-log-filter/EIPS/eip-7745.md
Note that this PR implements the filter structure proposed in the EIP but does not touch consensus. It implements the filter maps but not the tree hash structure. It also does not add pointers to headers and receipts, instead it stores block to log value pointers separately.
Regardless of whether and when EIP-7745 might get accepted, this PR provides immediate value to Geth users interested in logs as it should drastically speed up log search compared to bloombits which is not practically useless because of the overpopulated bloom filters. The EIP is mostly interesting for light client friendliness.