core/filtermaps: two dimensional log filter data structure #30370

Closed
wants to merge 30 commits into from

zsfelfoldi
Contributor

This PR implements a new log filter data structure that is intended to replace core/bloombits.
It can also be considered as a pilot project for my EIP-7745 proposal:
https://github.com/zsfelfoldi/EIPs/blob/new-log-filter/EIPS/eip-7745.md
Note that this PR implements the filter structure proposed in the EIP but does not touch consensus. It implements the filter maps but not the tree hash structure. It also does not add pointers to headers and receipts; instead, it stores block to log value pointers separately.
Regardless of whether and when EIP-7745 might get accepted, this PR provides immediate value to Geth users interested in logs, as it should drastically speed up log search compared to bloombits, which is now practically useless because of the overpopulated bloom filters. The EIP is mostly interesting for light client friendliness.

@zsfelfoldi zsfelfoldi changed the title core/filtermaps: two dimensional log filter (WIP) core/filtermaps: two dimensional log filter data structure (WIP) Aug 29, 2024
@zsfelfoldi zsfelfoldi force-pushed the log-filter branch 2 times, most recently from 9a05680 to 9ad34e5 Compare September 15, 2024 23:43
@zsfelfoldi zsfelfoldi force-pushed the log-filter branch 3 times, most recently from a8aa689 to c04968b Compare September 26, 2024 01:49
@zsfelfoldi zsfelfoldi force-pushed the log-filter branch 3 times, most recently from c592bbf to 28cdf15 Compare October 3, 2024 15:18
@zsfelfoldi zsfelfoldi changed the title core/filtermaps: two dimensional log filter data structure (WIP) core/filtermaps: two dimensional log filter data structure Oct 6, 2024
@MariusVanDerWijden
Member

Do you have some numbers about the performance of the filtermaps? (size, lookup speed, generation speed, etc)

@zsfelfoldi
Contributor Author

> Do you have some numbers about the performance of the filtermaps? (size, lookup speed, generation speed, etc)

I measured indexing and unindexing time for the entire chain history. I also saved the log line from the point where the index covered 2,350,000 blocks, which is the currently proposed default setting:

INFO [10-10|12:15:26.044] Reverse log indexing in progress         maps=51857 history=2,350,607 processed=2,350,000 remaining=18,583,940 elapsed=1h59m54.391s
INFO [10-10|20:18:56.474] Reverse log indexing finished            maps=240,264 history=20,936,958 processed=20,933,940 elapsed=10h3m24.820s
INFO [10-10|21:33:21.991] Log unindexing finished                  maps=1       history=1          removed=20,937,327 elapsed=4m1.752s
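
The numbers in these log lines allow a rough check of map density and indexing throughput. The sketch below assumes that `maps` counts filter maps, `history` counts indexed blocks and `processed` counts blocks handled since indexing started; the helper names are illustrative and are not part of Geth.

```javascript
// Illustrative helpers; none of these names exist in go-ethereum.

// Average number of blocks covered by one filter map.
function blocksPerMap(historyBlocks, maps) {
  return historyBlocks / maps;
}

// Average indexing speed in blocks per second.
function blocksPerSecond(processedBlocks, hours, minutes, seconds) {
  const elapsedSeconds = hours * 3600 + minutes * 60 + seconds;
  return processedBlocks / elapsedSeconds;
}

// Values copied from the "Reverse log indexing" log lines above.
const recentDensity = blocksPerMap(2350607, 51857);   // most recent ~2.35M blocks
const fullDensity = blocksPerMap(20936958, 240264);   // entire history
const recentSpeed = blocksPerSecond(2350000, 1, 59, 54.391);
const fullSpeed = blocksPerSecond(20933940, 10, 3, 24.820);

console.log("blocks per map:", recentDensity.toFixed(1), fullDensity.toFixed(1));
console.log("blocks per second:", recentSpeed.toFixed(0), fullSpeed.toFixed(0));
```

Under these assumptions, one map covers about 45 blocks in the most recent 2.35M blocks versus about 87 blocks averaged over the whole history, and reverse indexing averaged roughly 327 blocks/s for the first 2.35M blocks and roughly 578 blocks/s overall.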

Database size growth is hard to measure exactly because of compaction (or the lack of it). When I did a full indexing after a full unindexing, my db grew by 57 GB, but the growth would probably be larger on a freshly synced database. A starting point for estimation is that each map consists of 4096 rows which are 64 bytes long on average, stored under consecutive keys, so the db overhead per entry is probably low. The log index for the entire history should therefore be about 58.6 GB plus db overhead, while the recommended history of 2,350,000 blocks (one year) should be about 12.7 GB plus db overhead. Also note that this PR removes the old bloombits db, which is about 5-6 GB.
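
The estimate above can be reproduced with a few lines of arithmetic. This is a sketch that assumes the map counts are the `maps` values from the indexing log lines (240,264 for the full history, 51,857 for the default history) and that sizes are expressed in binary gigabytes (2^30 bytes); the function name is made up for illustration.

```javascript
// Illustrative estimate; the constants come from the paragraph above.
const ROWS_PER_MAP = 4096;  // rows in one filter map
const AVG_ROW_BYTES = 64;   // average row length in bytes
const GIB = 2 ** 30;        // assumption: sizes are binary gigabytes

// Estimated raw index size (without database overhead) for a number of maps.
function estimateIndexGiB(maps) {
  return (maps * ROWS_PER_MAP * AVG_ROW_BYTES) / GIB;
}

const fullHistoryGiB = estimateIndexGiB(240264); // maps= value for the entire history
const oneYearGiB = estimateIndexGiB(51857);      // maps= value for ~2.35M blocks

console.log(fullHistoryGiB.toFixed(2), oneYearGiB.toFixed(2));
```

With these inputs the result is roughly 58.66 and 12.66, in line with the approximate figures quoted above, before any database overhead is added.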

The log search performance depends on what we are searching for. I chose a more difficult but pretty common scenario where some of the search values appear very frequently while the overall pattern occurs only 40 times throughout the chain history. It's a WETH transaction; the filter pattern specifies one address and 3 topics.

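// Run in the Geth JavaScript console (legacy web3 filter API).
var options = {
   fromBlock: 19924000,
   toBlock: 20924000,
   // WETH contract address
   address: ["0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2"],
   // topic 0: event signature hash; topics 1 and 2: addresses left-padded to 32 bytes
   topics: [["0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"],["0x000000000000000000000000b05c9b5a0ce5d2e12fbd678d7fe34bec7d14414e"],["0x000000000000000000000000f3de3c0d654fda23dad170f0f320a92172509127"]],
};
var filter = web3.eth.filter(options);
// Fetch all matching logs once and print them as JSON.
filter.get(function(error, logs) {
   if (error) {
      console.log(error);
      return;
   }
   console.log(JSON.stringify(logs));
});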
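
For readers who are not using the console, the same query can be expressed as a raw `eth_getLogs` JSON-RPC request. The following is a minimal sketch: `toHexQuantity` and `buildGetLogsRequest` are hypothetical helper names, and sending the request assumes a node with its HTTP RPC endpoint enabled, which is not shown here.

```javascript
// Convert a block number to the hex quantity format used by JSON-RPC.
function toHexQuantity(n) {
  return "0x" + n.toString(16);
}

// Build an eth_getLogs request body from web3-style filter options.
// Helper names are illustrative; only the request shape is standard JSON-RPC.
function buildGetLogsRequest(options, id) {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "eth_getLogs",
    params: [{
      fromBlock: toHexQuantity(options.fromBlock),
      toBlock: toHexQuantity(options.toBlock),
      address: options.address,
      topics: options.topics,
    }],
  };
}

// Same filter as in the console example above.
const request = buildGetLogsRequest({
  fromBlock: 19924000,
  toBlock: 20924000,
  address: ["0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2"],
  topics: [
    ["0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"],
    ["0x000000000000000000000000b05c9b5a0ce5d2e12fbd678d7fe34bec7d14414e"],
    ["0x000000000000000000000000f3de3c0d654fda23dad170f0f320a92172509127"],
  ],
}, 1);

console.log(JSON.stringify(request, null, 2));
```

Each entry in `topics` constrains one topic position, and the values inside an entry are alternatives, so this filter requires all three positions to match.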

I did the test for 1M blocks, 10M blocks and the entire chain history, both with and without indexing:

Recent 1M blocks:

INFO [10-09|01:38:34.923] Performed indexed log search             begin=19,924,000 end=20,924,000 "true matches"=38 "false positives"=0 elapsed=251.910ms
INFO [10-10|09:56:09.803] Performed unindexed log search           begin=19,924,000 end=20,924,000 matches=38 elapsed=1m8.051s

Recent 10M blocks:

INFO [10-09|01:38:55.740] Performed indexed log search             begin=10,924,000 end=20,924,000 "true matches"=38 "false positives"=0 elapsed=977.532ms
INFO [10-10|10:09:02.963] Performed unindexed log search           begin=10,924,000 end=20,924,000 matches=38 elapsed=4m57.828s

Entire history:

INFO [10-10|21:28:40.395] Performed indexed log search             begin=0 end=20,937,304 "true matches"=40 "false positives"=0 elapsed=1.901s
INFO [10-10|21:45:03.971] Performed unindexed log search           begin=0 end=20,937,354 matches=40 elapsed=6m16.716s
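
To put the timings side by side, the `elapsed` values can be converted to seconds and divided. This is a small sketch with hypothetical helper names; it assumes the durations use Go's duration string format (for example `1m8.051s` or `251.910ms`), as they appear in the log lines above.

```javascript
// Parse a Go-style duration string such as "1h59m54.391s" or "251.910ms"
// into seconds. Only the h, m, s and ms units are handled in this sketch.
function parseGoDuration(text) {
  const unitSeconds = { h: 3600, m: 60, s: 1, ms: 0.001 };
  const pattern = /(\d+(?:\.\d+)?)(ms|h|m|s)/g; // "ms" is tried before "m"
  let total = 0;
  let consumed = 0;
  for (const match of text.matchAll(pattern)) {
    total += parseFloat(match[1]) * unitSeconds[match[2]];
    consumed += match[0].length;
  }
  if (consumed === 0 || consumed !== text.length) {
    throw new Error("unsupported duration: " + text);
  }
  return total;
}

// Ratio of unindexed to indexed search time.
function speedup(unindexedElapsed, indexedElapsed) {
  return parseGoDuration(unindexedElapsed) / parseGoDuration(indexedElapsed);
}

// Elapsed values copied from the log lines above.
const results = {
  "recent 1M blocks": speedup("1m8.051s", "251.910ms"),
  "recent 10M blocks": speedup("4m57.828s", "977.532ms"),
  "entire history": speedup("6m16.716s", "1.901s"),
};

for (const [range, factor] of Object.entries(results)) {
  console.log(range + ": about " + Math.round(factor) + "x faster with the index");
}
```

For this particular query, the indexed search comes out roughly 270, 305 and 198 times faster for the three ranges, respectively.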

@zsfelfoldi zsfelfoldi force-pushed the log-filter branch 3 times, most recently from 7b61867 to 30fd63f Compare October 24, 2024 14:50
@zsfelfoldi zsfelfoldi force-pushed the log-filter branch 3 times, most recently from 34a2d4d to 3b93728 Compare October 30, 2024 00:39
@zsfelfoldi zsfelfoldi requested a review from s1na as a code owner December 14, 2024 01:30
@zsfelfoldi zsfelfoldi closed this Jan 27, 2025
fjl added a commit that referenced this pull request Mar 13, 2025

This PR is #1 of a 3-part series that implements the new log index
intended to replace core/bloombits.
Replaces #30370

This part implements the new data structure, the log index generator and
the search logic. This PR has most of the complexity but it does not
affect any existing code yet so maybe it is easier to review separately.

FilterMaps data structure explanation:
https://gist.github.com/zsfelfoldi/a60795f9da7ae6422f28c7a34e02a07e

Log index generator code overview:
https://gist.github.com/zsfelfoldi/97105dff0b1a4f5ed557924a24b9b9e7

Search pattern matcher code overview:
https://gist.github.com/zsfelfoldi/5981735641c956afb18065e84f8aff34

Note that the possibility of a tree hashing scheme and remote proof
protocol are mentioned in the documents above but they are not exactly
specified yet. These specs are WIP and will be finalized after the local
log indexer/filter code is finalized and merged.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
GrapeBaBa pushed a commit to optimism-java/shisui that referenced this pull request Mar 16, 2025
fjl added a commit that referenced this pull request Mar 17, 2025
This PR is #2 of a 3-part series that implements the new log index
intended to replace core/bloombits.
Based on #31079
Replaces #30370

This part replaces the old bloombits based log search logic in
`eth/filters` to use the new `core/filtermaps` logic.

FilterMaps data structure explanation:
https://gist.github.com/zsfelfoldi/a60795f9da7ae6422f28c7a34e02a07e

Log index generator code overview:
https://gist.github.com/zsfelfoldi/97105dff0b1a4f5ed557924a24b9b9e7

Search pattern matcher code overview:
https://gist.github.com/zsfelfoldi/5981735641c956afb18065e84f8aff34

Note that the possibility of a tree hashing scheme and remote proof
protocol are mentioned in the documents above but they are not exactly
specified yet. These specs are WIP and will be finalized after the local
log indexer/filter code is finalized and merged.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
fjl added a commit that referenced this pull request Mar 21, 2025
This PR is #3 of a 3-part series that implements the new log index
intended to replace core/bloombits.
Based on #31079 and
#31080
Replaces #30370

This part removes the old bloombits package and the chain indexer that
was only used by bloombits. Deletes the old bloombits database.

FilterMaps data structure explanation:
https://gist.github.com/zsfelfoldi/a60795f9da7ae6422f28c7a34e02a07e

Log index generator code overview:
https://gist.github.com/zsfelfoldi/97105dff0b1a4f5ed557924a24b9b9e7

Search pattern matcher code overview:
https://gist.github.com/zsfelfoldi/5981735641c956afb18065e84f8aff34

Note that the possibility of a tree hashing scheme and remote proof
protocol are mentioned in the documents above but they are not exactly
specified yet. These specs are WIP and will be finalized after the local
log indexer/filter code is finalized and merged.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
sivaratrisrinivas pushed a commit to sivaratrisrinivas/go-ethereum that referenced this pull request Apr 21, 2025
sivaratrisrinivas pushed a commit to sivaratrisrinivas/go-ethereum that referenced this pull request Apr 21, 2025
sivaratrisrinivas pushed a commit to sivaratrisrinivas/go-ethereum that referenced this pull request Apr 21, 2025