feat: add migration checker cmd #1114
base: develop
Walkthrough
The changes introduce a new command-line tool for validating the equality of two databases: a Merkle Patricia Tree (MPT) and a Zero-Knowledge (ZK) database. The tool sets up command-line flags, opens levelDB connections, and uses concurrent processes to compare trie structures, accounts, and storage. Additional helper functions manage error handling and data loading in parallel. In the trie package, a new
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant Main
participant MPT_DB as MPT Database
participant ZK_DB as ZK Database
participant Checker
User->>Main: Run migration-checker with flags
Main->>MPT_DB: Open database connection
Main->>ZK_DB: Open database connection
Main->>Checker: Initiate trie equality check
Checker->>Checker: Spawn concurrent goroutines for checks
Checker->>MPT_DB: Load leaf nodes (MPT)
Checker->>ZK_DB: Load leaf nodes (ZK)
Checker->>Checker: Compare account and storage equality
Checker-->>Main: Return results/errors
Main-->>User: Display comparison outcome
sequenceDiagram
participant Caller
participant ZkTrie
Caller->>ZkTrie: Call CountLeaves(cb, parallel)
alt Node is a leaf
ZkTrie-->>Caller: Invoke cb(key, value) and return count 1
else Node is not a leaf
alt Parallel mode & depth < limit
ZkTrie->>ZkTrie: Spawn goroutines for left and right children
else
ZkTrie->>ZkTrie: Count children sequentially
end
end
ZkTrie-->>Caller: Return total leaf count
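The second diagram can be read as a depth-limited parallel walk. Below is a minimal sketch of that idea, not the PR's code: the `node` type, `buildTree`, and the `maxParallelDepth` constant are hypothetical stand-ins, and the real `CountLeaves` operates on ZkTrie nodes loaded from the database.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical binary-trie node: a leaf carries a key and value,
// an inner node carries two children.
type node struct {
	key, value  []byte
	left, right *node
}

// maxParallelDepth is an assumed limit; deeper levels are walked sequentially.
const maxParallelDepth = 4

// countLeaves calls cb for every leaf and returns the number of leaves.
// cb must be safe for concurrent use when parallel is true.
func countLeaves(n *node, cb func(key, value []byte), depth int, parallel bool) uint64 {
	if n == nil {
		return 0
	}
	if n.left == nil && n.right == nil {
		cb(n.key, n.value)
		return 1
	}
	if parallel && depth < maxParallelDepth {
		var wg sync.WaitGroup
		var leftCount, rightCount uint64
		wg.Add(2)
		go func() {
			defer wg.Done()
			leftCount = countLeaves(n.left, cb, depth+1, parallel)
		}()
		go func() {
			defer wg.Done()
			rightCount = countLeaves(n.right, cb, depth+1, parallel)
		}()
		wg.Wait()
		return leftCount + rightCount
	}
	return countLeaves(n.left, cb, depth+1, parallel) +
		countLeaves(n.right, cb, depth+1, parallel)
}

// buildTree returns a complete tree of the given height (2^height leaves).
func buildTree(height int) *node {
	if height == 0 {
		return &node{key: []byte("k"), value: []byte("v")}
	}
	return &node{left: buildTree(height - 1), right: buildTree(height - 1)}
}

func main() {
	var mu sync.Mutex
	seen := 0
	total := countLeaves(buildTree(6), func(key, value []byte) {
		mu.Lock()
		seen++
		mu.Unlock()
	}, 0, true)
	fmt.Println(total, seen)
}
```

Capping the depth bounds the number of goroutines at roughly 2^maxParallelDepth while still spreading the top of the tree across cores.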
a9d9968
to
1651970
Compare
Actionable comments posted: 2
🧹 Nitpick comments (7)
cmd/migration-checker/main.go (7)
21-22: Consider grouping related global variables into a config struct.
Storing global variables like accountsDone and trieCheckers is convenient, but it can introduce shared state across the entire application. Wrapping them in a configuration or context structure may improve clarity, testability, and modularity.
29-58: Evaluate replacing panics with error returns or exit codes.
While using panicOnError is acceptable in a development tool, panics might complicate automated scripts or higher-level orchestration. Converting these to user-facing error messages with proper exit codes could yield a cleaner CLI experience.
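One common shape for this, sketched under assumptions: the `config` type, its fields, and `run` are hypothetical and do not appear in the PR. The idea is that all work happens in a function returning an error, and only `main` decides the exit code.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// config holds hypothetical inputs for the checker; the field names are
// illustrative and not taken from the PR.
type config struct {
	mptDBPath string
	zkDBPath  string
}

// run validates its inputs and returns an error instead of panicking.
func run(cfg config) error {
	if cfg.mptDBPath == "" {
		return errors.New("missing MPT database path")
	}
	if cfg.zkDBPath == "" {
		return errors.New("missing ZK database path")
	}
	// ... open the databases and compare them here ...
	return nil
}

func main() {
	cfg := config{mptDBPath: "mpt-db", zkDBPath: "zk-db"} // placeholder values
	if err := run(cfg); err != nil {
		// A single exit point keeps the message and exit code predictable
		// for scripts that wrap the tool.
		fmt.Fprintln(os.Stderr, "migration-checker:", err)
		os.Exit(1)
	}
}
```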
46-49: Check concurrency capacity alignment with real-world usage.
Using runtime.GOMAXPROCS(0)*4 for channel capacity is a good start, but it may need tuning for very large databases. Monitoring resource utilization and adjusting this factor might improve throughput or prevent overconsumption of system resources.
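For readers unfamiliar with the pattern, a buffered channel used this way acts as a counting semaphore. The sketch below is illustrative only: `runBounded` is a hypothetical helper, and the peak-tracking code exists solely to make the bound observable.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runBounded runs every task with at most limit tasks in flight at once and
// returns the peak concurrency it observed.
func runBounded(tasks []func(), limit int) int64 {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var running, peak atomic.Int64
	for _, task := range tasks {
		sem <- struct{}{} // blocks while limit tasks are already running
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the task ends
			cur := running.Add(1)
			for {
				p := peak.Load()
				if cur <= p || peak.CompareAndSwap(p, cur) {
					break
				}
			}
			task()
			running.Add(-1)
		}(task)
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	limit := runtime.GOMAXPROCS(0) * 4 // the factor discussed in the comment
	tasks := make([]func(), 100)
	for i := range tasks {
		tasks[i] = func() {}
	}
	fmt.Println("peak concurrency:", runBounded(tasks, limit), "limit:", limit)
}
```

Passing the limit as a parameter, rather than fixing it in a package-level variable, would make the factor easy to tune from a flag.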
81-83: Add clarity before panicking on leaf count mismatch.
While it’s correct to panic if leaf counts differ, consider logging the specific roots or DB paths for easier troubleshooting.
105-107: Remove unnecessary byte slice conversion.
Static analysis indicates an unnecessary conversion to []byte(...) for mptKey and preimageKey, which are already []byte.
- panic(fmt.Sprintf("%s key %s (preimage %s) not found in mpt", label, hex.EncodeToString([]byte(mptKey)), hex.EncodeToString([]byte(preimageKey))))
+ panic(fmt.Sprintf("%s key %s (preimage %s) not found in mpt", label, hex.EncodeToString(mptKey), hex.EncodeToString(preimageKey)))
🧰 Tools
🪛 GitHub Check: check
[failure] 107-107: unnecessary conversion (unconvert)
134-146: Handle goroutine panics more gracefully.
Capturing panics in the goroutine and then calling os.Exit(1) is valid, but consider using a more structured error-reporting approach. This might help gather partial results or trigger retries.
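One possible structured approach, sketched here with a hypothetical `safeGo` helper that is not part of the PR: each worker recovers its own panic and reports it as an error on a channel, so the caller can collect every failure before deciding how to exit.

```go
package main

import (
	"fmt"
	"os"
)

// safeGo runs fn in its own goroutine. The returned channel receives exactly
// one value: nil on success, or an error describing the recovered panic.
func safeGo(fn func()) <-chan error {
	errCh := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				errCh <- fmt.Errorf("worker panicked: %v", r)
				return
			}
			errCh <- nil
		}()
		fn()
	}()
	return errCh
}

func main() {
	workers := []<-chan error{
		safeGo(func() {}),
		safeGo(func() { panic("storage mismatch") }),
	}
	failed := 0
	for _, ch := range workers {
		if err := <-ch; err != nil {
			fmt.Fprintln(os.Stderr, err)
			failed++
		}
	}
	// A real tool would likely exit non-zero here when failed > 0.
	fmt.Println("failed workers:", failed)
}
```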
217-245: Ensure consistent error handling in loadZkTrie.
panic is used when a preimage is missing. If missing preimages are a frequent partial data scenario, consider logging or returning an error to allow partial inspections.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
cmd/migration-checker/main.go (1 hunks)
trie/secure_trie.go (1 hunks)
trie/tracer.go (4 hunks)
trie/zk_trie.go (1 hunks)
🧰 Additional context used
🪛 GitHub Check: check
cmd/migration-checker/main.go
[failure] 107-107: unnecessary conversion (unconvert)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: test
🔇 Additional comments (4)
cmd/migration-checker/main.go (2)
69-112: Verify potential mismatch in leaf counts.
The code panics immediately if the MPT and ZK leaf counts differ. This strict approach is valid for a verification tool, but confirm whether partial or incremental mismatch logs might be more appropriate for diagnosing large data sets.
🧰 Tools
🪛 GitHub Check: check
[failure] 107-107: unnecessary conversion (unconvert)
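As an illustration of what incremental mismatch reporting could look like, here is a sketch that collects every discrepancy instead of stopping at the first. It assumes both leaf maps are keyed by the same preimage key, which simplifies what the tool actually does, and `diffLeaves` is a hypothetical name.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// diffLeaves returns one message per discrepancy between the two leaf maps,
// sorted so that the report is stable across runs.
func diffLeaves(mpt, zk map[string][]byte) []string {
	var diffs []string
	for k, mv := range mpt {
		zv, ok := zk[k]
		switch {
		case !ok:
			diffs = append(diffs, fmt.Sprintf("key %x missing in zk", k))
		case !bytes.Equal(mv, zv):
			diffs = append(diffs, fmt.Sprintf("key %x differs: mpt=%x zk=%x", k, mv, zv))
		}
	}
	for k := range zk {
		if _, ok := mpt[k]; !ok {
			diffs = append(diffs, fmt.Sprintf("key %x missing in mpt", k))
		}
	}
	sort.Strings(diffs)
	return diffs
}

func main() {
	mpt := map[string][]byte{"a": {1}, "b": {2}}
	zk := map[string][]byte{"a": {1}, "b": {3}, "c": {4}}
	for _, d := range diffLeaves(mpt, zk) {
		fmt.Println(d)
	}
}
```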
167-215: Watch for race conditions on the shared map.
This code uses a mutex for the shared mptLeafMap. However, the brief locking around if _, ok := mptLeafMap[...] and mptLeafMap[...] = ... might leave a small gap if multiple goroutines detect the same key concurrently before any break/return. Confirm whether this is acceptable or if you require a more robust check-and-insert process.
trie/tracer.go (1)
63-66: Nil receiver checks look correct.
These added if t == nil { return } conditions prevent panics when tracer is intentionally set to nil (e.g., via NewSecureNoTracer). This change maintains robust handling if the tracer is disabled.
Also applies to: 74-77, 89-92, 102-105, 114-116, 130-133
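The guard works because Go allows calling a method on a nil pointer receiver as long as the method does not dereference it. A minimal sketch with a simplified, hypothetical `tracer` type (the real one in trie/tracer.go tracks more state):

```go
package main

import "fmt"

// tracer is a simplified, hypothetical stand-in for the trie tracer.
type tracer struct {
	inserts map[string]struct{}
}

// onInsert records a path; on a nil tracer it is a no-op.
func (t *tracer) onInsert(path string) {
	if t == nil {
		return
	}
	t.inserts[path] = struct{}{}
}

// insertCount reports how many paths were recorded; a nil tracer reports 0.
func (t *tracer) insertCount() int {
	if t == nil {
		return 0
	}
	return len(t.inserts)
}

func main() {
	var disabled *tracer  // tracing switched off
	disabled.onInsert("a") // safe: the guard returns before any dereference

	enabled := &tracer{inserts: map[string]struct{}{}}
	enabled.onInsert("a")

	fmt.Println(disabled.insertCount(), enabled.insertCount())
}
```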
trie/secure_trie.go (1)
68-76: NewSecureNoTracer function is well-structured.
The approach of calling NewSecure and then setting trie.tracer = nil is straightforward and aligns with the new nil-check logic in tracer.go. Consider documenting any limitations when tracing is disabled (e.g., no record of deleted nodes).
Actionable comments posted: 3
♻️ Duplicate comments (1)
trie/zk_trie.go (1)
242-248: 🛠️ Refactor suggestion
Improve error handling and add documentation.
The method needs better error handling and documentation.
Additionally, fix the pipeline failures by adding the missing verifyNodeHashes argument:
-func (t *ZkTrie) CountLeaves(cb func(key, value []byte), parallel, verifyNodeHashes bool) uint64 {
+func (t *ZkTrie) CountLeaves(cb func(key, value []byte), parallel, verifyNodeHashes bool) (uint64, error) {
 	root, err := t.ZkTrie.Tree().Root()
 	if err != nil {
-		panic("CountLeaves cannot get root")
+		return 0, fmt.Errorf("failed to get root: %w", err)
 	}
-	return t.countLeaves(root, cb, 0, parallel, verifyNodeHashes)
+	return t.countLeaves(root, cb, 0, parallel, verifyNodeHashes), nil
 }
🧹 Nitpick comments (1)
cmd/migration-checker/main.go (1)
39-42: Consider making levelDB buffer sizes configurable.
The hardcoded buffer sizes (1024, 128) for levelDB might not be optimal for all scenarios. Consider making these configurable via command-line flags.
 var (
 	mptDbPath = flag.String("mpt-db", "", "path to the MPT node DB")
 	zkDbPath = flag.String("zk-db", "", "path to the ZK node DB")
 	mptRoot = flag.String("mpt-root", "", "root hash of the MPT node")
 	zkRoot = flag.String("zk-root", "", "root hash of the ZK node")
 	paranoid = flag.Bool("paranoid", false, "verifies all node contents against their expected hash")
+	cacheSize = flag.Int("cache-size", 1024, "size of the data cache in MB")
+	handles = flag.Int("handles", 128, "number of file handles")
 )
📜 Review details
📒 Files selected for processing (2)
cmd/migration-checker/main.go (1 hunks)
trie/zk_trie.go (1 hunks)
🧰 Additional context used
🪛 GitHub Check: test
trie/zk_trie.go
[failure] 279-279: not enough arguments in call to leftT.countLeaves
[failure] 282-282: not enough arguments in call to rightT.countLeaves
[failure] 286-286: not enough arguments in call to t.countLeaves
🪛 GitHub Check: check
trie/zk_trie.go
[failure] 279-279: not enough arguments in call to leftT.countLeaves
[failure] 282-282: not enough arguments in call to rightT.countLeaves
[failure] 286-286: not enough arguments in call to t.countLeaves
cmd/migration-checker/main.go
[failure] 18-18: could not import github.com/scroll-tech/go-ethereum/trie (-: # github.com/scroll-tech/go-ethereum/trie
🪛 GitHub Check: build-mock-ccc-geth
trie/zk_trie.go
[failure] 279-279: not enough arguments in call to leftT.countLeaves
[failure] 282-282: not enough arguments in call to rightT.countLeaves
[failure] 286-286: not enough arguments in call to t.countLeaves
🪛 GitHub Actions: CI
trie/zk_trie.go
[error] 279-279: not enough arguments in call to leftT.countLeaves
🪛 golangci-lint (1.62.2)
cmd/migration-checker/main.go
18-18: could not import github.com/scroll-tech/go-ethereum/trie (-: # github.com/scroll-tech/go-ethereum/trie
trie/zk_trie.go:279:62: not enough arguments in call to leftT.countLeaves
	have (*zktrie.Hash, func(key []byte, value []byte), int, bool)
	want (*zktrie.Hash, func(key []byte, value []byte), int, bool, bool)
trie/zk_trie.go:282:63: not enough arguments in call to rightT.countLeaves
	have (*zktrie.Hash, func(key []byte, value []byte), int, bool)
	want (*zktrie.Hash, func(key []byte, value []byte), int, bool, bool)
trie/zk_trie.go:286:55: not enough arguments in call to t.countLeaves
	have (*zktrie.Hash, func(key []byte, value []byte), int, bool)
	want (*zktrie.Hash, func(key []byte, value []byte), int, bool, bool))
(typecheck)
81ab44f
to
12ec3a6
Compare
Actionable comments posted: 3
♻️ Duplicate comments (4)
cmd/migration-checker/main.go (2)
61-65: 🛠️ Refactor suggestion
Consider returning errors instead of panicking.
Using panic for error handling is not a best practice. Consider returning errors and letting the caller decide how to handle them.
70-113: 🛠️ Refactor suggestion
Refactor for better modularity and readability.
The function is doing too many things: creating tries, loading data, comparing data, and handling preimage keys. Consider splitting it into smaller, more focused functions.
🧰 Tools
🪛 GitHub Check: check
[failure] 108-108: unnecessary conversion (unconvert)
trie/zk_trie.go (2)
242-248: 🛠️ Refactor suggestion
Improve error handling and add documentation.
The method needs the following improvements:
- Return an error instead of panicking
- Add nil check for the callback function
- Add documentation explaining the purpose and parameters
250-289: ⚠️ Potential issue
Fix missing arguments and improve error handling.
The method has several issues:
- Missing verifyNodeHashes argument in recursive calls
- Using panic for error handling
- No error handling in goroutines
📜 Review details
📒 Files selected for processing (4)
cmd/migration-checker/main.go (1 hunks)
trie/secure_trie.go (1 hunks)
trie/tracer.go (4 hunks)
trie/zk_trie.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- trie/tracer.go
- trie/secure_trie.go
🧰 Additional context used
🪛 GitHub Check: check
cmd/migration-checker/main.go
[failure] 108-108: unnecessary conversion (unconvert)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: test
🔇 Additional comments (3)
cmd/migration-checker/main.go (3)
67-69: LGTM!
The function correctly creates a copy of the input byte slice.
158-166: LGTM!
The function correctly compares storage values between ZK and MPT tries.
218-246: LGTM!
The function correctly loads ZK trie data with proper synchronization for parallel processing.
Actionable comments posted: 2
🧹 Nitpick comments (2)
cmd/migration-checker/main.go (2)
21-22: Consider encapsulating global variables within a struct.
Global variables make testing difficult and can introduce subtle bugs. Consider encapsulating these within a struct:
-var accountsDone atomic.Uint64
-var trieCheckers = make(chan struct{}, runtime.GOMAXPROCS(0)*4)
+type migrationChecker struct {
+	accountsDone atomic.Uint64
+	trieCheckers chan struct{}
+}
+
+func newMigrationChecker() *migrationChecker {
+	maxWorkers := runtime.GOMAXPROCS(0) * 4
+	if maxWorkers <= 0 {
+		maxWorkers = 1
+	}
+	return &migrationChecker{
+		trieCheckers: make(chan struct{}, maxWorkers),
+	}
+}
70-113: Optimize memory usage in trie comparison.
The function loads all leaves into memory before comparison. For large tries, this could cause out-of-memory issues.
Consider implementing a streaming comparison that processes leaves in batches:
 func checkTrieEquality(...) {
+	const batchSize = 1000
 	// ... existing setup code ...
-	mptLeafMap := <-mptLeafCh
-	zkLeafMap := <-zkLeafCh
+	for {
+		mptBatch := make(map[string][]byte, batchSize)
+		zkBatch := make(map[string][]byte, batchSize)
+		// ... load and compare batches ...
+		if done {
+			break
+		}
+	}
📜 Review details
📒 Files selected for processing (1)
cmd/migration-checker/main.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: test
🔇 Additional comments (1)
cmd/migration-checker/main.go (1)
61-65: Replace panic with error returns for better error handling.
Using panic for error handling is not a best practice.
See previous suggestion about returning errors instead of panicking.
1. Purpose or design rationale of this PR
...
2. PR title
Your PR title must follow conventional commits (as we are doing squash merge for each PR), so it must start with one of the following types:
3. Deployment tag versioning
Has the version in params/version.go been updated?
4. Breaking change label
Does this PR have the breaking-change label?
Summary by CodeRabbit
New Features
Bug Fixes