Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Chain Download Halt #6884

Closed
siladu opened this issue Apr 5, 2024 · 1 comment · Fixed by #7162
Closed

Chain Download Halt #6884

siladu opened this issue Apr 5, 2024 · 1 comment · Fixed by #7162
Assignees
Labels
bug Something isn't working P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc))

Comments

@siladu
Copy link
Contributor

siladu commented Apr 5, 2024

Seen on 24.3.0 and 24.4.0-RC2

On fresh SNAP/CHECKPOINT sync on a variety of networks, block import sometimes halts and requires a restart, despite having a full set of peers and world state downloading finishing.

Example:

{"@timestamp":"2024-04-04T05:48:35,877","level":"INFO","thread":"EthScheduler-Services-37 (importBlock)","class":"ImportBlocksStep","message":"Block import progress: 4030400 of 5625239 (71%)","throwable":""}
{"@timestamp":"2024-04-04T05:48:39,175","level":"INFO","thread":"EthScheduler-Services-26 (batchPersistTrieNodeData)","class":"SnapSyncMetricsManager","message":"Healed 221585 world state trie nodes, Peer count: 6","throwable":""}
{"@timestamp":"2024-04-04T05:49:42,821","level":"INFO","thread":"EthScheduler-Services-36 (requestCompleteTask)","class":"SnapWorldDownloadState","message":"Pausing world state download while waiting for sync to complete","throwable":""}

// and then nothing until restart
@siladu siladu changed the title Block Import Halt Chain Download Halt Apr 5, 2024
@siladu
Copy link
Contributor Author

siladu commented Apr 8, 2024

Tested this logging on a Sepolia node and it hung at 99% (note in this case, it looks like worldstate didn't finish)...

It gets stuck in a loop where it repeatedly prints this log for the same peers every few seconds.

...
// last world state log
2024-04-06 02:52:10.313+00:00 | EthScheduler-Services-3 (batchPersistAccountData) | INFO  | SnapSyncMetricsManager | Worldstate download progress: 49.64%, Peer count: 7
2024-04-06 02:52:41.596+00:00 | EthScheduler-Services-43 (importBlock) | INFO  | ImportBlocksStep | Block import progress: 3779220 of 5638048 (67%)
...
2024-04-06 07:18:36.991+00:00 | EthScheduler-Services-43 (importBlock) | INFO  | ImportBlocksStep | Block import progress: 5633620 of 5638048 (99%)
2024-04-06 07:18:38.691+00:00 | EthScheduler-Services-38 (fetchCheckpoints) | DEBUG | FastSyncDownloadPipelineFactory | Stopping chain download as lastRoundHeader=5638329 is not less than pivotBlockHeader=5638329 for peer PeerId: 0x8b61dc2d06c3f96f... PeerReputation score: 150, timeouts: {}, useless: 1, validated? true, disconnected? false, client: Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.7, [Connection with hashCode 81600091 inboundInitiated? false initAt 1712368196470], enode://8b61dc2d06c3f96fddcbebb0efb29d60d3598650275dc469c22229d3e5620369b0d3dedafd929835fe7f489618f19f456fe7c0df572bf2d914a9f4e006f783a9@170.64.250.88:30303
2024-04-06 07:19:10.446+00:00 | EthScheduler-Services-42 (fetchCheckpoints) | DEBUG | FastSyncDownloadPipelineFactory | Stopping chain download as lastRoundHeader=5638329 is not less than pivotBlockHeader=5638329 for peer PeerId: 0x525ca6453105fc8b... PeerReputation score: 150, timeouts: {}, useless: 0, validated? true, disconnected? false, client: Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.6, [Connection with hashCode -1783042358 inboundInitiated? false initAt 1712385475223], enode://525ca6453105fc8b7364de85e6494a5db288fade51aac5620fb94ec17c9e771f4daef3b1aadad8a954b620484d25d2b6e37fde2787a84034f361d69b491fe10f@61.173.39.51:17004
2024-04-06 07:19:15.623+00:00 | EthScheduler-Services-38 (fetchCheckpoints) | DEBUG | FastSyncDownloadPipelineFactory | Stopping chain download as lastRoundHeader=5638329 is not less than pivotBlockHeader=5638329 for peer PeerId: 0x525ca6453105fc8b... PeerReputation score: 150, timeouts: {}, useless: 0, validated? true, disconnected? false, client: Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.6, [Connection with hashCode -1783042358 inboundInitiated? false initAt 1712385475223], enode://525ca6453105fc8b7364de85e6494a5db288fade51aac5620fb94ec17c9e771f4daef3b1aadad8a954b620484d25d2b6e37fde2787a84034f361d69b491fe10f@61.173.39.51:17004
2024-04-06 07:19:21.356+00:00 | EthScheduler-Services-41 (fetchCheckpoints) | DEBUG | FastSyncDownloadPipelineFactory | Stopping chain download as lastRoundHeader=5638329 is not less than pivotBlockHeader=5638329 for peer PeerId: 0x525ca6453105fc8b... PeerReputation score: 150, timeouts: {}, useless: 0, validated? true, disconnected? false, client: Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.6, [Connection with hashCode -1783042358 inboundInitiated? false initAt 1712385475223], enode://525ca6453105fc8b7364de85e6494a5db288fade51aac5620fb94ec17c9e771f4daef3b1aadad8a954b620484d25d2b6e37fde2787a84034f361d69b491fe10f@61.173.39.51:17004
simon.dudley@dev-elc-besu-teku-sepolia-simon-6885-283fe42:~$ cat besu-99-halt.log  | grep "Stopping chain download" | sed 's?.*\(enode://.*@.*:\).*?\1?' | sort | uniq -c | sort
      1 enode://8b61dc2d06c3f96fddcbebb0efb29d60d3598650275dc469c22229d3e5620369b0d3dedafd929835fe7f489618f19f456fe7c0df572bf2d914a9f4e006f783a9@170.64.250.88:
      2 enode://b3dc4688b686d5cef81f1b3b16b098ffbd9ace6a5d5bd5f1fbd12b2635cae637ca220a2f18f00f156982b038a5324e45d3b627df7e6416f30e5f9689ebdc47e9@79.100.217.83:
     77 enode://525ca6453105fc8b7364de85e6494a5db288fade51aac5620fb94ec17c9e771f4daef3b1aadad8a954b620484d25d2b6e37fde2787a84034f361d69b491fe10f@61.173.39.51:
     88 enode://dd44d7ba187ac7f6fc322bc3c31ff0b071b878100f7c655cc34bb238a3514c347baed37df75f9603805613e87b9b8fd7dfc078dc0185666f8cde466e1f7675a9@125.118.60.99:
    227 enode://eed8e8a489504305d46fa6fa4614dc81447220f9f9e0b014f66693fff1af692b0f995b011fddfb97c3103965b57e69bb9fc0f90ddcb68ae2806f2b15005cc5a5@45.250.253.66:
    404 enode://cd4ba553c37c7437a0a10875a1ee8bc59ae09f087551a7bffa26e5478872dc59ae1553fe315d7aaa688ae63e9079052c9a779ff1de15866b5ee51c2f721308cb@147.28.163.15:
    433 enode://a5fcbb3b98c5f0a6b03774bcae9c935807fa010858f93049106e9fe4c54e41a04cdb7cadb643e18b90131bd381b996e9c70005f25ca03bbb7f440fed59b02457@87.246.108.93:
    635 enode://f4abc41b943949465b1dac6f58496a6b5b891b5232b0c8e88e761214fe3e14751b7fb6937d53c49aad59e4409e5c3276bd8927469a539184431c6c3d92b570a1@65.109.68.53:
   1403 enode://3eab5b562114111bfc9bdc3e58b057309dc8730e47eefa318be98d82b7080edc789ef21e83ad9d58aea250e915d8905cc2634d9d2b96cf4f08fc558bcbc0f3a9@63.251.87.45:
   1407 enode://4276635a44d9cf13b54ad85b83435941493629ce7c090ba775a16bfa6d4dbf699337c27fac10f3554c0c026bb88c511b45045a6ab4f10654abc8029c81184512@172.233.225.4:
   1581 enode://06c4ac72f097b91c8f3fcb038825cb6cbe1fc7c7dfe9584bf35e44521352c628a8cf0d702ee00acb1f599361fe0c0758aa253dfc5b449ff61864e145ecb4e8ab@34.82.82.140:
   4889 enode://39d1ec7b6aa5e3647a222b0cadb8857254d6f40d731aeb696a757f614e6a94ed7125ba6cdc40f2d2ea561d5b2aec9125457206286fe9d51424c714d1d6faeb6f@54.80.216.115:
   9215 enode://a302b62e0cb25d51292cd61ed7d8496435b9bc139a7223939096912564e7803adb2e778445b10a7f15f7cae57b619c3c10ca1306c342c710a75930372482ba39@5.59.232.59:

@siladu siladu added bug Something isn't working P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc)) labels Apr 8, 2024
@macfarla macfarla added P1 Very High (ex: Security issues, unplanned backward incompatibility preventing network participation) and removed P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc)) labels Apr 8, 2024
@macfarla macfarla assigned jframe and unassigned jframe Apr 8, 2024
@macfarla macfarla added P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc)) and removed P1 Very High (ex: Security issues, unplanned backward incompatibility preventing network participation) labels Apr 8, 2024
@macfarla macfarla assigned jframe and pinges and unassigned jframe Apr 10, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc))
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants