You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
P1sar opened this issue
Jan 22, 2024
· 2 comments
· Fixed by #3756
Assignees
Labels
A-debugissue requires detective debug work to figure out what's going wrong.S-networkissues related to the dot/network package.S-testsissue related to adding new tests.T-bugthis issue covers unexpected and/or wrong behaviour.
While fixing bug Flaky RPC test "TestStableNetworkRPC" #3616 another unexpected non deterministic behaviour was found. This happens much rarely than one mentioned before but never the less this might be a symptom of something wrong.
Apparently related to lilbp2p DHT
Expected Behavior
Test TestStableNetworkRPC should deterministically work
Current Behavior
Test TestStableNetworkRP un deterministically fail with next error;
Possible Solution
It might be related to libp2p Kademilia
To Reproduce
Steps to reproduce the behaviour:
run MODE=rpc go test ./tests/rpc/system_integration_test.go -v
it might be necessary to run this several time since it is un deterministic problem
Log output
Log Output
=== RUN TestStableNetworkRPC
system_integration_test.go:53: Node alice-0 is ready
system_integration_test.go:53: Node bob-1 is ready
node.go:233: node alice-0 crashed: alice-0 encountered a runtime error: exit status 2
command: /go/src/github.com/ChainSafe/gossamer/bin/gossamer --base-path /tmp/TestStableNetworkRPC3368486598/001 --chain /go/src/github.com/ChainSafe/gossamer/chain/westend-dev/westend-dev-spec-raw.json --role full --no-telemetry
/tmp/TestStableNetworkRPC3368486598/001
Logs:
2024-01-22T12:23:07Z INFO 🕸️ initialising node with name Gossamer, id gssmr, base path /tmp/TestStableNetworkRPC3368486598/001 and chain-spec /tmp/TestStableNetworkRPC3368486598/001/chain-spec-raw.json... pkg=dot
2024-01-22T12:23:08Z INFO block state hash genesis hash: 0x276bfa91f70859348285599321ea96afd3ae681f0be47d36196bac8075ea32e8 pkg=state
2024-01-22T12:23:08Z INFO node initialised with name Gossamer, id gssmr, base path /tmp/TestStableNetworkRPC3368486598/001, chain-spec /tmp/TestStableNetworkRPC3368486598/001/chain-spec-raw.json, block 0 and genesis hash 0x276bfa91f70859348285599321ea96afd3ae681f0be47d36196bac8075ea32e8 pkg=dot
2024-01-22T12:23:08Z INFO 🕸️ initialising node services with global configuration name Gossamer, id gssmr and base path /tmp/TestStableNetworkRPC3368486598/001... pkg=dot
2024-01-22T12:23:08Z INFO created state service with head 0x276bfa91f70859348285599321ea96afd3ae681f0be47d36196bac8075ea32e8, highest number 0 and genesis hash 0x276bfa91f70859348285599321ea96afd3ae681f0be47d36196bac8075ea32e8 pkg=state
2024-01-22T12:23:08Z INFO Generating p2p identity with seed 0 and key file /tmp/TestStableNetworkRPC3368486598/001/node.key pkg=network
2024-01-22T12:23:08Z WARN Bootstrap is enabled but no bootstrap nodes are defined pkg=network
2024-01-22T12:23:11Z INFO creating runtime with interpreter wazero... pkg=dot
2024-01-22T12:23:11Z INFO registered notifications sub-protocol /276bfa91f70859348285599321ea96afd3ae681f0be47d36196bac8075ea32e8/grandpa/1 pkg=network
2024-01-22T12:23:11Z INFO creating BABE service as authority... pkg=dot
2024-01-22T12:23:11Z INFO keystore with keys [0xc000443a50] pkg=dot
2024-01-22T12:23:11Z INFO creating rpc service with host localhost, external=false, port 8540, modules system,author,chain,state,rpc,grandpa,offchain,childstate,syncstate,payment, ws port 8546 and ws external=true pkg=dot
2024-01-22T12:23:11Z INFO starting node Gossamer... pkg=cmd
2024-01-22T12:23:11Z INFO 🕸️ starting node services... pkg=dot
2024-01-22T12:23:11Z INFO Starting services: [*system.Service *network.Service *digest.Handler *core.Service *grandpa.Service *sync.Service *babe.Service *rpc.HTTPServer *state.Service] pkg=dot,services
2024-01-22T12:23:11Z INFO registered notifications sub-protocol /dot/block-announces/1 pkg=network
2024-01-22T12:23:11Z INFO registered notifications sub-protocol /dot/transactions/1 pkg=network
2024-01-22T12:23:11Z INFO Started listening on /ip4/127.0.0.1/tcp/7004/p2p/12D3KooWRkzukwr3U9e4xcfTpMUfuejYPvXCESr8cf1tZ5xWyrTv pkg=network
2024-01-22T12:23:11Z INFO Started listening on /ip4/65.109.106.232/tcp/7004/p2p/12D3KooWRkzukwr3U9e4xcfTpMUfuejYPvXCESr8cf1tZ5xWyrTv pkg=network
2024-01-22T12:23:12Z INFO started network service with supported protocols /ipfs/ping/1.0.0, /ipfs/id/1.0.0, /ipfs/id/push/1.0.0, /276bfa91f70859348285599321ea96afd3ae681f0be47d36196bac8075ea32e8/grandpa/1, /dot/sync/2, /dot/light/2, /dot/block-announces/1, /dot/transactions/1 pkg=network
2024-01-22T12:23:12Z INFO Starting HTTP Server on host localhost and port 8540... pkg=rpc
2024-01-22T12:23:12Z INFO Starting WebSocket Server on host localhost and port 8546... pkg=rpc
2024-01-22T12:23:12Z INFO initiating epoch 0 with start slot 284321032 pkg=babe
2024-01-22T12:23:12Z WARN target=runtime::inclusion-inherent message=ParentBlockRandomness did not provide entropy ext_logging_log_version_1 pkg=runtime module=wazero
2024-01-22T12:23:12Z WARN target=runtime::inclusion-inherent message=ParentBlockRandomness did not provide entropy ext_logging_log_version_1 pkg=runtime module=wazero
2024-01-22T12:23:12Z WARN target=runtime::inclusion-inherent message=ParentBlockRandomness did not provide entropy ext_logging_log_version_1 pkg=runtime module=wazero
2024-01-22T12:23:16Z INFO built block 1 with hash 0x6a00c9595bd6df4f42278a5596b385aa41bbcff070815d09e5200fbadf176062, state root 0x012fe83ae98751a3522bfbfe3a[271](https://github.com/ChainSafe/gossamer/actions/runs/7610987482/job/20725498644?pr=3718#step:4:272)d15206511bb12e4ce2389fd21ba763c2713, epoch 0 and slot [284](https://github.com/ChainSafe/gossamer/actions/runs/7610987482/job/20725498644?pr=3718#step:4:285)321032 pkg=babe
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xf98afc]
goroutine 1043 [running]:
github.com/libp2p/go-libp2p-kad-dht/dual.(*DHT).FindPeer.func3()
github.com/libp2p/go-libp2p-kad-dht@v0.25.2/dual/dual.go:271 +0x7c
created by github.com/libp2p/go-libp2p-kad-dht/dual.(*DHT).FindPeer in goroutine 451
github.com/libp2p/go-libp2p-kad-dht@v0.25.2/dual/dual.go:269 +0x38b
system_integration_test.go:53: failed to start node charlie-2: failed waiting: node did not start: failed after 5 tries during 5s (context canceled)
system_integration_test.go:53: Node bob-1 terminated
panic: test timed out after 10m0s
paste log output...
Specification
go version:
gossamer version:
gossamer commit tag:
gossamer commit hash:
operating system:
additional links:
The text was updated successfully, but these errors were encountered:
P1sar
added
T-bug
this issue covers unexpected and/or wrong behaviour.
S-network
issues related to the dot/network package.
A-debug
issue requires detective debug work to figure out what's going wrong.
S-tests
issue related to adding new tests.
labels
Jan 22, 2024
Based on my investigation this is what happening:
When Network services starts it is running peer discovery asynchronously here
There is not communication channel with this goroutine and pretty often during the test this peer discovery is not discovring any peers, so NetworkService.host.discovery.dht is nill which leads to nill pointer panic here. Apparently this happens when other peers know about the peer and send it messages.
Generally speaking I think that there is no obvious solutions for this problem. Some are:
making sure that NetworkService.host.discovery.dht is always set to empty DHT this will prevent panic and reject message processing, although it is hard to tell the real consequences of not processing messages.
Problem is somewhere deeper in DHT peer search and we should maybe solve this as root case.
Maybe problem happening (make sense) only locally, one possible way to overcome it is to set list of bootnodes, since they all are starting locally we can hardcode list of bootnodes
A-debugissue requires detective debug work to figure out what's going wrong.S-networkissues related to the dot/network package.S-testsissue related to adding new tests.T-bugthis issue covers unexpected and/or wrong behaviour.
Describe the bug
Apparently related to lilbp2p DHT
Expected Behavior
Current Behavior
Possible Solution
To Reproduce
Steps to reproduce the behaviour:
MODE=rpc go test ./tests/rpc/system_integration_test.go -v
Log output
Log Output
Specification
The text was updated successfully, but these errors were encountered: