[CRDT] Reconnecting a peer causes issues receiving deltas and sending out pins #798

Closed
lanzafame opened this issue May 28, 2019 · 6 comments
Labels: kind/bug (A bug in existing code, including security flaws), status/in-progress (In progress)

Comments

@lanzafame
Contributor

Additional information:

  • OS: Linux
  • IPFS Cluster version: master
  • Installation method: built from source

Describe the bug:
Three-peer cluster, with two peers bootstrapped to the first peer. Shut down the 2nd peer and then start it again without --bootstrap, since the trusted peers' PeerInfo should now be in the peerstore and the peer will connect to them when it starts back up. This worked fine, as confirmed with a peers ls against the restarted peer. The issue occurred when pin operations against the first peer started.

The following is an excerpt of the logs from the restarted peer:

15:57:00.335 ERROR       crdt: error getting root delta priority: %s failed to get block for QmVGD6aZQsbjpLpNTWEnK57kYLLtBaFSn4k39hGMw1nzmq: context deadline exceeded crdt.go:413
15:57:00.336 ERROR       crdt: error getting delta: context deadline exceeded crdt.go:420
15:57:01.456 ERROR       crdt: error getting root delta priority: %s failed to get block for Qmaz5xpm5wgMg4ULiE9WtpQz6xdURK2Wuafyi88F1dLAVS: context deadline exceeded crdt.go:413
15:57:01.456 ERROR       crdt: error getting delta: context deadline exceeded crdt.go:420
15:57:02.559 ERROR       crdt: error getting root delta priority: %s failed to get block for QmdfJ7ouWeCa88BYRgrXkGk6kjtkCyC1PHRyeHdYiGitqY: context deadline exceeded crdt.go:413
15:57:02.559 ERROR       crdt: error getting delta: context deadline exceeded crdt.go:420

If you run ipfs-cluster-ctl add <file> from the out-of-sync peer, it attempts to pin on all the peers, but at the height of the out-of-sync peer, and it doesn't succeed:

### out-of-sync peer
16:10:27.023  INFO    cluster: pinning QmZkDEwrfwHKfoqWg3jnqSVwEWiFCU9TRnnRM2Xe1xYNqh everywhere: cluster.go:1227
16:10:27.026  INFO       crdt: new pin added: QmZkDEwrfwHKfoqWg3jnqSVwEWiFCU9TRnnRM2Xe1xYNqh consensus.go:199
16:10:27.027  INFO       crdt: adding new DAG head: QmUCBQPpMvTymSsuwJPU8KMXveUGHpweD4aamjc4iuj96U (height: 1) heads.go:114
16:10:27.028  INFO      adder: QmZkDEwrfwHKfoqWg3jnqSVwEWiFCU9TRnnRM2Xe1xYNqh successfully added to cluster adder.go:163
16:10:27.036  INFO   ipfshttp: IPFS Pin request succeeded:  QmZkDEwrfwHKfoqWg3jnqSVwEWiFCU9TRnnRM2Xe1xYNqh ipfshttp.go:306
### bootstrap peer
15:56:20.426  INFO    cluster: pinning QmQTzvHwJZ8N3G8tQdiZ9ykGD3Nnxb5cyUh9tJRSXLeAmT everywhere: cluster.go:1227
15:56:20.454  INFO       crdt: new pin added: QmQTzvHwJZ8N3G8tQdiZ9ykGD3Nnxb5cyUh9tJRSXLeAmT consensus.go:199
15:56:20.462  INFO       crdt: replacing DAG head: QmQHzBCnuvuSvyZhwpK16XoggavjvKZ4ebX2wXsZccHRLT -> QmPC53f4uj5vT2sBicyFt6NWPLxAWUA4CdSHcKQqvTaBcx (**new height: 19**) heads.go:82
16:10:27.036  INFO       crdt: new pin added: QmZkDEwrfwHKfoqWg3jnqSVwEWiFCU9TRnnRM2Xe1xYNqh consensus.go:199
16:10:27.038  INFO       crdt: adding new DAG head: QmUCBQPpMvTymSsuwJPU8KMXveUGHpweD4aamjc4iuj96U (**height: 1**) heads.go:114
16:11:03.795 WARNI    cluster: metric alert for ping: Peer: 12D3KooWBZjhNyT26AdHmvgvWeuUWBCs7ChBTuawhX8niPCpjtq2. cluster.go:321
16:11:03.799 WARNI    cluster: metric alert for freespace: Peer: 12D3KooWBZjhNyT26AdHmvgvWeuUWBCs7ChBTuawhX8niPCpjtq2. cluster.go:321

It appears that the out-of-sync peer never recovers...
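
The height mismatch in the logs above (19 on the bootstrap peer, 1 for the new head) can be illustrated with a small sketch. It assumes the common Merkle-CRDT convention that a new DAG node gets a height one greater than the tallest head it links to; that convention is an assumption here, not something stated in this thread, and the `head` type and `nextHeight` function are hypothetical.

```go
package main

import "fmt"

// head is a hypothetical, simplified stand-in for a CRDT DAG head:
// only the CID string and the height matter for this illustration.
type head struct {
	cid    string
	height uint64
}

// nextHeight returns the height a new DAG node would get if it were
// appended on top of the given heads: one more than the tallest head,
// or 1 when the peer knows of no heads at all.
func nextHeight(heads []head) uint64 {
	var tallest uint64
	for _, h := range heads {
		if h.height > tallest {
			tallest = h.height
		}
	}
	return tallest + 1
}

func main() {
	// The bootstrap peer has followed the DAG up to height 19.
	inSync := []head{{cid: "QmPC53f4uj5vT2sBicyFt6NWPLxAWUA4CdSHcKQqvTaBcx", height: 19}}

	// The restarted peer never managed to fetch any delta, so it has no heads.
	var outOfSync []head

	fmt.Println(nextHeight(inSync))    // 20
	fmt.Println(nextHeight(outOfSync)) // 1
}
```

Under that assumption, a peer that cannot fetch any existing delta starts a fresh branch at height 1, which is consistent with the "adding new DAG head ... (height: 1)" lines on both peers.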

@lanzafame lanzafame added kind/bug A bug in existing code (including security flaws) need/review Needs a review labels May 28, 2019
@lanzafame
Contributor Author

After restarting the out-of-sync peer again, but with the --bootstrap flag added back, it appears to function correctly, although it still logs the "error getting root delta priority" error.

@lanzafame
Contributor Author

Checking back after fixing my test setup: this one is still broken.

@lanzafame
Contributor Author

The two in-sync peers don't even think that the out-of-sync peer should be pinning any of the pins...

@lanzafame
Contributor Author

#792 should have made it so that a peer in the trusted peerset only has to bootstrap once when using CRDT consensus.

@hsanjuan
Collaborator

hsanjuan commented Jun 7, 2019

The two in-sync peers don't even think that the out-of-sync peer should be pinning any of the pins...

Not sure what that means. If status is returning unpinned, it is because the out-of-sync peer does not think it should be pinning anything.

Let's discuss this during standup. I am seeing the "getting root delta" error by default, and other peers cannot sync at all because they cannot even get the root, so things don't work for me.

@hsanjuan
Collaborator

hsanjuan commented Jun 7, 2019

OK, I found an issue (probably the issue).

hsanjuan added a commit that referenced this issue Jun 7, 2019
Bitswap needs to exist before connections are opened!

Fixes #798
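
The ordering problem named in the commit title can be sketched with a toy model. This is only an illustration under assumptions: `toyHost` and `toyExchange` are hypothetical types, not libp2p or Bitswap APIs, and the model assumes the block-exchange service learns about peers solely from connection notifications delivered while it is registered.

```go
package main

import "fmt"

// toyHost is a hypothetical stand-in for a networking host. It notifies
// only the listeners that are registered at the moment a connection opens.
type toyHost struct {
	listeners []func(peer string)
}

// Notify registers a callback for future connections.
func (h *toyHost) Notify(fn func(peer string)) {
	h.listeners = append(h.listeners, fn)
}

// Connect opens a connection and informs the currently registered listeners.
func (h *toyHost) Connect(peer string) {
	for _, fn := range h.listeners {
		fn(peer)
	}
}

// toyExchange is a hypothetical block-exchange service that can only
// request blocks from peers it was told about.
type toyExchange struct {
	known map[string]bool
}

// newToyExchange creates the service and subscribes it to the host.
func newToyExchange(h *toyHost) *toyExchange {
	ex := &toyExchange{known: map[string]bool{}}
	h.Notify(func(peer string) { ex.known[peer] = true })
	return ex
}

// KnownPeers reports how many peers the service can ask for blocks.
func (ex *toyExchange) KnownPeers() int { return len(ex.known) }

func main() {
	// Problematic order: the connection opens before the service exists,
	// so the notification is never delivered to it.
	lateHost := &toyHost{}
	lateHost.Connect("peer-1")
	lateEx := newToyExchange(lateHost)

	// Fixed order: the service exists first and sees the connection.
	earlyHost := &toyHost{}
	earlyEx := newToyExchange(earlyHost)
	earlyHost.Connect("peer-1")

	fmt.Println(lateEx.KnownPeers())  // 0
	fmt.Println(earlyEx.KnownPeers()) // 1
}
```

In this model, a service created after the connection has nobody to ask for blocks, which would surface as fetches that wait until their deadline expires.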
@hsanjuan hsanjuan self-assigned this Jun 10, 2019
@hsanjuan hsanjuan added status/in-progress In progress and removed need/review Needs a review labels Jun 10, 2019