[ETCM-316] Fast-sync branch resolver #887

Merged
merged 39 commits into from
Mar 4, 2021

Conversation

biandratti
Contributor

@biandratti biandratti commented Jan 11, 2021

Description

This task depends on [ETCM-313]. When an invalid block is detected from the master peer, we need to find a new candidate peer and validate our saved chain against it.
With this new peer, we follow these steps.

  1. First, ask this peer for the last several blocks in reverse order, starting from the current Mantis best block + 1. For example, if our current best block is 100, we ask the potential peer for the blocks numbered from 101 down to 101 - X, where X is a configurable parameter. Then we check the blocks one by one to see whether they connect to our local chain: first check whether remote 101 is a child of local 100; if not, check whether remote 100 is a child of local 99, and so on. If we find a match, we rewind to the matched block and start syncing from there; if not, go to point 2.
  2. If we find no match among the most recent local blocks, the chains diverged earlier. To find the first common block between the local and remote blockchains we can use binary search, since both blockchains are sorted lists. First we ask the remote peer for the block in the middle of our local blockchain. If the block in the response is the same as the local one, the chains diverged later; otherwise they diverged earlier.
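
A minimal sketch of the check in point 1, assuming a simplified `Header` type and a `localHashByNumber` lookup (both hypothetical, not taken from the Mantis code base). It walks the remote headers from newest to oldest and returns the number of the first local block that is the parent of a remote header:

```scala
// Hypothetical, simplified header: only the fields this check needs.
final case class Header(number: BigInt, hash: String, parentHash: String)

object RecentBlocksSearch {

  /** Returns the number of the highest local block that is the parent of one
    * of the remote headers, or None if no remote header links to our chain.
    *
    * @param localHashByNumber assumed helper: local block hash by block number
    * @param remoteHeaders     headers received from the peer, in any order
    */
  def highestCommonBlock(
      localHashByNumber: BigInt => Option[String],
      remoteHeaders: Seq[Header]
  ): Option[BigInt] =
    remoteHeaders
      .sortBy(_.number)(Ordering[BigInt].reverse) // newest first: 101, 100, 99, ...
      .collectFirst {
        // remote block N is a child of local block N - 1 if the parent hash matches
        case remote if localHashByNumber(remote.number - 1).contains(remote.parentHash) =>
          remote.number - 1
      }
}
```

For instance, if the remote header 101 has a parent hash unknown to us but the remote header 100 points at our local block 99, the function returns `Some(99)`, which is the block to rewind to. A result of `None` corresponds to falling through to point 2.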

Proposed Solution

We have an actor called FastSyncBranchResolver that is responsible for this behavior. When the FastSync actor needs to resolve its branch, FastSyncBranchResolver is called with the message “StartBranchResolver”. It then follows this strategy:

  1. Get a new master peer.
  2. If no master peer is available, we wait a second and retry.
  3. Get a set of the last BlockHeaders from this master peer.
  4. Validate the set of block headers received from the master peer against our chain. If we detect the last valid block, we discard the invalid blocks after it in our chain and send the message “BranchResolvedSuccessful” to fast sync so that it continues syncing. If we cannot detect the last valid block, the chains diverged earlier (go to 5).
  5. Continue with a binary search, validating block by block against the master peer until we detect the last valid block in our chain. When we detect this block, we discard the invalid blocks after it in our chain and send the message “BranchResolvedSuccessful” to fast sync so that it continues syncing.
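
The five steps above can be read as a small state machine. The sketch below models it without Akka so that it stays self-contained. Only the name `StartBranchResolver` comes from the description; every other event and state name is hypothetical, and reaching `Resolved` stands in for sending “BranchResolvedSuccessful” to fast sync.

```scala
object BranchResolverFlow {
  // Events the resolver reacts to. Only StartBranchResolver is named in the
  // description; the others are assumptions made for this sketch.
  sealed trait Event
  case object StartBranchResolver extends Event
  case object NoPeerAvailable extends Event
  final case class RecentHeadersChecked(commonBlock: Option[BigInt]) extends Event
  final case class BinarySearchFinished(commonBlock: BigInt) extends Event

  sealed trait State
  case object Idle extends State
  case object WaitingForPeer extends State        // step 2: retry after a delay
  case object CheckingRecentHeaders extends State // steps 1, 3 and 4
  case object BinarySearching extends State       // step 5
  // Terminal state: the point at which fast sync would be notified.
  final case class Resolved(lastValidBlock: BigInt) extends State

  def next(state: State, event: Event): State = (state, event) match {
    case (Idle, StartBranchResolver)                            => CheckingRecentHeaders
    case (WaitingForPeer, StartBranchResolver)                  => CheckingRecentHeaders
    case (CheckingRecentHeaders, NoPeerAvailable)               => WaitingForPeer
    case (CheckingRecentHeaders, RecentHeadersChecked(Some(n))) => Resolved(n)
    case (CheckingRecentHeaders, RecentHeadersChecked(None))    => BinarySearching
    case (BinarySearching, BinarySearchFinished(n))             => Resolved(n)
    case (other, _)                                             => other // ignore unexpected events
  }
}
```

In the actor, the retry in step 2 would presumably be a scheduled `StartBranchResolver` message to itself; here that is just the `WaitingForPeer` to `CheckingRecentHeaders` transition.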

Pending work

  • We need to remove the binary search logic from the actor and put it into a separate plain class, so that it can be tested completely in all cases. ✔️
  • [task: ETCM-531] We need to change the behavior of the traits BlacklistSupport and PeerListSupport. Both should be actors. Today, when we switch between FastSync and FastSyncBranchResolver, we lose the in-memory list of blacklisted peers and the list of handshaked peers. So we need to decouple this behavior so that these changes are not lost. This new behavior should also work for regular sync in the future. ✔️

Testing

It needs a battery of integration tests to cover these cases.

Contributor

@KonradStaniec KonradStaniec left a comment

In general, the idea looks good.

One thing I would change is to remove as much logic as possible from the actor and move it to a normal class, to make it possible to test the whole binary search logic without Akka (and timeouts). The actor would then only be there to communicate with other actors.
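
A rough sketch of what such a plain class could look like, assuming the search is modelled as a state that is advanced once per peer response. All names (`BinarySearchSupport`, `SearchState`, `Step`, `run`) are hypothetical and are not claimed to match what was merged. The actor would request the block at `probe`, compare it with the local block, and feed the result into `next`:

```scala
import scala.annotation.tailrec

object BinarySearchSupport {

  /** Search window [min, max] plus the highest matching block confirmed so far. */
  final case class SearchState(min: BigInt, max: BigInt, bestMatch: Option[BigInt]) {
    /** Block number to request from the peer next (middle of the window). */
    def probe: BigInt = (min + max) / 2
  }

  sealed trait Step
  final case class Continue(state: SearchState) extends Step
  final case class Found(commonBlock: BigInt) extends Step
  case object NoCommonBlock extends Step

  /** Start with the whole local chain, from block 0 up to the local best block. */
  def start(localBest: BigInt): SearchState =
    SearchState(min = BigInt(0), max = localBest, bestMatch = None)

  /** Advance the search given whether the remote block at `state.probe`
    * equals the local block with the same number.
    */
  def next(state: SearchState, probeMatches: Boolean): Step = {
    val narrowed =
      if (probeMatches) state.copy(min = state.probe + 1, bestMatch = Some(state.probe)) // diverged later
      else state.copy(max = state.probe - 1) // diverged earlier
    if (narrowed.min <= narrowed.max) Continue(narrowed)
    else narrowed.bestMatch.fold[Step](NoCommonBlock)(Found(_))
  }

  /** Synchronous driver, useful only in tests: `matches` plays the role of the peer. */
  @tailrec
  def run(state: SearchState, matches: BigInt => Boolean): Option[BigInt] =
    next(state, matches(state.probe)) match {
      case Continue(s)   => run(s, matches)
      case Found(n)      => Some(n)
      case NoCommonBlock => None
    }
}
```

Because `next` is a pure function, a unit test can drive the whole search with a plain predicate, for example `run(start(100), n => n <= 57)`, which yields `Some(57)`, with no actor system and no timeouts involved.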

@robinraju robinraju force-pushed the ETCM-311-fast_sync_improvements branch 2 times, most recently from e2618f6 to 427934a Compare January 25, 2021 08:23
@robinraju robinraju self-requested a review January 25, 2021 11:32
@robinraju robinraju force-pushed the ETCM-311-fast_sync_improvements branch 5 times, most recently from d293268 to 4c7150e Compare February 4, 2021 07:38
@1015bit 1015bit changed the title WIP [ETCM-316] branch resolving [ETCM-316] branch resolving Feb 12, 2021
@1015bit 1015bit changed the title [ETCM-316] branch resolving [ETCM-316] branch resolving [!pr] Feb 16, 2021

@pullrequest pullrequest bot left a comment

These changes look good. The tests appear to cover all of the new functionality. I don't find anything blocking, and no additional comments from what other reviewers have already covered. There are a couple of instances of the val blocksSentFromPee which I suspect should be named blocksSentFromPeer. Good work!

Steven
Reviewed with ❤️ by PullRequest

@robinraju robinraju force-pushed the ETCM-311-fast_sync_improvements branch from e2ae8bf to e59e0a0 Compare March 1, 2021 06:24
@1015bit 1015bit marked this pull request as ready for review March 1, 2021 07:12
@1015bit 1015bit changed the title [ETCM-316] branch resolving [!pr] [ETCM-316] Fast-sync branch resolver Mar 2, 2021

@pullrequest pullrequest bot left a comment

1 Message
Due to its size, this pull request will likely have a little longer turnaround time and will probably require multiple passes from our reviewers.

@1015bit 1015bit changed the base branch from ETCM-311-fast_sync_improvements to ETCM-313-skeleton March 2, 2021 12:55
@1015bit 1015bit merged commit 141ec18 into ETCM-313-skeleton Mar 4, 2021
1015bit added a commit that referenced this pull request Mar 16, 2021
…892)

* create actor FastSyncBranchResolver

* ETCM-313 Download skeleton and then batch headers in parallel

* Implement HeaderSkeleton class

* Validate if skeleton header matches downloaded batch

* Improve validations

* Handle wrong skeleton from master peer

* Fix incorrect example

* Validate PoW of skeleton headers

* Fix bugs found with tests

* Add call to the branch resolver

* Add missing config entries

* Fix unit tests

* Apply scalafmt

* Cleanup tests

* Fastsync: stick with the same master peer while requesting skeleton headers.

* [ETCM-313] Integrate branch resolver actor with fast sync

* [ETCM-313] Fix integration tests, format error messages

* create actor FastSyncBranchResolver

* create actor FastSyncBranchResolver

* [ETCM-316] Fast-sync branch resolver (#887)

* init new actor, FastSyncBranchResolver

* added searching mode in fast sync branch resolver

* fix style

* add schedule when don't get peers

* Added ut in class FastSyncBranchResolver

* change messages

* fix case object...

* add new unit test

* add new unit test

* change name getFirstCommonBlock

* change name to batch

* init new actor, FastSyncBranchResolver

* added searching mode in fast sync branch resolver

* fix style

* add schedule when don't get peers

* Added ut in class FastSyncBranchResolver

* change messages

* fix case object...

* add new unit test

* add new unit test

* create actor FastSyncBranchResolver

* change name getFirstCommonBlock

* change name to batch

* Reformat triggered by sbt pp

* Cleanup and simplify

* Handle error cases

* Fix tests

* [ETCM-316] Add more tests and fix binary search logic

* [ETCM-316] Finish tests for branch resolving

* [ETCM-316] Cleanup

* [ETCM-316] Small test improvements

* [ETCM-316] Log binary search state

* [ETCM-316] Move some logging to improve readability

* [ETCM-316] Remove unneeded errors and reformat

* [ETCM-316] Handle branch resolution failure

* [ETCM-316] Address PR comments

* [ETCM-316] Remove unnecessary string interpolation

Co-authored-by: Petra Bierleutgeb <petra.bierleutgeb@iohk.io>

* [ETCM-313] Reworked header skeleton (still needs refactoring)

* [ETCM-313] Remove empty method

* Fix SyncController tests, add more logging

* Remove logging that broke integration tests (timeout)

* [ETCM-313] More refactorings

* [ETCM-313] Fix integration tests

* [ETCM-313] Re-request header skeleton in case of errors

* [ETCM-313] Remove skeleton handler name

* [ETCM-313] Small fixes and better logs

* [ETCM-313] Update the default number of requested block headers to not be higher than the default max number of headers returned

* [ETCM-313] Adapt branchresolver recent blocks request.

Co-authored-by: Maximiliano Biandratti <maximiliano.biandratti@iohk.io>
Co-authored-by: Robin Raju <robinraju@users.noreply.github.com>
Co-authored-by: Petra Bierleutgeb <petra.bierleutgeb@iohk.io>
Co-authored-by: biandratti <72261652+biandratti@users.noreply.github.com>
Co-authored-by: Petra Bierleutgeb <328036+pbvie@users.noreply.github.com>
@dzajkowski dzajkowski deleted the ETCM-316-branch-resolving branch April 9, 2021 12:01
6 participants