Sync from newest to oldest #620

Open
jbearer opened this issue Jun 4, 2024 · 1 comment · May be fixed by #726

jbearer commented Jun 4, 2024

It would be better to run proactive scans, especially major scans, backwards, because the newest data is both the most likely to be missing (in the case where a node has just been offline for a short time, like an update) and the most likely to be queried.

Adding a reverse block stream might help simplify some of the transaction streaming stuff for the explorer as well.


jbearer commented Jun 5, 2024

I'm thinking of an idea that might be better. Instead of having major and minor scans that run repeatedly, we have just two scans total, both running perpetually, and we dovetail them in chunks. I will call these scans "heads" and "tails".

The heads scan always follows the chain head. Its job is to fetch recent blocks that we may have missed on decide, such as if we briefly lost network connection and missed a decide. The tails scan starts at the tail of the chain and follows behind the heads scan. Its job is to fetch old data that we're missing, such as after a long restart.

Both of these scans follow the exact same pattern, except for where they start and what they do when they reach the current block height (a rough sketch of the dovetailing loop follows the list):

  • The tails scan starts at block 0 (or the pruned height)
  • The heads scan starts at the last saved block height
  • When a scan runs, it starts from where it previously left off, and fetches blocks until it reaches a chunk limit, or the tip of the chain. Then it sleeps and yields to the other scan.
  • When the heads scan hits the tip of the chain, it just yields. This way it will usually be following very close to the chain tip, but if the chain tip ever jumps forward a lot (such as after a network outage or restart), the heads scan will go over all the blocks that were missed.
  • When the tails scan catches up to the heads scan, it starts over from 0 (or the pruned height). It will always catch up, because it can scan as fast as it is able, while the heads scan is bounded by the time it takes for new blocks to be decided.
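A minimal sketch of the dovetailing loop, under the assumption that each scan keeps a cursor and runs in bounded chunks. Everything here (`Scanner`, `fetch_missing_block`, the chunk size) is a hypothetical placeholder, not the query service's actual API:

```rust
// Hypothetical sketch only: `fetch_missing_block` stands in for the real
// "fetch this block if we don't already have it" logic.
const CHUNK_SIZE: u64 = 500;

struct Scanner {
    heads_cursor: u64,  // next height the heads scan will look at
    tails_cursor: u64,  // next height the tails scan will look at
    pruned_height: u64, // where the tails scan restarts from
}

impl Scanner {
    /// One chunk of the heads scan: fetch from the cursor up to the chunk
    /// limit or the chain tip, whichever comes first, then yield.
    fn run_heads_chunk(&mut self, tip: u64) {
        let end = (self.heads_cursor + CHUNK_SIZE).min(tip);
        for height in self.heads_cursor..end {
            fetch_missing_block(height);
        }
        self.heads_cursor = end;
        // If we hit the tip we simply yield. Normally we stay right behind
        // the tip; after an outage, successive chunks sweep over the gap.
    }

    /// One chunk of the tails scan: fetch a bounded chunk, and restart from
    /// the pruned height once we catch up to the heads scan.
    fn run_tails_chunk(&mut self) {
        let end = (self.tails_cursor + CHUNK_SIZE).min(self.heads_cursor);
        for height in self.tails_cursor..end {
            fetch_missing_block(height);
        }
        self.tails_cursor = end;
        if self.tails_cursor >= self.heads_cursor {
            // Caught up: start over from the beginning (or the pruned height).
            self.tails_cursor = self.pruned_height;
        }
    }
}

fn fetch_missing_block(_height: u64) {
    // Placeholder: check local storage and fetch from a peer if missing.
}

fn main() {
    let mut scanner = Scanner { heads_cursor: 90, tails_cursor: 0, pruned_height: 0 };
    for tip in [100, 120, 150, 400] {
        // Dovetail: alternate bounded chunks of the two scans.
        scanner.run_heads_chunk(tip);
        scanner.run_tails_chunk();
    }
}
```

The point of the sketch is that each call does at most one chunk of work, so neither scan can starve the other.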

The big advantage of this scheme is that a bounded amount of work is done each time a scan runs. We don't get occasional long pauses where a major scan runs, CPU usage spikes, and we stop fetching more recent missing blocks (because minor scans have to wait for the major scan to finish).

Another big advantage is that on startup, we don't have to wait for a major scan to read the database all the way from 0 before we start fetching more recent blocks, which are more likely to be missing.

jbearer self-assigned this Oct 31, 2024
jbearer added a commit that referenced this issue Oct 31, 2024
This changes leaf fetching to require already having the _next_
leaf. This tells us what the hash should be for the leaf being fetched.

This addresses two unrelated issues:
1. Leaf fetching is trusted: we were not previously verifying that
   the leaf returned by a peer is valid in any way. Now, we can verify
   the fetched leaf against the expected hash (and similarly for the
   fetched QC). We exploit the chaining property of HotShot leaves to
   avoid having to run any kind of consensus light client to verify
   fetched leaves (a toy sketch of this chained check follows this
   commit message).
2. Fetching leaves by hash instead of (or in addition to) by height
   allows us to implement more providers. For example, we can now
   implement a provider that pulls leaves from undecided consensus
   storage, by hash, which allows us to fetch a leaf from our own
   storage even if we missed the corresponding decide event.

Knock-on changes:
* Proactive scanning now runs backwards, since leaf fetching becomes
  kind of inherently backwards. This was a change we wanted anyways
  (closes #620)
* To facilitate that, we have added reverse streams for all the
  resource types, which may come in handy for the explorer
* Receiving a leaf (either by fetching or decide event) now triggers
  us to fetch the parent if it is missing. This supplements the
  proactive fetcher and can often help us obtain missing data really
  quickly (e.g. if we just missed one decide)
* `NoStorage` is gone. The purpose of `NoStorage` was to stress test
  fetching, which it did in the beginning, but it has been ages since
  we found a real bug this way; we now have plenty of other adversarial
  fetching tests, and `NoStorage` is becoming more trouble than it's
  worth to maintain. In particular, effective fetching now depends on
  having somewhat reliable storage (a reasonable assumption!) because
  we assume if you fetch a leaf, you can look it up shortly thereafter
  and thus fetch its parent. With this change, the `NoStorage` tests
  were failing or very slow because of many failed requests.
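To make the chained verification in point 1 concrete, here is a toy sketch of the idea: a fetched parent leaf is accepted only if its commitment matches the parent commitment recorded in the child leaf we already hold. The `Leaf` struct and hash below are placeholders, not the real HotShot types or commitment scheme.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for a HotShot leaf: each leaf records a commitment to its parent.
#[derive(Hash)]
struct Leaf {
    height: u64,
    payload: Vec<u8>,
    parent_commitment: u64,
}

impl Leaf {
    /// Toy commitment; a real implementation uses a cryptographic commitment.
    fn commitment(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        hasher.finish()
    }
}

/// Accept a fetched parent only if its commitment matches the one recorded in
/// the child leaf we already hold. This chaining property is what lets us
/// verify fetched leaves without running any kind of consensus light client.
fn verify_fetched_parent(child: &Leaf, fetched_parent: &Leaf) -> bool {
    fetched_parent.commitment() == child.parent_commitment
}

fn main() {
    let parent = Leaf { height: 41, payload: b"block 41".to_vec(), parent_commitment: 0 };
    let child = Leaf { height: 42, payload: b"block 42".to_vec(), parent_commitment: parent.commitment() };
    assert!(verify_fetched_parent(&child, &parent));
}
```

As the commit message notes, the same idea also applies to the fetched QC.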
jbearer linked a pull request Oct 31, 2024 that will close this issue
jbearer added a commit that referenced this issue Oct 31, 2024
jbearer added a commit that referenced this issue Nov 12, 2024
jbearer added a commit that referenced this issue Nov 12, 2024
jbearer added a commit that referenced this issue Nov 14, 2024