Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[nv16] dynamic loading of builtin actor bundles #8642

Merged
merged 16 commits into from
May 12, 2022
Merged

Conversation

vyzo
Copy link
Contributor

@vyzo vyzo commented May 11, 2022

See #8625

to be resolved: properly getting repo path and network name for loading; see the TODOs; we could do nothing and keep it as is.

@vyzo vyzo requested review from raulk and arajasek May 11, 2022 19:33
@vyzo vyzo requested a review from a team as a code owner May 11, 2022 19:33
@vyzo vyzo changed the title dynamic loading of builtin actor bundles [nv16] dynamic loading of builtin actor bundles May 11, 2022
Copy link
Member

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Light review pass, approach so far LGTM.

return bundleFilePath, nil
}

func (b *BundleFetcher) fetchURL(url, path string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would it make sense to print a progress bar with loggable format when not TTY? Although defer to Lotus maintainers for hints on preferred UX.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh, thats hard to do actually!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -12,11 +11,9 @@ import (
func init() {
// preload manifest so that we have the correct code CID inventory for cli since that doesn't
// go through CI
if len(build.BuiltinActorsV8Bundle()) > 0 {
bs := blockstore.NewMemory()
bs := blockstore.NewMemory()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it truly OK to load the bundle into a memory blockstore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here yes, we only need the code CID mapping for the cli process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Seems a bit wasteful, though. Can't we just load the root CID and decode the manifest? (Could do in a follow-up PR too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the blockstore in thrown away, so it is just transient memory usage; below meh threshold ;)

Copy link
Member

@raulk raulk May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loading 30MiB into a blockstore is going to add unnecessary latency to EVERY, single, CLI, command. We don't want that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fine, but how fix that? I don't see any other way.

Copy link
Contributor Author

@vyzo vyzo May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plus we are talking a few milliseconds, is it really such a big deal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding a TODO for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any quick improvements that can be made to avoid this. TODO and move on SGTM.

Comment on lines 74 to 100
for i := 0; i < 3; i++ {
resp, err := http.Get(url) //nolint
if err != nil {
if isTemporary(err) {
log.Warnf("temporary error fetching %s: %s; retrying in 1s", url, err)
time.Sleep(time.Second)
continue
}
return xerrors.Errorf("error fetching %s: %w", url, err)
}
defer resp.Body.Close() //nolint

resp, err := http.Get(url) //nolint
if err != nil {
return xerrors.Errorf("error fetching %s: %w", url, err)
}
defer resp.Body.Close() //nolint
if resp.StatusCode != http.StatusOK {
log.Warnf("unexpected response fetching %s: %s (%d); retrying in 1s", url, resp.Status, resp.StatusCode)
time.Sleep(time.Second)
continue
}

if resp.StatusCode != http.StatusOK {
return xerrors.Errorf("error fetching %s: http response status is %d", url, resp.StatusCode)
}
f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
if err != nil {
return xerrors.Errorf("error opening %s for writing: %w", path, err)
}
defer f.Close() //nolint

f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
if err != nil {
return xerrors.Errorf("error opening %s for writing: %w", path, err)
}
defer f.Close() //nolint
if _, err := io.Copy(f, resp.Body); err != nil {
return xerrors.Errorf("error writing %s: %w", path, err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should add HTTP resumption support. More often than not what will happen is that the download gets interrupted due to a flaky connection. In that case, we don't want to truncate and restart the download from scratch, but rather resume it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does go-paramfetcher do it?

Copy link
Contributor Author

@vyzo vyzo May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, but it shouldnt be much of a problem. It is a single 30M download.

We can consider in a follow up if it turns out to be a problem.

Comment on lines 32 to 67
for av, rel := range build.BuiltinActorReleases {
key := dstore.NewKey(fmt.Sprintf("/builtin-actors/v%d/%s", av, rel))

data, err := ds.Get(ctx, key)
switch err {
case nil:
mfCid, err := cid.Cast(data)
if err != nil {
return result, xerrors.Errorf("error parsing cid for %s: %w", key, err)
}

has, err := bs.Has(ctx, mfCid)
if err != nil {
return result, xerrors.Errorf("error checking blockstore for manifest cid %s: %w", mfCid, err)
}

if has {
actors.AddManifest(av, mfCid)
continue
}

case dstore.ErrNotFound:

default:
return result, xerrors.Errorf("error loading %s from datastore: %w", key, err)
}

// ok, we don't have it -- fetch it and add it to the blockstore
mfCid, err := bundle.FetchAndLoadBundle(ctx, r.Path(), bs, av, rel, netw)
if err != nil {
return result, err
}

if err := ds.Put(ctx, key, mfCid.Bytes()); err != nil {
return result, xerrors.Errorf("error storing manifest CID for builtin-actors vrsion %d to the datastore: %w", av, err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, several problems here:

  1. Users may want to delete their blockstores and reimport chain snapshots (I think this is still a very frequent operation since the splitstore still isn't out of beta). However, these keys in the datastore will prevent the bundles from being reimported again into the blockstores. Users aren't going to delete their datastores because they contain other useful stuff (like keys).
  2. Strictly speaking, I don't think we need any of this, and I'd get rid of this extra moving part.
    • We are already being super bundle-import-happy by importing bundles into memory blockstores on every single CLI command process. I don't understand why the sudden frugality here.
    • Why don't we just reimport every bundle over a blockstore abstraction that first checks if the item exists in the underlying blockstore? (I think the buffered blockstore, or overlay blockstore should do the trick here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Not a problem, we check whether the manifest cid is in the blockstore and if not we refetch/reload.

So users are free to delete their blockstore and import a snapshot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This is the real blockstore and badger sucks, so we'd keep bloating on every restart.

Granted, these may not be too frequent, but still there is no need to reimport with the logic above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, for 2 this is exactly what we do: for every relevant bundle, we lookup the manifest cid in the datastore.
If it is there, we then check the blockstore to make sure it is there.
If not, or if we don't have it in the datastore, then we just go ahead and fetch/load.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it. Admittedly it's a bit hard to follow what's happening here and why this is needed (some code comments would help). But this answer satisfies me.

I still prefer a version that imports the bundle over a blockstore abstraction that blocks writes for items that already exist. I think that would give you the same effect we're looking for here, with less moving parts, and being more fail-proof.

That said, if @arajasek is happy with this, I have no objections.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this -- i like the idea of wiring it through a blockstore abstraction, but this gets the job done. Agreed that some comments would help, though, it takes a while to grok this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some comments so that the DI loader is easier to read and follow.

@vyzo vyzo merged commit 8a8fb56 into feat/nv16 May 12, 2022
@vyzo vyzo deleted the feat/nv16-bundles branch May 12, 2022 19:18
@raulk
Copy link
Member

raulk commented May 16, 2022

The final item that was left here was to sever the dependency between filecoin-ffi and the builtin-actors bundle for v7. There are some difficulties in making that happen. @vyzo and I have been discussing offband, and I'm posting my notes here.

  • builtin-actors v7 bundles are produced with synthetic code CIDs; that is, with CIDs of the kind fil/7/storageminer which do not content-hash over the bytecode.
  • That makes the car#LoadCAR() content integrity checks fail.
  • We could either (a) disarm those checks via an option, or (b) produce the bundle with real CIDs, import it to a memory blockstore in Lotus, and write them for real in the underlying blockstore with the synthetic Code CIDs so that ref-fvm is able to resolve them with the CID they are referenced with from the state tree.
  • However, synthetic CIDs use Identity multihashes (i.e. multihashes where the data is inlined in the CID), and Lotus expands such CIDs through the IDStore, without querying the underlying blockstore.
  • Therefore, we find that it's ultimately impossible to put the Wasm bytecode under identity multihashes in Lotus' blockstore.
  • The potential solutions here are:
    1. Change the IDStore so that it does not expand inline CIDs when the codec is Raw.
    2. Wrap the store in a remapping blockstore that holds rewrite rules such that the synthetic v7 CodeCIDs are transformed to their respective content-addressed CodeCIDs before querying the blockstore.
    3. Do the rewrite fvm-side.

We don't like solution (1) because it introduces a system-wide change to address a very specific problem.
Solutions (2) and (3) require deeper changes and the tradeoff is not worth it (see next paragraph).

The realisation here is that supporting v7 bundles with FVM is only needed for testing, and only for the short time before nv16 is live. Once nv16 is live with actors v8, there is no reason we will ever want to run actors v7 with FVM. Hence, it does not make sense to implement a definitive solution for a situation that is only temporary.

In conclusion, we will keep the actors v7 dependency in filecoin-ffi and sever it once nv16 is live in mainnet and we no longer need to run canary tests.

@raulk
Copy link
Member

raulk commented May 16, 2022

It is worth noting that the reason why actors v7 works in Rust is because the Rust library for loading CARs is not performing content integrity checks, and the blockstore is not short-circuiting lookups for identity hashes. So this might be a smell that those Rust counterparts are underdeveloped with respect to their Go equivalents, but that's another story.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants