-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
[nv16] dynamic loading of builtin actor bundles #8642
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Light review pass, approach so far LGTM.
chain/actors/bundle.go
Outdated
return bundleFilePath, nil | ||
} | ||
|
||
func (b *BundleFetcher) fetchURL(url, path string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would it make sense to print a progress bar with loggable format when not TTY? Although defer to Lotus maintainers for hints on preferred UX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh, thats hard to do actually!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -12,11 +11,9 @@ import ( | |||
func init() { | |||
// preload manifest so that we have the correct code CID inventory for cli since that doesn't | |||
// go through CI | |||
if len(build.BuiltinActorsV8Bundle()) > 0 { | |||
bs := blockstore.NewMemory() | |||
bs := blockstore.NewMemory() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it truly OK to load the bundle into a memory blockstore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here yes, we only need the code CID mapping for the cli process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Seems a bit wasteful, though. Can't we just load the root CID and decode the manifest? (Could do in a follow-up PR too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the blockstore in thrown away, so it is just transient memory usage; below meh threshold ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loading 30MiB into a blockstore is going to add unnecessary latency to EVERY, single, CLI, command. We don't want that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fine, but how fix that? I don't see any other way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plus we are talking a few milliseconds, is it really such a big deal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding a TODO for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any quick improvements that can be made to avoid this. TODO and move on SGTM.
node/bundle/bundle.go
Outdated
for i := 0; i < 3; i++ { | ||
resp, err := http.Get(url) //nolint | ||
if err != nil { | ||
if isTemporary(err) { | ||
log.Warnf("temporary error fetching %s: %s; retrying in 1s", url, err) | ||
time.Sleep(time.Second) | ||
continue | ||
} | ||
return xerrors.Errorf("error fetching %s: %w", url, err) | ||
} | ||
defer resp.Body.Close() //nolint | ||
|
||
resp, err := http.Get(url) //nolint | ||
if err != nil { | ||
return xerrors.Errorf("error fetching %s: %w", url, err) | ||
} | ||
defer resp.Body.Close() //nolint | ||
if resp.StatusCode != http.StatusOK { | ||
log.Warnf("unexpected response fetching %s: %s (%d); retrying in 1s", url, resp.Status, resp.StatusCode) | ||
time.Sleep(time.Second) | ||
continue | ||
} | ||
|
||
if resp.StatusCode != http.StatusOK { | ||
return xerrors.Errorf("error fetching %s: http response status is %d", url, resp.StatusCode) | ||
} | ||
f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644) | ||
if err != nil { | ||
return xerrors.Errorf("error opening %s for writing: %w", path, err) | ||
} | ||
defer f.Close() //nolint | ||
|
||
f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644) | ||
if err != nil { | ||
return xerrors.Errorf("error opening %s for writing: %w", path, err) | ||
} | ||
defer f.Close() //nolint | ||
if _, err := io.Copy(f, resp.Body); err != nil { | ||
return xerrors.Errorf("error writing %s: %w", path, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should add HTTP resumption support. More often than not what will happen is that the download gets interrupted due to a flaky connection. In that case, we don't want to truncate and restart the download from scratch, but rather resume it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does go-paramfetcher do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe, but it shouldnt be much of a problem. It is a single 30M download.
We can consider in a follow up if it turns out to be a problem.
node/modules/builtin_actors.go
Outdated
for av, rel := range build.BuiltinActorReleases { | ||
key := dstore.NewKey(fmt.Sprintf("/builtin-actors/v%d/%s", av, rel)) | ||
|
||
data, err := ds.Get(ctx, key) | ||
switch err { | ||
case nil: | ||
mfCid, err := cid.Cast(data) | ||
if err != nil { | ||
return result, xerrors.Errorf("error parsing cid for %s: %w", key, err) | ||
} | ||
|
||
has, err := bs.Has(ctx, mfCid) | ||
if err != nil { | ||
return result, xerrors.Errorf("error checking blockstore for manifest cid %s: %w", mfCid, err) | ||
} | ||
|
||
if has { | ||
actors.AddManifest(av, mfCid) | ||
continue | ||
} | ||
|
||
case dstore.ErrNotFound: | ||
|
||
default: | ||
return result, xerrors.Errorf("error loading %s from datastore: %w", key, err) | ||
} | ||
|
||
// ok, we don't have it -- fetch it and add it to the blockstore | ||
mfCid, err := bundle.FetchAndLoadBundle(ctx, r.Path(), bs, av, rel, netw) | ||
if err != nil { | ||
return result, err | ||
} | ||
|
||
if err := ds.Put(ctx, key, mfCid.Bytes()); err != nil { | ||
return result, xerrors.Errorf("error storing manifest CID for builtin-actors vrsion %d to the datastore: %w", av, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, several problems here:
- Users may want to delete their blockstores and reimport chain snapshots (I think this is still a very frequent operation since the splitstore still isn't out of beta). However, these keys in the datastore will prevent the bundles from being reimported again into the blockstores. Users aren't going to delete their datastores because they contain other useful stuff (like keys).
- Strictly speaking, I don't think we need any of this, and I'd get rid of this extra moving part.
- We are already being super bundle-import-happy by importing bundles into memory blockstores on every single CLI command process. I don't understand why the sudden frugality here.
- Why don't we just reimport every bundle over a blockstore abstraction that first checks if the item exists in the underlying blockstore? (I think the buffered blockstore, or overlay blockstore should do the trick here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Not a problem, we check whether the manifest cid is in the blockstore and if not we refetch/reload.
So users are free to delete their blockstore and import a snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This is the real blockstore and badger sucks, so we'd keep bloating on every restart.
Granted, these may not be too frequent, but still there is no need to reimport with the logic above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, for 2 this is exactly what we do: for every relevant bundle, we lookup the manifest cid in the datastore.
If it is there, we then check the blockstore to make sure it is there.
If not, or if we don't have it in the datastore, then we just go ahead and fetch/load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it. Admittedly it's a bit hard to follow what's happening here and why this is needed (some code comments would help). But this answer satisfies me.
I still prefer a version that imports the bundle over a blockstore abstraction that blocks writes for items that already exist. I think that would give you the same effect we're looking for here, with less moving parts, and being more fail-proof.
That said, if @arajasek is happy with this, I have no objections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this -- i like the idea of wiring it through a blockstore abstraction, but this gets the job done. Agreed that some comments would help, though, it takes a while to grok this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some comments so that the DI loader is easier to read and follow.
The final item that was left here was to sever the dependency between filecoin-ffi and the builtin-actors bundle for v7. There are some difficulties in making that happen. @vyzo and I have been discussing offband, and I'm posting my notes here.
We don't like solution (1) because it introduces a system-wide change to address a very specific problem. The realisation here is that supporting v7 bundles with FVM is only needed for testing, and only for the short time before nv16 is live. Once nv16 is live with actors v8, there is no reason we will ever want to run actors v7 with FVM. Hence, it does not make sense to implement a definitive solution for a situation that is only temporary. In conclusion, we will keep the actors v7 dependency in filecoin-ffi and sever it once nv16 is live in mainnet and we no longer need to run canary tests. |
It is worth noting that the reason why actors v7 works in Rust is because the Rust library for loading CARs is not performing content integrity checks, and the blockstore is not short-circuiting lookups for identity hashes. So this might be a smell that those Rust counterparts are underdeveloped with respect to their Go equivalents, but that's another story. |
See #8625
to be resolved: properly getting repo path and network name for loading; see the TODOs; we could do nothing and keep it as is.