Skip to content

Keep the Store alive until its StoreContext is no longer used #206

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 1 commit into from
Jan 24, 2023

Conversation

kpreisser
Copy link
Contributor

@kpreisser kpreisser commented Dec 17, 2022

Keep the Store alive until its StoreContext is no longer used.

As noted in #200 (comment), this is required (to handle the case when you forget to dispose the Store) as otherwise it could theoretically happen that the Store handle is deleted (with wasmtime_store_delete) in the GC finalizer thread even when at the same time a native call using the StoreContext handle is still executing in another thread, in case the Store object is no longer reachable.

For native methods taking a handle parameter that is passed as SafeHandle, this is not required as the SafeHandle is already kept alive during the call.
An exception is if you use SafeHandle.DangerousGetHandle() to retrieve the handle as pointer value and pass it; in that case you must also keep the SafeHandle alive.

This commit also fixes an instance of the above noted issue in Engine.IncrementEpoch(), where wasmtime_engine_increment_epoch was declared as taking an IntPtr handle, and Engine.IncrementEpoch() used SafeHandle.DangerousGetHandle() without keeping the SafeHandle alive (this appears to have been introduced with #118).

Note that I also added the GC.KeepAlive(store) if the store is used in method calls after that, because otherwise it is not guaranteed that the following call will actually keep the store alive (the call could be inlined, and if then doesn't read the Store, it would still be eligible for GC); additionally, this would cause a "code debt" if you edit the code in the future, as e.g. when you would change/remove the code after the call using the StoreContext, you would have to remember to add the GC.KeepAlive(store) call.


Note: A general question that came to my mind, is whether Wasmtime actually has a thread-safe resource management, i.e. whether it supports that one thread may be freeing a resource, while at the same time another thread is calling a native method that uses another value that internally might contain a reference to the resource freed by the first thread.

For example, there is the Engine object (wasm_engine_t) that you need to create first, and then you can create a Store (wasmtime_store_t*) from the Engine (as wasmtime_store_new() takes a wasm_engine_t* parameter), from which I understand that internally, the returned wasmtime_store_t may contain a reference to the wasm_engine_t.
(So, if you call wasm_engine_delete to delete the engine while a store still exists, this will not actually free the wasm_engine_t but just decrements its reference count. If you later delete the wasmtime_store_t, then it and the referenced engine will actually be freed.)

However, if in .NET code you create an Engine, and then create a Store from the Engine and then throw the Engine away (and you forget to use a using block or to call Engine.Dispose() afterwards), and later you call Store.Dispose(), it can happen that wasm_engine_delete() is called from the GC finalizer thread, while at the same time your main thread is calling wasmtime_store_delete(). Is this supported by Wasmtime?

(If not, them I'm wondering whether the apporach of releasing handles in finalizers can actually be supported at all by wasmtime-dotnet.)

What do you think?

Thanks!

As noted in bytecodealliance#200, this is required as otherwise it could theoretically happen that the `Store` handle is deleted (with `wasmtime_store_delete`) in the GC finalizer thread even when at the same time a native call using the `StoreContext` handle is still executing in another thread, in case the `Store` object is no longer reachable.

For native methods taking a handle parameter that is passed as `SafeHandle`, this is not required as the `SafeHandle` is already kept alive during the call.
An exception is if you use `SafeHandle.DangerousGetHandle()` to retrieve the handle as pointer value and pass it; in that case you must also keep the `SafeHandle` alive.

This commit also fixes an instance of the above noted issue in `Engine.IncrementEpoch()`, where `wasmtime_engine_increment_epoch` was declared as taking an `IntPtr` handle, and `Engine.IncrementEpoch()` used `SafeHandle.DangerousGetHandle()` without keeping the SafeHandle alive (this appears to have been introduced with bytecodealliance#118).
@peterhuene
Copy link
Member

Apologies on the delay for reviewing this as I was on vacation in December and I've been focusing a lot on tooling unrelated to the .NET bindings lately.

I'll read up on the context and review the changes shortly.

@peterhuene
Copy link
Member

A general question that came to my mind, is whether Wasmtime actually has a thread-safe resource management, i.e. whether it supports that one thread may be freeing a resource, while at the same time another thread is calling a native method that uses another value that internally might contain a reference to the resource freed by the first thread.

Wasmtime follows Rust's threading model where Store<T> (here T represents the user context data) is Send if T is Send and Sync if T is Sync, meaning that the context type determines whether or not the store can be sent and synchronized between threads.

It further relies on Rust to maintain a guarantee that Store-related objects cannot outlive the Store itself; this is accomplished by requiring a reference to the store be passed into any operation on a Store-related object; thus the Rust compiler can statically verify that a Store outlives its use by the Store-related objects.

Some types, like Engine and Module, are inherently Send and Sync and can be used safely from multiple threads; this includes deleting them as references to these are maintained via a synchronized reference count. Basically the only thing the .NET API should be doing from the finalizer thread is decrementing these reference counts.

The real problem with the .NET API is that it doesn't uphold the same guarantees with Store:

  • There's no way to automatically enforce safety for store context data (i.e. the T from above); if users supply their own context data then they must ensure it can be sent between threads if they send the Store between threads and synchronized if more than one thread can mutate the context data. This can only be solved with documentation/guidance from the .NET API.

  • The .NET API doesn't uphold the same guarantee regarding the use of Store-related objects that the Rust API does; Store-related objects in the .NET API hold strong references on the Store they come from (albeit in a way that makes it not possible to keep around Caller-related objects past a host call, currently). The upside of this is that it makes the API more ergonomic in terms of not having to pass a reference to the Store for every operation on a Store-related object. The downside is that it is possible to explicitly Dispose a Store while it is still in use by a related object, which is unsafe.

There's two possible solutions in my mind for the second issue:

  • We simply document that users of the API must guarantee that Store is not disposed until after every Store-related object is no longer usable. It's not ideal, but also not the most arduous of requirements given the .NET API is effectively an interop shim. For most use cases, I imagine this is very easy to maintain as one would create a Store, interact with it, and then dispose it without involving other threads.

  • Go back to explicitly accepting the Store to use for operations on Store-related objects and no longer keep a strong reference on the Store from the related objects. This guarantees that the Store remains alive for the duration of the requested operation. While not as ergonomic, this would more closely model how things are done in Wasmtime itself.

I haven't yet looked over the PR, but I am, on the surface, wary of injecting GC.KeepAlive in lots of places. I'll have to dig deeper and see what's necessary.

@peterhuene
Copy link
Member

I followed up with #200 to say that I think that storing a weak self-reference in the store context data, as you originally described, would be a good path to move forward on for fixing the issue with Caller retrieved objects being tied to the lifetime of the call and having the least impact on the current API.

I'm going to move forward with this review to fix the store lifetime issues at interop call sites.

Copy link
Member

@peterhuene peterhuene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good and thanks for being so thorough!

@peterhuene peterhuene merged commit 08d9578 into bytecodealliance:main Jan 24, 2023
@kpreisser kpreisser deleted the keep-store-alive branch January 24, 2023 08:26
kpreisser added a commit to kpreisser/wasmtime-dotnet that referenced this pull request Jan 24, 2023
peterhuene added a commit that referenced this pull request Jan 24, 2023
* Allow to specify functions using untyped callbacks, which allows to specify a callback of any type without the need to use a specific delegate type.

The untyped callback will receive arguments and can set results as a span of ValueBox.

For example, this allows to define trapping functions for a module's import without having to know the function time at compile-time, similar to Wasmtime's define_unknown_imports_as_traps.

* Follow-Up: Initialize the result ValueBoxes with their expected ValueKind but a default value, as otherwise they all would be initialized with ValueKind.Int32.

* Follow-Up: Pull the (int) cast up, to align with InvokeCallback().

* Fix wording.

* Enhance the tests to also check the ValueKind.

* Fix code style.

Co-authored-by: Peter Huene <peter@huene.dev>

* Add missing GC.KeepAlive() call (which corresponds to the changes from #206).

* Apply the changes from #202 also for the two remaining overloads of Linker.DefineFunction, to reduce allocations for the strings.

* Follow-Up: Adjust for the changes in #211.

Co-authored-by: Peter Huene <peter@huene.dev>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants