Thread Safety #8

Closed
Techcable opened this issue May 30, 2020 · 2 comments
@Techcable
Member

Right now the simple collector is "thread-safe" in that everything is !Send. It's not possible to send garbage collected pointers across threads, so no synchronization is ever needed.

As it stands, the API is pretty agnostic to whether or not the underlying implementation is thread-safe. It expects thread safety to be marked by the appropriate Send and Sync implementations.
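
For illustration, a minimal sketch of what that looks like (the `Gc` here is a stand-in, not zerogc's actual type):

```rust
use std::marker::PhantomData;

/// A hypothetical GC pointer, for illustration only (not zerogc's real type).
/// Raw pointers are neither `Send` nor `Sync`, so the `ptr` field alone makes
/// `Gc` auto-`!Send`/`!Sync`: the compiler refuses to move it across threads,
/// and no runtime synchronization is ever needed.
pub struct Gc<'gc, T> {
    ptr: *const T,
    _marker: PhantomData<&'gc T>,
}

// This fails to compile, which is exactly the point:
//
// fn escape(gc: Gc<'static, i32>) {
//     std::thread::spawn(move || { let _ = gc; });
//     // error[E0277]: `*const i32` cannot be sent between threads safely
// }
```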

The main issue with allowing multiple threads to use the collector is that a collection would have to pause all threads. Once we decide it's time to collect, we'd have to block until all other collectors reach a `safepoint!`. This is done cooperatively (like all of zerogc's safepoints), meaning that a single thread could hold up collection by waiting too long to invoke a safepoint.

Other VMs just don't have this issue with uncooperative threads, since the VM automatically inserts safepoints. The JVM's JIT inserts a safepoint on every single loop iteration (implemented as a test against a guard page).

This is almost analogous to the tradeoff between cooperative and preemptive multitasking. ZeroGC is designed as a low-cost abstraction. We already rely on the user to insert regular safepoints in the single-threaded case, so it's reasonable to expect them to be responsible in the multithreaded case as well.

Currently, ZeroGC safepoints are expected to return very quickly if there's not a collection (usually just checking memory usage).
The ZeroGC API should be updated to allow a user to "freeze" a GcContext at a safepoint. This would prevent future allocations (until unfrozen), ensuring that the context's set of roots is known. This is essentially a long-term safepoint that would allow other threads to collect while the first thread is still working.
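
Roughly, I'd expect the flow to look something like this sketch (all of these names are hypothetical, just illustrating the intent):

```rust
// None of these names are zerogc's actual API -- they only sketch the flow.
struct GcContext;
struct FrozenContext(GcContext);

impl GcContext {
    /// A long-term safepoint: the context promises not to allocate, so its
    /// root set stays fixed and other threads can collect without waiting
    /// for this one.
    fn freeze(self) -> FrozenContext {
        FrozenContext(self)
    }

    /// A regular safepoint: returns quickly unless a collection is pending.
    fn safepoint(&mut self) {}
}

impl FrozenContext {
    /// Re-enter the GC world; this may block if a collection is in progress.
    fn unfreeze(self) -> GcContext {
        self.0
    }
}

fn blocking_work(ctx: GcContext) -> GcContext {
    let frozen = ctx.freeze(); // other threads may now collect freely
    std::thread::sleep(std::time::Duration::from_millis(50)); // stand-in for slow, non-allocating work
    let mut ctx = frozen.unfreeze(); // may block until any in-progress collection finishes
    ctx.safepoint();
    ctx
}
```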

The simple collector will probably need a little bit of internal synchronization for this (we'll add a feature-flag to control threading support). We'll need to support thread-local caches for the small-object-arenas. A good experiment would be attempting to reimplement the binary-trees example using multiple threads.
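
As a rough sketch of what that experiment might look like (the `Collector`/`create_context` names are made up for illustration, and no actual GC allocation happens here):

```rust
// One collector context per thread, each thread hitting safepoints regularly.
use std::sync::Arc;
use std::thread;

struct Collector;
struct GcContext;

impl Collector {
    fn create_context(&self) -> GcContext {
        GcContext
    }
}

impl GcContext {
    fn safepoint(&mut self) {}
}

/// Stand-in for the binary-trees workload (the real benchmark allocates Gc nodes).
fn check_tree(depth: u32) -> u64 {
    if depth == 0 { 1 } else { 1 + check_tree(depth - 1) + check_tree(depth - 1) }
}

fn main() {
    let collector = Arc::new(Collector);
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let collector = Arc::clone(&collector);
            thread::spawn(move || {
                // One context per thread; a thread that skips safepoints
                // for too long holds up collection for everyone else.
                let mut ctx = collector.create_context();
                let mut checksum = 0;
                for depth in 4..=16 {
                    checksum += check_tree(depth);
                    ctx.safepoint();
                }
                checksum
            })
        })
        .collect();
    for worker in workers {
        println!("checksum: {}", worker.join().unwrap());
    }
}
```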

This is basically a requirement for DuckLogic. Although Python is essentially single-threaded (due to the GIL), it's still technically possible for Python code to be invoked from different threads.

@Techcable Techcable self-assigned this May 30, 2020
@Techcable Techcable added this to the 0.1.0 milestone May 30, 2020
Techcable added a commit that referenced this issue Jun 1, 2020
This is enabled with the feature flag "sync". Both the free-list and the
current chunk use lock free atomic primitives. We fallback to locking only
when allocating a new chunk.
Thread safety adds about 2 or 3 seconds to binary_trees performance (about 7%).

We don't use per-thread buffers, so this adds exactly zero memory overhead.
A more performance-optimized generational garbage collector would probably
do that. For now, I'm just working on getting DuckLogic up and running.

This is a good first step towards #8
However, we still need to make the safepoint system thread safe.
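
For context, the general pattern is something like this sketch (illustrative only, not the actual implementation):

```rust
// Bump-allocate from the current chunk with a lock-free CAS, and only take a
// lock when a fresh chunk has to be allocated.
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

struct Chunk {
    end: usize,          // one past the last usable address in the chunk
    cursor: AtomicUsize, // next free address
}

impl Chunk {
    /// Lock-free bump allocation: try to advance the cursor with a CAS loop;
    /// return None when the chunk is exhausted so the caller can fall back
    /// to the (locked) slow path that allocates a new chunk.
    fn try_alloc(&self, size: usize) -> Option<usize> {
        let mut old = self.cursor.load(Ordering::Relaxed);
        loop {
            let new = old.checked_add(size)?;
            if new > self.end {
                return None; // chunk full -> slow path
            }
            match self
                .cursor
                .compare_exchange_weak(old, new, Ordering::AcqRel, Ordering::Relaxed)
            {
                Ok(_) => return Some(old),
                Err(actual) => old = actual, // another thread raced us; retry
            }
        }
    }
}

struct ChunkedAllocator {
    current: Chunk,
    // Slow path only: creating a brand-new chunk happens under this lock.
    new_chunk_lock: Mutex<()>,
}
```
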
Techcable added a commit that referenced this issue Jun 2, 2020
All contexts must be blocked at a safepoint before collection
can begin.

This is thread-safe as long as you have feature="sync" set.
This allows for one collector context per thread.
Having more than one per thread is silly, since the first one would block
and you could never invoke the second one (on the same thread).

The only remaining step for #8 is to implement a test case.
We should be thread-safe in *theory*, but concurrency is hard
so I want to test it out before I actually close the issue.

I also implement Send+Sync for Gc<T> where T: Sync.
This is only enabled for the single-threaded collector,
since I'm worried about other threads triggering undefined
behavior by reading from non-atomic types.
It shouldn't be an issue, since Deref only accesses the pointer
and not the header. Still, I'm suspicious enough to leave it alone.
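
Roughly, the impls look like this sketch (the `Gc` definition is a stand-in, not the real type; the bound and the safety reasoning are from the comment above):

```rust
use std::marker::PhantomData;
use std::ops::Deref;

pub struct Gc<'gc, T> {
    ptr: *const T,
    _marker: PhantomData<&'gc T>,
}

impl<'gc, T> Deref for Gc<'gc, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // Deref only touches the value itself, never the collector's
        // (non-atomic) object header.
        unsafe { &*self.ptr }
    }
}

// Sharing a Gc across threads is only sound if &T itself can be shared,
// hence the `T: Sync` bound on both impls.
unsafe impl<'gc, T: Sync> Send for Gc<'gc, T> {}
unsafe impl<'gc, T: Sync> Sync for Gc<'gc, T> {}
```
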
@Techcable
Member Author

Our current implementation seems to result in a performance regression for single-threaded code.

For the single-threaded binary_trees (n=21):
As of 7741065 we have 34 s + 385 MB
As of 0c190e5 we have 47 s + 385 MB

Thread safety is incomplete and the current implementation deadlocks. I have put no effort into optimization yet, and there's probably plenty of low-hanging fruit.

@Techcable
Member Author

It turns out that using roots with frozen collectors was unsound. I suspect this is why the parallel implementation was crashing with optimizations. See 9ad529d for details.

As of dc30ec0 we should be thread-safe, aside from the frozen roots (which I removed in the next commit).

I'll have to implement an Object Handle interface as a replacement (#9). This was needed for DuckLogic anyways.

Techcable added a commit that referenced this issue Jun 21, 2020
I'd say this pretty much fixes #9
This was causing a use-after-free.

The parallel binary_trees example now produces the correct output (usually)!
We're well on our way to #8
It appears we're still getting some segfaults since GcHandleList doesn't appear
to be properly tracing -_-

We should definitely look into lazy sweeping (#7). The parallel example
is wasting a lot of time sweeping (mostly free) arenas!

However, its output is in the wrong order.
Also, for some reason it doesn't terminate if we enable
the "small-object-arenas" feature. It looks like the sweep
phase is taking way too long.....
Should we work on lazy free?