Thread Safety #8

Closed
Techcable opened this issue May 30, 2020 · 2 comments
@Techcable
Member

Right now the simple collector is "thread-safe" in that everything is !Send. It's not possible to send garbage collected pointers across threads, so no synchronization is ever needed.

As it stands, the API is pretty agnostic to whether or not the underlying implementation is thread-safe. It expects thread safety to be marked by the appropriate Send and Sync implementations.
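
For illustration, a minimal sketch of what that looks like (the `Gc` here is a stand-in, not zerogc's actual type):

```rust
use std::marker::PhantomData;

/// A hypothetical GC pointer, for illustration only (not zerogc's real type).
/// Raw pointers are neither `Send` nor `Sync`, so the `ptr` field alone makes
/// `Gc` auto-`!Send`/`!Sync`: the compiler refuses to move it across threads,
/// and no runtime synchronization is ever needed.
pub struct Gc<'gc, T> {
    ptr: *const T,
    _marker: PhantomData<&'gc T>,
}

// This fails to compile, which is exactly the point:
//
// fn escape(gc: Gc<'static, i32>) {
//     std::thread::spawn(move || { let _ = gc; });
//     // error[E0277]: `*const i32` cannot be sent between threads safely
// }
```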

The main issue with allowing multiple threads to use the collector is that a collection would have to pause all threads. Once we decide it's time to collect, we'd have to block until all other collectors reach a `safepoint!`. This is done cooperatively (like all of zerogc's safepoints), meaning that a single thread could hold up collection by waiting too long to invoke a safepoint.

Other VMs just don't have this issue with uncooperative threads, since the VM automatically inserts safepoints. The JVM's JIT inserts a safepoint on every single loop iteration (implemented as a test against a guard page).

This is almost analogous to the tradeoff between cooperative and preemptive multitasking. ZeroGC is designed as a low-cost abstraction. We already rely on the user to insert regular safepoints in the single-threaded case, so it's reasonable to expect them to be responsible in the multithreaded case as well.

Currently, ZeroGC safepoints are expected to return very quickly if there's not a collection (usually just checking memory usage).
The ZeroGC API should be updated to allow a user to "freeze" a GcContext at a safepoint. This would prevent future allocations (until unfrozen), ensuring that the context's set of roots is known. This is essentially a long-term safepoint that would allow other threads to collect while the first thread is still working.
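
Roughly, I'd expect the flow to look something like this sketch (all of these names are hypothetical, just illustrating the intent):

```rust
// None of these names are zerogc's actual API -- they only sketch the flow.
struct GcContext;
struct FrozenContext(GcContext);

impl GcContext {
    /// A long-term safepoint: the context promises not to allocate, so its
    /// root set stays fixed and other threads can collect without waiting
    /// for this one.
    fn freeze(self) -> FrozenContext {
        FrozenContext(self)
    }

    /// A regular safepoint: returns quickly unless a collection is pending.
    fn safepoint(&mut self) {}
}

impl FrozenContext {
    /// Re-enter the GC world; this may block if a collection is in progress.
    fn unfreeze(self) -> GcContext {
        self.0
    }
}

fn blocking_work(ctx: GcContext) -> GcContext {
    let frozen = ctx.freeze(); // other threads may now collect freely
    std::thread::sleep(std::time::Duration::from_millis(50)); // stand-in for slow, non-allocating work
    let mut ctx = frozen.unfreeze(); // may block until any in-progress collection finishes
    ctx.safepoint();
    ctx
}
```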

The simple collector will probably need a little bit of internal synchronization for this (we'll add a feature-flag to control threading support). We'll need to support thread-local caches for the small-object-arenas. A good experiment would be attempting to reimplement the binary-trees example using multiple threads.
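
As a rough sketch of what that experiment might look like (the `Collector`/`create_context` names are made up for illustration, and no actual GC allocation happens here):

```rust
// One collector context per thread, each thread hitting safepoints regularly.
use std::sync::Arc;
use std::thread;

struct Collector;
struct GcContext;

impl Collector {
    fn create_context(&self) -> GcContext {
        GcContext
    }
}

impl GcContext {
    fn safepoint(&mut self) {}
}

/// Stand-in for the binary-trees workload (the real benchmark allocates Gc nodes).
fn check_tree(depth: u32) -> u64 {
    if depth == 0 { 1 } else { 1 + check_tree(depth - 1) + check_tree(depth - 1) }
}

fn main() {
    let collector = Arc::new(Collector);
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let collector = Arc::clone(&collector);
            thread::spawn(move || {
                // One context per thread; a thread that skips safepoints
                // for too long holds up collection for everyone else.
                let mut ctx = collector.create_context();
                let mut checksum = 0;
                for depth in 4..=16 {
                    checksum += check_tree(depth);
                    ctx.safepoint();
                }
                checksum
            })
        })
        .collect();
    for worker in workers {
        println!("checksum: {}", worker.join().unwrap());
    }
}
```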

This is basically a requirement for DuckLogic. Although Python is essentially single-threaded (due to the GIL), it's still technically possible for Python code to be invoked from different threads.

@Techcable Techcable self-assigned this May 30, 2020
@Techcable Techcable added this to the 0.1.0 milestone May 30, 2020
Techcable added a commit that referenced this issue Jun 1, 2020
This is enabled with the feature flag "sync". Both the free-list and the
current chunk use lock free atomic primitives. We fallback to locking only
when allocating a new chunk.
Thread safety adds about 2 or 3 seconds to binary_trees performance (about 7%).

We don't use per-thread buffers, so this adds exactly zero memory overhead.
A more performance-optimized generational garbage collector would probably
do that. For now, I'm just working on getting DuckLogic up and running.

This is a good first step towards #8
However, we still need to make the safepoint system thread safe.
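
For context, the general pattern is something like this sketch (illustrative only, not the actual implementation):

```rust
// Bump-allocate from the current chunk with a lock-free CAS, and only take a
// lock when a fresh chunk has to be allocated.
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

struct Chunk {
    end: usize,          // one past the last usable address in the chunk
    cursor: AtomicUsize, // next free address
}

impl Chunk {
    /// Lock-free bump allocation: try to advance the cursor with a CAS loop;
    /// return None when the chunk is exhausted so the caller can fall back
    /// to the (locked) slow path that allocates a new chunk.
    fn try_alloc(&self, size: usize) -> Option<usize> {
        let mut old = self.cursor.load(Ordering::Relaxed);
        loop {
            let new = old.checked_add(size)?;
            if new > self.end {
                return None; // chunk full -> slow path
            }
            match self
                .cursor
                .compare_exchange_weak(old, new, Ordering::AcqRel, Ordering::Relaxed)
            {
                Ok(_) => return Some(old),
                Err(actual) => old = actual, // another thread raced us; retry
            }
        }
    }
}

struct ChunkedAllocator {
    current: Chunk,
    // Slow path only: creating a brand-new chunk happens under this lock.
    new_chunk_lock: Mutex<()>,
}
```
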
Techcable added a commit that referenced this issue Jun 2, 2020
All contexts must be blocked at a safepoint before collection
can begin.

This is thread-safe as long as you have feature="sync" set.
This allows for one collector context per thread.
Having more than one per thread is silly, since the first one would block
and you could never invoke the second one (on the same thread).

The only remaining step for #8 is to implement a test case.
We should be thread-safe in *theory*, but concurrency is hard
so I want to test it out before I actually close the issue.

I also implement Send+Sync for Gc<T> where T: Sync.
This is only enabled for the single-threaded collector,
since I'm worried about other threads triggering undefined
behavior by reading from non-atomic types.
It shouldn't be an issue, since Deref only accesses the pointer
and not the header. Still, I'm suspicious enough to leave it alone.
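
Roughly, the impls look like this sketch (the `Gc` definition is a stand-in, not the real type; the bound and the safety reasoning are from the comment above):

```rust
use std::marker::PhantomData;
use std::ops::Deref;

pub struct Gc<'gc, T> {
    ptr: *const T,
    _marker: PhantomData<&'gc T>,
}

impl<'gc, T> Deref for Gc<'gc, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // Deref only touches the value itself, never the collector's
        // (non-atomic) object header.
        unsafe { &*self.ptr }
    }
}

// Sharing a Gc across threads is only sound if &T itself can be shared,
// hence the `T: Sync` bound on both impls.
unsafe impl<'gc, T: Sync> Send for Gc<'gc, T> {}
unsafe impl<'gc, T: Sync> Sync for Gc<'gc, T> {}
```
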
@Techcable
Member Author

Our current implementation seems to result in a performance regression for single-threaded code.

For the single-threaded binary_trees (n=21):
As of 7741065 we have 34 s + 385 MB
As of 0c190e5 we have 47 s + 385 MB

Thread safety is incomplete and the current implementation deadlocks. I have put no effort into optimization yet, and there's probably plenty of low-hanging fruit.

@Techcable
Member Author

It turns out that using roots with frozen collectors was unsound. I suspect this is why the parallel implementation was crashing with optimizations. See 9ad529d for details.

As of dc30ec0 we should be thread-safe, aside from the frozen roots (which I removed in the next commit).

I'll have to implement an Object Handle interface as a replacement (#9). This was needed for DuckLogic anyways.

Techcable added a commit that referenced this issue Jun 21, 2020
I'd say this pretty much fixes #9
This was causing a use-after-free.

The parallel binary_trees example now produces the correct output (usually)!
We're well on our way to #8
It appears we're still getting some segfaults since GcHandleList doesn't appear
to be properly tracing -_-

We should definitely look into lazy sweeping (#7). The parallel example
is wasting a lot of time sweeping (mostly free) arenas!

However, its output is in the wrong order.
Also, for some reason it doesn't terminate if we enable
the "small-object-arenas" feature. It looks like the sweep
phase is taking way too long.....
Should we work on lazy free?