Owning region types

Note: I am *not* proposing any changes to https://github.com/mmtk/mmtk-core/pull/643. The algorithm works, and its style is consistent with other parts of mmtk-core.  I am talking about a more general issue in mmtk-core.

TL;DR: We have `Chunk(Address)`, `Block(Address)`, `Line(Address)` and `Page(Address)`.  They implement `Copy`, and has pointer  semantics, but does not **own** the region they point to.  This approach is not idiomatic in Rust.  It does not prevent two pointers to point to the same region mutably, and the programmers need to [add hard-to-reason-about locks](https://github.com/mmtk/mmtk-core/pull/643#pullrequestreview-1175324025) even when they have `&mut` access.   We may need to introduce a set of owning types that own those regions.

# Background

## Rust ownership and references

Rust has`&T`, `&mut T` and `T`.

`&mut T` is an exclusive (unique) borrowing reference, and it grants read-write access.

By enforcing the borrowing rules, Rust programmers are confident that **if a variable holds a `&mut T`, it is the only variable that refers to the `T`, and it has read-write access to it**.  No other `&mut T`, not even `&T`, could coexist.  For this reason, Rust often advertise as having "fearless concurrency".  If you respect the borrowing rules, there can be no data race.

## Region types

The `Region` trait represents an aligned region of memory of a certain size.  Instances include `Chunk(Address)`, `Block(Address)`, `Line(Address)` and `Page(Address)`.  The PR https://github.com/mmtk/mmtk-core/pull/643 also introduces a new type `Block(NonZeroUsize)`.  All of them are wrappers of addresses.

`Region` instances implement `Copy`.  They can be copied just like C pointers.  And they can be constructed from addresses.  In this way, given an address range, we can iterate through all aligned regions in that range.  Each region is constructed from the address.

```rust
// In mmtk::policy::immix::chunk::Chunk::sweep
for block in self.blocks().filter(/* ... */) {
    // A new Block instance is created in each iteration.
    if !block.sweep(/* ... */) { // Accessing block mutably
        // ...
    }
}
```

As we can see, in each iteration, we create a `Block` instance from an address, and access the block mutably (`sweep`).

This means any thread can gain read-write access to a `Block` without borrowing or getting ownership, and dangers await.

# The problem

## The mysterious lock

When reviewing the PR https://github.com/mmtk/mmtk-core/pull/643, I noticed an interesting piece of code.

```rust
impl Block {
    // ... other methods here
    #[inline(always)]
    pub fn attempt_release<VM: VMBinding>(self, space: &MarkSweepSpace<VM>) -> bool {
        // ...
                    let block_list: *mut BlockList = loop {
                        let list: *mut BlockList = self.load_block_list();
                        (*list).lock();
                        if list == self.load_block_list() {
                            break list;
                        }
                        (*list).unlock();
                    };
                    (*block_list).remove(self);
                    (*block_list).unlock();
        // ...
    }
    // ... other methods here.
}
```

This code snippet tries to remove a `Block` from a `BlockList` (a doubly-linked list), but another thread may have moved the `Block` to another `BlockList` concurrently.  So it has to try to lock the `BlockList` in a loop until it successfully locked the `BlockList` and the `Block` is still in that `BlockList`.

This piece of code puzzled me in two ways.
1.  If we already have a `*mut T`, why there is still a race?  Why don't we have unique access to it?  (OK.  It is pointer, not reference.  But it is still counter-intuitive to Rust programmers.)
2.  What operation could the other thread be performing?  If it is racing, it is racing against something.

To answer question (2), I found the two places that tries to acquire the same lock. One (the shown snippet) is sweeping chunks, and the other is releasing mutators.

During the `Release` stage, the MarkSweep collector does two things concurrently.

a.  For all chunks, find all blocks in it that are unmarked, and release them from their BlockList (the shown snippet).
b.  For all mutators, transfer all blocks from the `available_blocks` BlockList and the `consumed_blocks` BlockList into the `unswept_blocks` BlockList.

When doing both concurrently, each block may be reached from two different places, namely through chunks and through mutators.  If not synchronised properly, it may leave the doubly-linked list in an inconsistent state.  Therefore, a spinlock is added to the `BlockList` struct, and used to prevent this race.  The lock is obtained using the `lock()` method, as shown in the snippet above.

The lock solved the racing problem here.  But I am curious why this problem still occurs in Rust.  Shouldn't Rust protect us from this kind of race?

## Why does this happen?

The root cause is that `Block` implements `Copy` and can be created from addresses.  It has pointer semantics.  It points to a `Block`, but does not represent **ownership**.  So all the ownership-transfer and borrow-checking mechanisms stopped working in this use case.  Any thread can gain mutable access by simply creating a `Block` instance.

The SweepChunk work packet uses `RegionIterator<Block>` over the `Chunk` to gain access.  It bypassed the borrow checker, allowing two threads to hold `&mut BlockList` simultaneously.

## But why we don't need locks in other places?

If the `Chunk` and `Block` types are inherently unsafe, I would expect the PR https://github.com/mmtk/mmtk-core/pull/643 has lock/unlock operations everywhere.  But it's not the case.  `BlockList` is usually accessed without locking.

A `BlockList` represents a list of `Block` of certain size.  It has several mutable methods: `remove()`, `push()`, `pop()`, `append()` and `reset()`.

-   On the fast path of `alloc()`, the mutator tries to `pop()` a block from the `BlockList` in the `available_blocks` array owned by the allocator which is owned by the current mutator.  Since the mutator is calling `pop()` on a `BlockList` it already owns, it shall not cause data race.
-   If that fails, the mutator will try to recycle local blocks (if not doing eager sweeping).  It will `pop()` a `Block` from its local `unswept_blocks` list, sweep it, and retry allocating from that block.  It is still accessing blocks and block lists it owns, so there is still no data race.
-   If that fails, too, the mutator will enter the slow path, and try to acquire blocks from the global pool.  This time, it accesses `MarkSweepSpace::abandoned` which is a `Mutex<AbandonedBlockLists>`.  This forces all mutators to acquire a lock before accessing the lists within, and the lock prevents data race.
-   If that still fails, it will ask the PageResource for fresh pages.  PageResource uses Mutex internally to prevent data race.

Observed from this analysis, we see **our MiMalloc implementation is actually a good example of using ownership to prevent data race**.  The fast path accesses owned data so there can't be data race; the slow path accesses global data protected by Mutex so it prevents data race; once the blocks are transferred from the global pool to the local BlockList, subsequent allocations from that block will be local, and can't have data race, either.

Unfortunately, the ownership model is only in the developers' head, but not reflected in the code.  `Block` implements `Copy`, so once assigned, the original owner may still keep a `Block` instance that represents the same block.  In Rust, this shouldn't happen.  Once the ownership is transferred, the original owner no longer has access to it.  This ensures the global pool cannot give the same `Block` to two different mutators.

Throughout the code, `Block` is treated as a pointer.  Therefore, whether to use `&self` or `&mut self` is only a matter of whether it mutates the `block.0` field of the `Block(NonZeroUsize)` instance.  We can see that `Block::sweep(&self)` actually takes a `&self` as parameter, even though it writes into the block.

I would not disapprove the MiMalloc PR for this, because this is common in MMTk core.  The `immix::Block` is the same.  All of its methods take `&self` parameter, including `immix::Block::sweep(&self)`.

# So what kind of `Block` type should we have?

We need something that represents the ownership of a block.  For this, the `Block` type should not implement `Clone` or `Copy`.  `MarkSweepSpace::acquire_block` should be the only place where `Block` instances can be created, and it creates `Block` from the address returned from PageResource when allocating "fresh" blocks.  `Block` instances are destroyed in `Block::attempt_release(self)` (:thumbup: to the owning `self` parameter), when its underlying pages are released back to the PageResource.  When transferring a `Block` from one `BlockList` to another, or from the global pool to the mutator, Rust's ownership model will ensure no two mutators share the same `Block`, and even no two `BlockList` can hold the same `Block` at the same time.

What about sweeping all blocks in a `Chunk`?  From Rust's point of view, the mutator owns the block, because the `MarkSweepSpace` creates `Block` from the address returned from PageResource, and transfers it to the mutator.  Mutators, however, can abandon blocks so they are transferred to `MarkSweepSpace`.  If each `Block` is owned by either a mutator or `MarkSweepSpace`, I would argue that it is completely unnecessary to sweep blocks by iterating chunks.  Instead, we can simply iterate through each block in each mutator and the `MarkSweepSpace`, see if it is marked, and remove it from its `BlockList`.  That's all the blocks out there.  This can be parallelised by creating one work packet for each mutator, and one more for the global `MarkSweepSpace`.  And we don't need any lock on the `BlockList`, because each `BlockList` is also owned by exactly one mutator (or the `MarkSweepSpace`) and the race is impossible.

# Known issues

Is this approach applicable to other GC algorithms, like Immix?


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Owning region types #696

Background

Rust ownership and references

Region types

The problem

The mysterious lock

Why does this happen?

But why we don't need locks in other places?

So what kind of `Block` type should we have?

Known issues

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Owning region types #696

Description

Background

Rust ownership and references

Region types

The problem

The mysterious lock

Why does this happen?

But why we don't need locks in other places?

So what kind of Block type should we have?

Known issues

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

So what kind of `Block` type should we have?