Skip to content

Improve nogc error when out of space #1175

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

eileencodes
Copy link
Contributor

When using NoGC if the machine runs out of memory, the user will see an error that says garbage collection failed:

ERROR: An MMTk GC thread panicked.  This is a bug.
panicked at /Users/eileencodes/src/github.com/open_source/mmtk-core/src/plan/nogc/global.rs:74:9:
internal error: entered unreachable code: GC triggered in nogc
Backtrace is disabled.
run with `RUST_BACKTRACE=1` environment variable to display a backtrace
running file: test/zlib/test_zlib.rb

I added a check in poll to see if the plan being used supports GC and if not, raise an error. I could have updated the schedule_collection function to return this error, but I felt that since that function is regarding scheduling, it was better to raise if any plan does not support GC than change the purpose of schedule_collection. We still want to know if a GC ended up being triggered in schedule_collection.

The new error looks like this:

[2024-07-24T18:43:45Z INFO  mmtk::memory_manager] Initialized MMTk with NoGC (FixedHeapSize(1073741824))
thread '<unnamed>' panicked at /Users/eileencodes/src/github.com/open_source/mmtk-core/src/util/heap/gc_trigger.rs:80:17:
internal error: entered unreachable code: User ran out of space and the plan does not support collecting garbage.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
running file: test/zlib/test_zlib.rb

Note: I've written very little rust so while I know this works (tested in CRuby) I don't know if from a design perspective this is correct.

When using `NoGC` if the machine runs out of memory, the user will see
an error that says garbage collection failed:

```
ERROR: An MMTk GC thread panicked.  This is a bug.
panicked at /Users/eileencodes/src/github.com/open_source/mmtk-core/src/plan/nogc/global.rs:74:9:
internal error: entered unreachable code: GC triggered in nogc
Backtrace is disabled.
run with `RUST_BACKTRACE=1` environment variable to display a backtrace
running file: test/zlib/test_zlib.rb
```

I added a check in `poll` to see if the plan being used supports GC and
if not, raise an error. I could have updated the `schedule_collection`
function to return this error, but I felt that since that function is
regarding scheduling, it was better to raise if any plan does not
support GC than change the purpose of `schedule_collection`. We still
want to know if a GC ended up being triggered in `schedule_collection`.

The new error looks like this:

```
[2024-07-24T18:43:45Z INFO  mmtk::memory_manager] Initialized MMTk with NoGC (FixedHeapSize(1073741824))
thread '<unnamed>' panicked at /Users/eileencodes/src/github.com/open_source/mmtk-core/src/util/heap/gc_trigger.rs:80:17:
internal error: entered unreachable code: User ran out of space and the plan does not support collecting garbage.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
running file: test/zlib/test_zlib.rb
```
@eileencodes eileencodes changed the title Improve nogc error Improve nogc error when out of space Jul 24, 2024
@wks
Copy link
Collaborator

wks commented Jul 25, 2024

Yes. It makes sense to let it exit immediately when a mutator thread has an allocation failure. The previous code relies on the unreachable!() macro in schedule_collection executed on a GC thread, and the error message appeared as if it were a bug in MMTk while it is just normal for NoGC.

I made some slight changes. I retained the info! log in poll so that when we see the number of pages it prints is too small, we know we need to increase the heap size. I also changed unreachable! to panic! because it is technically not "unreachable code". unreachable! is intended to guard code paths that shouldn't be executed unless there is a bug in the code, but running out of memory is a run-time error.

Copy link
Member

@qinsoon qinsoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wks
Copy link
Collaborator

wks commented Jul 25, 2024

There are two additional problems.

  1. We currently do mock testing using MockVM, and it hooks the Collection::block_for_gc call back function to identify if the mutator triggered GC in a code snippet. If we panic early in GCTrigger::poll, it will not reach block_for_gc.
  2. We actually have an out-of-memory handler Collection::out_of_memory. Its default behavior is panicking, but it allows the VM binding to override it in order to handle the event of out-of-memory in a VM-specific manner. For example, JVM should throw OutOfMemoryError. Ruby doesn't seem to have a hard heap size limit, but only a heap growth factor, and the operating system will start killing innocent processes after a Ruby program allocates too many large objects (which have buffers allocated by malloc).

I think we need to think more about this topic, i.e. out-of-memory handling in NoGC. Intuitively, if we run out of memory in NoGC, then Collection::out_of_memory should be called, which defaults to panicking.

@qinsoon
Copy link
Member

qinsoon commented Jul 25, 2024

We can just rewrite related tests. I am also okay if you would like to call out_of_memory for NoGC.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants