GC runtime is O(allocations), so it eventually dominates runtime #3209
Labels: A-interpreter (Area: affects the core interpreter), C-enhancement (Category: a PR with an enhancement or an issue tracking an accepted enhancement), I-slow (Impact: Makes Miri even slower than it already is)
I have been looking into the discussion we had on #3194. I stuck some code into Miri that prints the fraction of runtime spent in the GC, then turned it loose on the `regex` test suite. `nextest` is great for this because we get a new interpreter for each test. Some of the very long-running tests spend up to 80% of their runtime in the GC.

After some experimentation, I think the root cause is that the Provenance GC runs on a basic block interval, but walks the entire allocation map every time it is run.
Here's a very silly program that exhibits explosive GC work:
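A sketch of the shape I mean (the constants here are illustrative, not the exact program): a pile of long-lived 32-byte allocations that never become garbage, plus a hot loop that keeps executing basic blocks so the GC keeps firing on its usual interval.

```rust
use std::hint::black_box;

fn main() {
    // ~100_000 long-lived 32-byte allocations; none of them ever become garbage,
    // but the Provenance GC still walks every one of them on each run.
    // Bumping this to 1_000_000 is the "factor of 10" case (~32 MB of heap).
    let cold: Vec<Box<[u8; 32]>> = (0..100_000).map(|_| Box::new([0u8; 32])).collect();

    // Hot loop: keeps the basic-block counter ticking so the GC runs many times,
    // even though there is nothing for it to reclaim.
    let mut acc: u64 = 0;
    for i in 0..10_000_000u64 {
        let idx = (i as usize) % cold.len();
        acc = acc.wrapping_add(cold[idx][0] as u64 ^ i);
    }
    black_box(acc);
    black_box(cold);
}
```

Under Miri, the cost of each GC run grows with the length of `cold`, so growing it drives the GC fraction up even though the loop's work per iteration stays constant.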
My instrumentation indicates that with the default GC interval, this program (which runs in 8 seconds) spends 27% of its runtime in the GC. Increasing the number of allocations drives that fraction towards 100%: just bumping it up by a factor of 10 jumps the runtime to 300 seconds and the fraction of time spent in the GC to 80%. A million cold allocations is quite a few, but even with 32-byte buckets that's only 32 MB of heap. Hardly a memory-hungry program.
We get explosive runtime because we're scanning all of memory while only a tiny fraction of it contains any garbage. As far as I'm aware, this observation is the foundation of generational garbage collectors (I know precious little about GCs).
I feel like a generational approach is the best way out of this problem. I'm starting to think this over and it might be workable? I'm concerned about the general complexity increase of splitting memory into multiple HashMaps, but if all the complexity can be buried in `MonoHashMap` or our `AllocMap` implementation, maybe it's manageable.