Description
This tracks the remaining issues with our mimalloc allocator and the mark-sweep plan that uses it.
Zeroing
Currently, zeroing happens each time we allocate into a cell, which is very fine-grained. We should zero at a coarser granularity instead. For example, we can zero a whole block when the sweep finds it entirely dead, and zero a run of contiguous dead cells in one operation.
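A minimal sketch of the coarse-grained idea, not the actual implementation. `Block`, `marks`, and `sweep_and_zero` are hypothetical names, and the block is modelled as a plain byte buffer with one mark per cell. The sweep zeroes each maximal run of contiguous dead cells with a single fill, so a fully dead block is zeroed in one call.

```rust
/// Hypothetical block model: a byte buffer split into fixed-size cells.
pub struct Block {
    pub bytes: Vec<u8>,
    pub cell_size: usize,
    /// One mark per cell; `true` means the GC found the cell live.
    pub marks: Vec<bool>,
}

impl Block {
    pub fn new(num_cells: usize, cell_size: usize) -> Self {
        Block {
            // Non-zero fill so that the effect of zeroing is observable.
            bytes: vec![0xAA; num_cells * cell_size],
            cell_size,
            marks: vec![false; num_cells],
        }
    }

    /// Sweep the block and zero every maximal run of contiguous dead
    /// cells with one `fill`. Returns the number of fill operations.
    pub fn sweep_and_zero(&mut self) -> usize {
        let n = self.marks.len();
        let mut fills = 0;
        let mut i = 0;
        while i < n {
            if self.marks[i] {
                i += 1; // live cell: leave its contents untouched
                continue;
            }
            let start = i;
            while i < n && !self.marks[i] {
                i += 1; // extend the run over adjacent dead cells
            }
            self.bytes[start * self.cell_size..i * self.cell_size].fill(0);
            fills += 1;
        }
        fills
    }
}
```

With 8 cells of which cells 2 and 5 are live, this performs 3 fills instead of 6 per-cell zeroing operations; a block with no live cells needs exactly one.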
Min heap
The minimum heap size ("min heap") of the current implementation (based on #643), compared with other plans, is:
benchmark | marksweep | marksweep (without abandoning blocks in GC) | non-moving immix with 64K block | semispace |
---|---|---|---|---|
dacapo2006-antlr | 37 | 38 | 20 | 10 |
dacapo2006-luindex | 57 | 65 | 24 | 12 |
dacapo2006-lusearch | 92 | 186 | 25 | 15 |
dacapo2006-pmd | 97 | 97 | 65 | 52 |
dacapo2006-xalan | 66 | 102 | 69 | 67 |
Mimalloc uses thread-local free lists: instead of one global free list per size class, each mimalloc allocator has its own free list per size class. In the original implementation, an allocator always allocates from its own free lists and never from another allocator's. Our GC'd version does the same: the GC identifies live objects and lets each allocator sweep its own dead cells. As a result, if an allocator holds recyclable blocks for a size class that it rarely allocates from, other allocators cannot reuse those blocks and have to allocate new blocks from global memory.
We solved part of the problem by abandoning blocks in GC. At each GC, every allocator abandons its local block lists and returns the blocks to the global pool (in the same way as when a thread dies). When an allocator later needs a block, it first tries to get one from the global pool. This reduces the min heap, as the difference between the "marksweep (without abandoning blocks in GC)" column and the "marksweep" column shows: lusearch drops from 186 to 92 and xalan from 102 to 66, while antlr and pmd are essentially unchanged.
However, marksweep still needs a larger min heap than non-moving Immix on four of the five benchmarks (xalan is the exception, at 66 vs. 69). We will need to investigate further.
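The abandoning scheme can be sketched roughly as follows. This is an illustration under assumptions, not the actual code: `GlobalPool`, `Allocator`, `acquire_block`, and `abandon_blocks` are hypothetical names, blocks are reduced to integer ids, and `acquire_block` models only the point at which an allocator needs another block.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

pub type SizeClass = usize;
pub type BlockId = usize;

/// Hypothetical global pool: abandoned blocks per size class, plus a
/// counter of blocks freshly taken from global memory.
#[derive(Default)]
pub struct GlobalPool {
    abandoned: Mutex<HashMap<SizeClass, Vec<BlockId>>>,
    fresh_blocks: AtomicUsize,
}

impl GlobalPool {
    fn take_abandoned(&self, sc: SizeClass) -> Option<BlockId> {
        let mut map = self.abandoned.lock().unwrap();
        map.get_mut(&sc).and_then(|blocks| blocks.pop())
    }

    fn new_block(&self) -> BlockId {
        // Each fresh block grows the heap footprint by one block.
        self.fresh_blocks.fetch_add(1, Ordering::Relaxed)
    }

    pub fn fresh_block_count(&self) -> usize {
        self.fresh_blocks.load(Ordering::Relaxed)
    }
}

/// Hypothetical thread-local allocator with per-size-class block lists.
#[derive(Default)]
pub struct Allocator {
    local: HashMap<SizeClass, Vec<BlockId>>,
}

impl Allocator {
    /// Get another block: try the global pool first, and take fresh
    /// memory only if no abandoned block of this size class exists.
    pub fn acquire_block(&mut self, pool: &GlobalPool, sc: SizeClass) -> BlockId {
        let id = pool.take_abandoned(sc).unwrap_or_else(|| pool.new_block());
        self.local.entry(sc).or_default().push(id);
        id
    }

    /// Called at GC: hand every local block back to the global pool.
    pub fn abandon_blocks(&mut self, pool: &GlobalPool) {
        let mut abandoned = pool.abandoned.lock().unwrap();
        for (sc, mut blocks) in self.local.drain() {
            abandoned.entry(sc).or_default().append(&mut blocks);
        }
    }
}
```

In this model, if allocator A acquires two blocks of one size class and abandons them at GC, allocator B's next request for that size class reuses one of them and the fresh-block count stays at two. Without the abandon step, B would have to take a third block from global memory.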