gh-112532: Improve mimalloc page visiting #114133
This adds support for visiting abandoned pages in mimalloc and improves the performance of the page visiting code. Abandoned pages contain memory blocks from threads that have exited. They may later be reclaimed by other threads. We still need to visit those pages in the free-threaded GC because they contain live objects.

This also reduces the overhead of visiting mimalloc pages:

* Special cases for full pages, empty pages, and pages containing only a single block.
* Fix free_map to use one bit instead of one byte per block.
* Use fast integer division by a constant algorithm when computing block offset from block size and index.
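The last two points can be illustrated with a minimal sketch. It is not the code from this PR: all names (`fast_div_t`, `fast_div_init`, `fast_div`, `count_used_blocks`, `MAX_BLOCKS`) are hypothetical, and it assumes the division in question is a byte offset within a page divided by the block size to recover a block index. The division uses the standard multiply-and-shift technique for 32-bit unsigned values, and the free map stores one bit per block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is a sketch, not mimalloc's API. */

#define MAX_BLOCKS 1024  /* assumed upper bound on blocks per page */

typedef struct {
    uint64_t magic;  /* precomputed multiplier for the divisor */
    unsigned shift;  /* ceil(log2(divisor)) */
} fast_div_t;

/* Precompute magic and shift so that n / divisor needs no divide. */
static fast_div_t fast_div_init(uint32_t divisor) {
    assert(divisor > 0);
    fast_div_t fd;
    fd.shift = 0;
    while (((uint64_t)1 << fd.shift) < divisor) fd.shift++;
    fd.magic = (((uint64_t)1 << 32) *
                (((uint64_t)1 << fd.shift) - divisor)) / divisor + 1;
    return fd;
}

/* Equals n / divisor for every 32-bit n: multiply, add, shift. */
static uint32_t fast_div(uint32_t n, fast_div_t fd) {
    uint64_t hi = ((uint64_t)n * fd.magic) >> 32;
    return (uint32_t)((hi + n) >> fd.shift);
}

/* One bit per block: bit i set means block i is free. */
static void bitmap_set(uint64_t* map, size_t i) {
    map[i / 64] |= (uint64_t)1 << (i % 64);
}

static int bitmap_test(const uint64_t* map, size_t i) {
    return (int)((map[i / 64] >> (i % 64)) & 1);
}

/* Count the blocks that are in use, given the byte offsets of the
   free blocks relative to the start of the page. */
static size_t count_used_blocks(uint32_t block_size, size_t capacity,
                                const uint32_t* free_offsets, size_t nfree) {
    uint64_t free_map[MAX_BLOCKS / 64] = {0};
    assert(capacity <= MAX_BLOCKS);
    fast_div_t fd = fast_div_init(block_size);

    /* Mark each free block: offset -> index without a hardware divide. */
    for (size_t i = 0; i < nfree; i++) {
        uint32_t idx = fast_div(free_offsets[i], fd);
        assert(idx < capacity);
        bitmap_set(free_map, idx);
    }

    /* Every block whose bit is clear is considered live. */
    size_t used = 0;
    for (size_t i = 0; i < capacity; i++) {
        if (!bitmap_test(free_map, i)) used++;
    }
    return used;
}
```

For instance, with 48-byte blocks, a 16-block page, and free blocks at offsets 0, 96, and 480, `count_used_blocks` returns 13. A real visitor would call a callback on each live block rather than count them, and the one-bit map is 8x smaller than a byte-per-block map, so it can be cleared and scanned a word at a time.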
Just curious about the unused function(s), otherwise LGTM!
}

// Visit all blocks in a abandoned segments
bool _mi_abandoned_pool_visit_blocks(mi_abandoned_pool_t* pool, uint8_t page_tag, bool visit_blocks, mi_block_visit_fun* visitor, void* arg) {
This (and therefore the previous 2 functions) doesn't seem to be used anywhere?
These will be used in the upcoming GC PR. Here is an example usage:
I put them in this PR because:
- Keeping the mimalloc changes separate makes them a bit easier to track and upstream
- The GC PR will be big and doing this first makes the upcoming PR a bit smaller
@DinoV, would you please merge this when you are ready?