
Memory blowup with synthetic test: mimalloc 2 uses all available memory, mimalloc 3 ok #1001

Open
JochenBaier opened this issue Feb 3, 2025 · 4 comments

JochenBaier commented Feb 3, 2025

In a real application, I saw a steady increase in memory with mimalloc. This happened when the application tried to connect to an unreachable TCP/IP socket every 10 seconds overnight. There was no increase when mimalloc was disabled, and an ASAN leak test under Linux was OK.
Unfortunately, I can no longer reproduce the problem with the real application.

However, I was able to create a synthetic test which shows similar behavior: Private Bytes and Working Set increase steadily. Tested on 3 computers (Windows 10 and 11) with mimalloc 2.1.9, dev2, and dev3. There is no increase with the standard Windows malloc, the mimalloc dev3 branch, or jemalloc. The ASAN test on Linux is OK.

Chart attached (Core i7-4770, 8 Core, 16 GB RAM, Windows 10)

Image

Test case:

#include <thread>
#include <vector>
#include <cassert>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <random>
#include <atomic>

#include "mimalloc.h"

//#ifdef _DEBUG
//#pragma comment(lib, "D:/Downloads/mimalloc-2.1.9/out/msvc-x64/Debug/mimalloc-static.lib")
//#else
//#pragma comment(lib, "D:/Downloads/mimalloc-2.1.9/out/msvc-x64/Release/mimalloc-static.lib")
//#endif

#ifdef _DEBUG
#pragma comment(lib, "D:/Downloads/mimalloc-dev2/out/msvc-x64/Debug/mimalloc.lib")
#else
#pragma comment(lib, "D:/Downloads/mimalloc-dev2/out/msvc-x64/Release/mimalloc.lib")
#endif


//no memory increase with dev3
//#ifdef _DEBUG
//#pragma comment(lib, "D:/Downloads/mimalloc-dev3/out/msvc-x64/Debug/mimalloc.lib")
//#else
//#pragma comment(lib, "D:/Downloads/mimalloc-dev3/out/msvc-x64/Release/mimalloc.lib")
//#endif

#ifdef _DEBUG
#pragma comment(lib, "Advapi32.lib")
#endif


static std::atomic_uint64_t g_no_optimize_sum{ 0 };
static std::atomic_uint64_t g_count_allocations{ 0 };
static std::atomic_uint64_t g_count_deallocations{ 0 };
static std::atomic_bool g_first_run{ true };  //allocate more on first run

static int thread_safe_random_number(const int& p_min, const int& p_max)
{
  thread_local std::random_device rd;
  thread_local std::mt19937 generator(rd());
  std::uniform_int_distribution<int> distribution(p_min, p_max);
  const int n = distribution(generator);
  return n;
}

static void allocateMemoryThread(std::queue<void*>& allocatedMemory, std::mutex& memoryMutex, std::condition_variable& memoryCondition)
{
  const size_t totalAllocations = g_first_run ? 1000 : thread_safe_random_number(1, 1000);

  size_t allocated = 0;


  for (size_t i = 0; i < totalAllocations; ++i)
  {
    const size_t allocSize = g_first_run ? 100000 : thread_safe_random_number(1000, 100000);

    void* memory = mi_malloc(allocSize);
    if (!memory)
    {
      printf("out of memory...\n");
      std::terminate();
    }

    ++g_count_allocations;
    allocated += allocSize; //was missing: without this the 100 MB cap below never triggers

    *((size_t*)memory) = (totalAllocations * allocSize);

    {//scope
      const std::lock_guard<std::mutex> lock(memoryMutex);
      allocatedMemory.push(memory);
    }

    if (allocated >= 1024 * 1024 * 100) //max 100 MB per thread
    {
      printf("max per thread reached (no error)..\n");
      break;
    }


    memoryCondition.notify_one();
  }

  {//scope
    const std::lock_guard<std::mutex> lock(memoryMutex);
    allocatedMemory.push(nullptr);
  }
  memoryCondition.notify_one();

}

static void deallocateMemoryThread(std::queue<void*>& allocatedMemory, std::mutex& memoryMutex, std::condition_variable& memoryCondition)
{
  while (true)
  {
    std::unique_lock<std::mutex> lock(memoryMutex);

    memoryCondition.wait(lock, [&allocatedMemory]
      {
        return !allocatedMemory.empty();
      });

    void* memory = allocatedMemory.front();
    allocatedMemory.pop();

    if (memory == nullptr)
    {
      assert(allocatedMemory.empty());
      break;
    }

    g_no_optimize_sum += *((size_t*)memory);

    mi_free(memory);
    ++g_count_deallocations;

  }
}

static void testThread()
{
  std::queue<void*> allocatedMemory;

  std::mutex memoryMutex;
  std::condition_variable memoryCondition;

  std::thread allocatorThread(allocateMemoryThread, std::ref(allocatedMemory), std::ref(memoryMutex), std::ref(memoryCondition));
  std::thread deallocatorThread(deallocateMemoryThread, std::ref(allocatedMemory), std::ref(memoryMutex), std::ref(memoryCondition));

  allocatorThread.join();
  deallocatorThread.join();
}

int main()
{
  const int NUM_THREADS = 100;

  std::vector<std::thread> threads;

  const int max_runs = 10000000;
  int runs = 0;

  while (true)
  {
    ++runs;
    for (int i = 0; i < NUM_THREADS; ++i)
    {
      threads.emplace_back(testThread);
    }

    for (auto& t : threads)
    {
      assert(t.joinable());
      t.join();
    }

    threads.clear();
    if (runs == max_runs)
    {
      break;
    }

    g_first_run = false;
  }

  assert(g_count_allocations == g_count_deallocations);

  return (int)g_no_optimize_sum.load();
}
JochenBaier changed the title from “synthetic test: mimalloc 2 uses all available memory, mimalloc 3 ok” to “Memory blow up with synthetic test: mimalloc 2 uses all available memory, mimalloc 3 ok” on Feb 3, 2025
JochenBaier changed the title from “Memory blow up with synthetic test: mimalloc 2 uses all available memory, mimalloc 3 ok” to “Memory blowup with synthetic test: mimalloc 2 uses all available memory, mimalloc 3 ok” on Feb 3, 2025
@JochenBaier

With mimalloc dev3 the chart looks like this:

Image

daanx (Collaborator) commented Feb 10, 2025

@JochenBaier: very interesting. What are the red/blue lines? The yellow line is the "commit", right? (not virtual). We have been working on dev3 in particular to better share memory between threads and reduce the overall commit (particularly on the Windows thread pool), so I am happy to see it works for your synthetic test as well. I would still like to mitigate the bad behavior for v1 and v2 though, so I will look into it further; this test will be quite helpful for the investigation. (For dev3 we test on some huge service where test runs can take days, which is not ideal.)

JochenBaier (Author) commented Feb 10, 2025

Thank you for the response.

What is the red/blue line?

The chart was created with Windows Performance Monitor (saved to CSV) using these counters: Working Set (blue), Working Set – Private (red), and Private Bytes (yellow).

The scenario for the memory increase in the real application looked like this (I tested what happens if a customer turns off machines over the weekend to save energy, or the network is bad):

  • around 40 other threads, IO bound or timer bound
  • 2 independent threads, each doing:
    1. create 2 threads
    2. 1 thread tries to connect to a LAN TCP/IP address (a machine) that is randomly not reachable, timeout 10 sec
    3. close these 2 threads if not reachable
    4. wait 10 sec
    5. goto 1.

Simulation of the unreachable IP address was done with https://jagt.github.io/clumsy/:

  • reachable for a random 1..10 sec
  • not reachable for a random 1..10 sec

I saved the charts for the real application test:

Image

Image

Because of this problem we are using the standard heap manager for now (with around 20% lower performance in some cases).

daanx (Collaborator) commented Feb 10, 2025

Thanks -- I can repro locally on v2 as well and will look into it more. It seems v3 (dev3) is stable though, so that may be the way forward for now (although it is still being tuned).
