
Memory blowup with synthetic test: mimalloc 2 uses all available memory, mimalloc 3 ok #1001

Open
JochenBaier opened this issue Feb 3, 2025 · 4 comments

JochenBaier commented Feb 3, 2025

In a real application, I saw a steady increase in memory with mimalloc. This happened when the application tried to connect to an unreachable TCP/IP socket every 10 seconds overnight. There was no increase when mimalloc was disabled, and an ASAN leak test under Linux was OK.
Unfortunately, I can no longer reproduce the problem with the real application.

However, I was able to create a synthetic test which shows similar behavior: Private Bytes and Working Set increase steadily. Tested on 3 computers (Windows 10 and 11) with mimalloc 2.1.9, dev2, and dev3. There is no increase with the standard Windows malloc, the mimalloc dev3 branch, or jemalloc. The ASAN test on Linux is OK.

Chart attached (Core i7-4770, 8 Core, 16 GB RAM, Windows 10)

Image

Test case:

#include <thread>
#include <vector>
#include <cassert>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <random>
#include <atomic>

#include "mimalloc.h"

//#ifdef _DEBUG
//#pragma comment(lib, "D:/Downloads/mimalloc-2.1.9/out/msvc-x64/Debug/mimalloc-static.lib")
//#else
//#pragma comment(lib, "D:/Downloads/mimalloc-2.1.9/out/msvc-x64/Release/mimalloc-static.lib")
//#endif

#ifdef _DEBUG
#pragma comment(lib, "D:/Downloads/mimalloc-dev2/out/msvc-x64/Debug/mimalloc.lib")
#else
#pragma comment(lib, "D:/Downloads/mimalloc-dev2/out/msvc-x64/Release/mimalloc.lib")
#endif


//no memory increase with dev3
//#ifdef _DEBUG
//#pragma comment(lib, "D:/Downloads/mimalloc-dev3/out/msvc-x64/Debug/mimalloc.lib")
//#else
//#pragma comment(lib, "D:/Downloads/mimalloc-dev3/out/msvc-x64/Release/mimalloc.lib")
//#endif

#ifdef _DEBUG
#pragma comment(lib, "Advapi32.lib")
#endif


static std::atomic_uint64_t g_no_optimize_sum{ 0 };
static std::atomic_uint64_t g_count_allocations{ 0 };
static std::atomic_uint64_t g_count_deallocations{ 0 };
static std::atomic_bool g_first_run{ true };  //allocate more on first run

static int thread_safe_random_number(const int& p_min, const int& p_max)
{
  thread_local std::random_device rd;
  thread_local std::mt19937 generator(rd());
  std::uniform_int_distribution<int> distribution(p_min, p_max);
  const int n = distribution(generator);
  return n;
}

static void allocateMemoryThread(std::queue<void*>& allocatedMemory, std::mutex& memoryMutex, std::condition_variable& memoryCondition)
{
  const size_t totalAllocations = g_first_run ? 1000 : thread_safe_random_number(1, 1000);

  size_t allocated = 0;


  for (size_t i = 0; i < totalAllocations; ++i)
  {
    const size_t allocSize = g_first_run ? 100000 : thread_safe_random_number(1000, 100000);

    void* memory = mi_malloc(allocSize);
    if (!memory)
    {
      printf("out of memory...\n");
      std::terminate();
    }

    ++g_count_allocations;
    allocated += allocSize; //was missing: without this the 100 MB cap below never triggers

    *((size_t*)memory) = (totalAllocations * allocSize);

    {//scope
      const std::lock_guard<std::mutex> lock(memoryMutex);
      allocatedMemory.push(memory);
    }

    if (allocated >= 1024 * 1024 * 100) //max 100 MB per thread
    {
      printf("max per thread reached (no error)..\n");
      break;
    }


    memoryCondition.notify_one();
  }

  {//scope
    const std::lock_guard<std::mutex> lock(memoryMutex);
    allocatedMemory.push(nullptr);
  }
  memoryCondition.notify_one();

}

static void deallocateMemoryThread(std::queue<void*>& allocatedMemory, std::mutex& memoryMutex, std::condition_variable& memoryCondition)
{
  while (true)
  {
    std::unique_lock<std::mutex> lock(memoryMutex);

    memoryCondition.wait(lock, [&allocatedMemory]
      {
        return !allocatedMemory.empty();
      });

    void* memory = allocatedMemory.front();
    allocatedMemory.pop();

    if (memory == nullptr)
    {
      assert(allocatedMemory.empty());
      break;
    }

    g_no_optimize_sum += *((size_t*)memory);

    mi_free(memory);
    ++g_count_deallocations;

  }
}

static void testThread()
{
  std::queue<void*> allocatedMemory;

  std::mutex memoryMutex;
  std::condition_variable memoryCondition;

  std::thread allocatorThread(allocateMemoryThread, std::ref(allocatedMemory), std::ref(memoryMutex), std::ref(memoryCondition));
  std::thread deallocatorThread(deallocateMemoryThread, std::ref(allocatedMemory), std::ref(memoryMutex), std::ref(memoryCondition));

  allocatorThread.join();
  deallocatorThread.join();
}

int main()
{
  const int NUM_THREADS = 100;

  std::vector<std::thread> threads;

  const int max_runs = 10000000;
  int runs = 0;

  while (true)
  {
    ++runs;
    for (int i = 0; i < NUM_THREADS; ++i)
    {
      threads.emplace_back(testThread);
    }

    for (auto& t : threads)
    {
      assert(t.joinable());
      t.join();
    }

    threads.clear();
    if (runs == max_runs)
    {
      break;
    }

    g_first_run = false;
  }

  assert(g_count_allocations == g_count_deallocations);

  return (int)g_no_optimize_sum.load();
}
JochenBaier changed the title from “synthetic test: mimalloc 2 uses all available memory, mimalloc 3 ok” to “Memory blow up with synthetic test: mimalloc 2 uses all available memory, mimalloc 3 ok” on Feb 3, 2025
JochenBaier changed the title from “Memory blow up with synthetic test: mimalloc 2 uses all available memory, mimalloc 3 ok” to “Memory blowup with synthetic test: mimalloc 2 uses all available memory, mimalloc 3 ok” on Feb 3, 2025
@JochenBaier

With mimalloc dev3 the chart looks like this:

Image

daanx (Collaborator) commented Feb 10, 2025

@JochenBaier: very interesting. What are the red/blue lines? The yellow line is the "commit", right? (not virtual). We have been working on dev3 in particular to better share memory between threads and reduce the overall commit (particularly on the Windows thread pool), so I am happy to see it works for your synthetic test as well. I would still like to mitigate the bad behavior for v1 and v2 though, so I will look into it further; this test will be quite helpful for the investigation. (For dev3 we test on some huge service where test runs can take days, which is not ideal.)

JochenBaier (Author) commented Feb 10, 2025

Thank you for the response.

What is the red/blue line?

The chart was created with Windows Performance Monitor (saved to CSV) using these counters: Working Set (blue), Working Set – Private (red), and Private Bytes (yellow).

The scenario for the memory increase in the real application looked like this (I tested what happens if a customer turns off machines over the weekend to save energy, or the network is bad):

  • around 40 other threads, IO bound or timer bound
  • 2 independent threads, each doing:
    1. create 2 threads
    2. 1 thread tries to connect to a LAN TCP/IP address (a machine) that is randomly not reachable, timeout 10 sec
    3. close these 2 threads if not reachable
    4. wait 10 sec
    5. goto 1.

Simulation of the unreachable IP address was done with https://jagt.github.io/clumsy/:

  • reachable for a random 1..10 sec
  • not reachable for a random 1..10 sec

I saved the charts for the real application test:

Image

Image

Because of this problem we are using the standard heap manager for now (with around 20% lower performance in some cases).

daanx (Collaborator) commented Feb 10, 2025

Thanks -- I can repro locally on v2 as well and will look into it more. It seems v3 (dev3) is stable though, so that may be the way forward for now (although it is still being tuned).
