Throttling GPU usage #11793

wbruna · 2025-02-10T15:03:37Z

On my aging 3400G, the whole desktop GUI (Linux, either X11+XFCE or Wayland+KDE) tends to freeze completely during the more intensive GPU computations in llama.cpp and stable-diffusion.cpp (on Vulkan). From low to high impact:

batch processing at low-ish batch sizes (<= 256-512, depending on the model) with no layer offloading runs fine, even at >95% GPU usage;
larger batch sizes cause some GUI stuttering and tend to slow down prompt processing;
full offloading makes the GUI unusable, with freezes of 2-3 seconds at a time, mainly during prompt processing;
stable-diffusion.cpp inference also causes 2-3 second freezes;
the stable-diffusion.cpp VAE phase freezes the interface completely for its whole run (20-120 s, depending on image size).

Also, these 'choking' events sometimes trigger driver bugs, causing full system lock-ups.

So, I'm looking for ways to throttle GPU usage during inference. What I tried so far:

operating system utilities: no luck on that front. There seems to be no GPU support in cgroups, and the utilities that do limit GPU usage all seem to focus on FPS caps or vsync;

creating the queues with VK_QUEUE_GLOBAL_PRIORITY_LOW_EXT: this doesn't seem to have any effect, but I'm not sure I'm implementing it correctly (I have zero graphics programming experience). Something like this:

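    /* in: static vk_device ggml_vk_get_device(size_t idx) */
    /* inside the existing loop over the device's extension properties: */
    if (strcmp(VK_EXT_GLOBAL_PRIORITY_EXTENSION_NAME, properties.extensionName) == 0) {
        global_priority = true;
    }
    (...)
    /* low_prio must stay alive until the device has been created, and the
       extension must also be listed among the device's enabled extensions. */
    VkDeviceQueueGlobalPriorityCreateInfoEXT low_prio = {};
    low_prio.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_GLOBAL_PRIORITY_CREATE_INFO_EXT;
    low_prio.globalPriority = VK_QUEUE_GLOBAL_PRIORITY_LOW_EXT;
    if (global_priority) {
        /* chain the low-priority request onto the first queue's create info */
        device_queue_create_infos[0].pNext = &low_prio;
    }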

sprinkling ctx->device->device.waitIdle() plus a sleep before ggml_vk_build_graph calls: this kind of works as a proof of concept, but of course it is no real solution.
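For anyone wanting to try the same waitIdle-plus-sleep idea in a slightly more controlled way, here is a rough sketch of a duty-cycle throttle. It is not ggml code: the `DutyCycleThrottle` class and its `submit` / `wait_idle` callbacks are hypothetical names made up for illustration, and the sketch assumes the work can be split into chunks that are each followed by a blocking wait. It times each chunk and then sleeps long enough that the GPU is busy for only the requested fraction of wall-clock time.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of ggml): after each chunk of GPU work,
// sleep so that busy / (busy + idle) is roughly equal to duty_cycle.
class DutyCycleThrottle {
public:
    using clock = std::chrono::steady_clock;

    // duty_cycle is the fraction of time the GPU may be busy, in (0, 1].
    explicit DutyCycleThrottle(double duty_cycle) : duty_cycle_(duty_cycle) {
        assert(duty_cycle > 0.0 && duty_cycle <= 1.0);
    }

    // Idle time that balances a measured busy time for the chosen duty cycle.
    std::chrono::nanoseconds idle_for(std::chrono::nanoseconds busy) const {
        const double idle =
            static_cast<double>(busy.count()) * (1.0 - duty_cycle_) / duty_cycle_;
        return std::chrono::nanoseconds(static_cast<long long>(idle));
    }

    // Runs one chunk: submit() queues the work, wait_idle() blocks until the
    // GPU has drained it; then the calling thread sleeps for the idle time.
    // Returns how long it slept.
    std::chrono::nanoseconds run(const std::function<void()>& submit,
                                 const std::function<void()>& wait_idle) const {
        const auto start = clock::now();
        submit();
        wait_idle();
        const auto busy = std::chrono::duration_cast<std::chrono::nanoseconds>(
            clock::now() - start);
        const auto idle = idle_for(busy);
        std::this_thread::sleep_for(idle);
        return idle;
    }

private:
    double duty_cycle_;
};
```

In the experiment described above, `wait_idle` would presumably wrap the ctx->device->device.waitIdle() call and `submit` whatever submits one chunk of the graph. A duty cycle of 0.5 sleeps about as long as each chunk took. Like the original hack, this trades throughput for responsiveness and does nothing about a single long-running dispatch, so it is still a proof of concept rather than a real fix.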

Thoughts?

wbruna (Author) · 2025-02-14T18:50:19Z

It turns out the main culprit was the 'enforce_isolation' option on kernel 6.12. Turning it off eliminates most of the lag, although I still get some stuttering during VAE processing and at the very beginning of prompt processing.
