Consider the following task: running multiple prompts through the same LLM.
The LLM weights do not fit into VRAM, but they can be loaded partially, layer by layer or in groups of layers. Of course, loading weights into VRAM is too slow to stream them back and forth between CPU and GPU for a single request, but... we can save the intermediate results of the computation between layers.
So we can load part of the layers into VRAM, run a partial forward pass for every task in the batch, save the activations, load the next group of layer weights, continue the computation for the waiting tasks, and repeat. This way all layer computations run on the GPU, and the cost of moving weights from CPU to GPU is amortized by increasing the number of tasks in the batch.
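A minimal sketch of that schedule, assuming the model is just a list of `torch.nn.Module` layers kept in CPU RAM (this is an illustration of the idea, not the API of any particular library):

```python
import torch

def run_batch(layers, activations, group_size=8, device="cuda"):
    """layers: list of nn.Module kept on CPU; activations: one hidden-state tensor per request."""
    for start in range(0, len(layers), group_size):
        group = layers[start:start + group_size]

        # One host-to-device weight transfer per group, amortized over the whole batch.
        for layer in group:
            layer.to(device)

        # Reuse the resident weights for every request before evicting them.
        for i, act in enumerate(activations):
            h = act.to(device)
            for layer in group:
                h = layer(h)
            activations[i] = h.cpu()        # park the intermediate result in RAM

        # Evict the group to make room for the next one.
        for layer in group:
            layer.to("cpu")

    return activations
```

In practice the per-request activations would be stacked into one batch tensor so each layer runs a single large matmul instead of a loop, but the scheduling idea is the same.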
A desktop GPU is roughly 6-10x faster than a CPU for this. The weight-transfer overhead is comparable to 1-2 CPU runs, so with more than about 10 tasks in a batch the streamed GPU run can come out ahead regardless of VRAM size (the minimum being enough VRAM for the largest single layer)! Only the PCIe bus bandwidth really matters.
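A rough back-of-envelope check of that break-even point, with assumed (not benchmarked) numbers and ignoring any overlap of transfer with compute; the CPU baseline is taken as processing the batch sequentially at its own per-request token rate:

```python
# All numbers below are assumptions for illustration, not measurements.
weights_gb  = 14.0    # e.g. ~7B parameters in fp16
pcie_gbps   = 25.0    # effective PCIe 4.0 x16 host-to-device bandwidth, GB/s
cpu_tok_s   = 5.0     # assumed CPU decode speed, tokens/s per request
gpu_speedup = 8.0     # assumed GPU-vs-CPU compute ratio

transfer_s  = weights_gb / pcie_gbps      # streaming all weights once per token step
cpu_token_s = 1.0 / cpu_tok_s             # CPU time per token per request
gpu_token_s = cpu_token_s / gpu_speedup   # GPU time per token per request

# Streaming pays off once batch_size * (cpu - gpu time) exceeds the transfer cost:
# transfer_s + b * gpu_token_s < b * cpu_token_s
break_even = transfer_s / (cpu_token_s - gpu_token_s)
print(f"break-even batch size ≈ {break_even:.1f} requests per streamed pass")
```

With these particular numbers the break-even is only a few requests per token step, which is consistent with "more than 10 tasks" being comfortably faster; slower PCIe or a faster CPU pushes the threshold up.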
How much RAM is required to store the context of a single request in the batch? Is there data that can be shared or compressed without sacrificing speed? Can ~100 contexts be stored in 64 GB of RAM?
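A rough per-request estimate of the KV cache, assuming a LLaMA-7B-like architecture (32 layers, 32 KV heads, head dim 128, fp16) and a full 2048-token context; these numbers are assumptions for illustration:

```python
# Per-request KV-cache size estimate; architecture numbers are assumptions.
layers     = 32
kv_heads   = 32        # no grouped-query attention assumed
head_dim   = 128
seq_len    = 2048
bytes_elem = 2         # fp16

# K and V each store (layers * kv_heads * head_dim) values per token.
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_elem
print(f"{kv_bytes / 2**20:.0f} MiB per request")          # ~1024 MiB
print(f"{100 * kv_bytes / 2**30:.0f} GiB for 100 requests")
```

Under these assumptions 100 full-length fp16 contexts need roughly 100 GiB, so 64 GB of RAM fits on the order of 60 such contexts; shorter prompts or a quantized (8-bit/4-bit) KV cache would bring ~100 within reach.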
As far as I understand, this is the approach implemented in this project:
https://github.com/FMInference/FlexGen/blob/main/docs/block_schedule.jpg
p.s. Speeding this up with multiple GPUs is much more complex; as I understand it, a single matrix can be divided into parts and computed on multiple GPUs simultaneously.
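A toy illustration of that matrix-splitting idea (column-parallel matmul, as in tensor parallelism), using NumPy arrays in place of real devices; in practice each shard would live on a different GPU and the partial outputs would be gathered over the interconnect:

```python
import numpy as np

x = np.random.randn(4, 512)                   # activations (batch, hidden)
W = np.random.randn(512, 2048)                # one weight matrix

shards = np.split(W, 4, axis=1)               # split output columns across 4 "devices"
partials = [x @ shard for shard in shards]    # each device computes its slice
y = np.concatenate(partials, axis=1)          # gather: concatenate along columns

assert np.allclose(y, x @ W)                  # same result as the unsplit matmul
```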