
[QUESTION] Splitting big models over multiple GPUs #207

Open
zouharvi opened this issue Mar 5, 2024 · 6 comments
Labels: question (Further information is requested)

Comments

@zouharvi (Contributor) commented Mar 5, 2024

When specifying the number of GPUs during inference, is that only for data parallelism, or is the model loaded piece-wise across multiple GPUs when it is larger than a single GPU's memory? For example, I'd like to use XCOMET-XXL and our cluster has many 12 GB GPUs.

At first I thought that the model's parts would be spread across all GPUs, e.g.:

comet-score -s data/xcomet_ennl.src -t data/xcomet_ennl_T1.tgt --gpus 5 --model "Unbabel/XCOMET-XL"

However, I'm getting a GPU OOM error on the first GPU:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacity of 10.75 GiB of which 11.62 MiB is free. ...
  1. Is it correct that, in the above setting, the full model is being loaded 5 times, once on each of the 5 GPUs?
  2. Is there a way to split the model across multiple GPUs?

Thank you!

  • unbabel-comet 2.2.1
  • pytorch-lightning 2.2.0.post0
  • torch 2.2.1
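
(For context, the CLI call above is roughly equivalent to the Python API usage below. This is a minimal sketch, with the data dicts standing in for the contents of the .src/.tgt files; the gpus argument is handed to the underlying Lightning trainer as a device count, so, as the replies below confirm, it controls data parallelism rather than sharding the model.)

# Minimal sketch of the Python-API equivalent of the comet-score call above.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/XCOMET-XL")
model = load_from_checkpoint(model_path)

data = [
    {"src": "Source sentence 1", "mt": "Vertaalde zin 1"},
    {"src": "Source sentence 2", "mt": "Vertaalde zin 2"},
]

# gpus=5 spawns 5 workers, each holding a full copy of the model
# (DDP-style data parallelism); it does not shard the weights.
output = model.predict(data, batch_size=8, gpus=5)
print(output.system_score)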
zouharvi added the question (Further information is requested) label on Mar 5, 2024
@zwhe99 commented Mar 14, 2024

same question here

@ricardorei (Collaborator)

Last time I checked, this was not very easy to do with pytorch-lightning.

We actually used a custom-made FSDP implementation to train these larger models (without using pytorch-lightning). I have to double-check whether the newer versions support FSDP better than the currently used pytorch-lightning version (2.2.0.post0).

But the short answer is: model parallelism is not something we support in the current codebase.
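
(For reference, a bare-bones illustration of the FSDP approach mentioned above. This is only a minimal sketch, not Unbabel's internal implementation; the encoder name and the training loop are placeholders.)

# Sketch: sharding a large encoder across GPUs with plain PyTorch FSDP,
# outside pytorch-lightning. Launch with e.g.:
#   torchrun --nproc_per_node=5 fsdp_sketch.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModel

def main():
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Each rank builds the encoder on CPU; FSDP then shards the parameters,
    # so no single 12 GB card has to hold the whole checkpoint at once.
    encoder = AutoModel.from_pretrained("facebook/xlm-roberta-xl")
    encoder = FSDP(encoder, device_id=local_rank)

    # ... training / scoring loop would go here ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()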

@vince62s

An idea here: CTranslate2 just integrated tensor parallelism. It also supports XLM-RoBERTa, so I'm wondering if we could adapt the converter a bit so that we could run the model within CT2, which is very fast.
How different is it from XLM-RoBERTa at inference?
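
(Roughly what that could look like, as a speculative sketch only: it assumes the CTranslate2 transformers converter accepts the XLM-R checkpoint and that the Encoder class takes the same tensor_parallel flag documented for Translator/Generator; neither is confirmed in this thread, and the COMET regression head would still need to be handled separately.)

# Speculative sketch of the CTranslate2 route. Conversion (shell), assuming
# the converter handles the XLM-R checkpoint:
#   ct2-transformers-converter --model xlm-roberta-large --output_dir ct2_xlmr
# Then run under MPI so each rank holds a shard of the weights, e.g.:
#   mpirun -np 2 python encode_ct2.py
import ctranslate2
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("xlm-roberta-large")
# tensor_parallel is assumed to be accepted by Encoder, as it is for
# Translator/Generator -- unverified.
encoder = ctranslate2.Encoder("ct2_xlmr", device="cuda", tensor_parallel=True)

tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello world"))
output = encoder.forward_batch([tokens])
print(output.last_hidden_state.shape)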

@ricardorei (Collaborator)

Does it support XLM-R XL? The architecture also differs from XLM-R.

@ricardorei (Collaborator)

It seems like they have actually improved the documentation a lot: https://lightning.ai/docs/pytorch/stable/advanced/model_parallel/fsdp.html
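
(For the record, the Lightning-side usage from those docs boils down to something like the sketch below; this assumes a recent lightning 2.x install and is not wired into the COMET codebase.)

# Sketch: enabling FSDP through pytorch-lightning, per the linked docs.
import lightning.pytorch as pl
from lightning.pytorch.strategies import FSDPStrategy

trainer = pl.Trainer(
    accelerator="gpu",
    devices=5,
    strategy=FSDPStrategy(),   # shards parameters/gradients across the 5 devices
    precision="16-mixed",
)
# trainer.fit(model) or trainer.predict(model, dataloaders=...) as usual,
# where `model` is a LightningModule wrapping the COMET checkpoint.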

@vince62s

> Does it support XLM-R XL? The architecture also differs from XLM-R.

We can adapt it if we have a detailed description somewhere.
cc @minhthuc2502
