[QUESTION] Splitting big models over multiple GPUs #207
Comments
Same question here.
Last time I checked, this was not very easy to do with pytorch-lightning. We actually used a custom-made implementation with FSDP to train these larger models (without using pytorch-lightning). I have to double-check whether the newer versions support FSDP better than the currently used pytorch-lightning version (2.2.0.post0). But short answer: model parallelism is not something we support in the current codebase.
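For context, wrapping a large encoder in PyTorch-native FSDP looks roughly like the sketch below. This is an illustration only, not the custom COMET implementation mentioned above; the checkpoint name and launch command are placeholders.

```python
# Minimal PyTorch-native FSDP sketch (illustration only, not the custom COMET
# implementation referenced above). Launch with e.g.:
#   torchrun --nproc_per_node=4 fsdp_sketch.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModel  # placeholder encoder

def main():
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = AutoModel.from_pretrained("xlm-roberta-large")  # placeholder checkpoint
    # Parameters are sharded across all ranks; each GPU only holds its shard
    # plus whatever is gathered for the current forward/backward pass.
    model = FSDP(model, device_id=local_rank)

    # ... optimizer and training loop would go here ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```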
An idea here: CTranslate2 just integrated tensor parallelism. It also supports XLM-RoBERTa, so I'm wondering whether we could adapt the converter a bit so that the model could run within CT2, which is very fast.
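A rough sketch of what that path might look like, assuming the Transformers converter accepts the XLM-R checkpoint and that the `tensor_parallel` option applies to CTranslate2's `Encoder` class as it does to translation models; checkpoint name and paths are placeholders, so treat this as something to verify against the CT2 docs rather than a working recipe.

```python
# Hypothetical CTranslate2 sketch; the checkpoint name, output path, and the
# tensor_parallel flag for encoder models are assumptions to verify.
#
# Conversion (shell):
#   ct2-transformers-converter --model xlm-roberta-large --output_dir xlmr_ct2
# Tensor parallelism requires launching under MPI (shell):
#   mpirun -np 4 python encode_with_ct2.py
import ctranslate2
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("xlm-roberta-large")
encoder = ctranslate2.Encoder("xlmr_ct2", device="cuda", tensor_parallel=True)

tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello world"))
# forward_batch runs the encoder with the weights split across the
# participating GPUs and returns the encoder outputs.
output = encoder.forward_batch([tokens])
```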
Does it support XLM-R XL? That architecture also differs from XLM-R.
It seems like they actually improved the documentation a lot: https://lightning.ai/docs/pytorch/stable/advanced/model_parallel/fsdp.html
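For reference, enabling it through the Trainer in recent Lightning versions is roughly the following (a sketch based on the linked docs; the module and datamodule are stand-ins, not COMET classes):

```python
# Sketch of turning on FSDP via the Lightning Trainer (per the linked docs);
# the LightningModule/DataModule below are placeholders, not COMET code.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="cuda",
    devices=4,
    strategy="fsdp",        # shard parameters/grads/optimizer state across GPUs
    precision="bf16-mixed",
)
# trainer.fit(my_module, datamodule=my_datamodule)  # placeholders
```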
We can adapt it if we have a detailed description somewhere.
When specifying the number of GPUs during inference (see the call sketched below), is it only for data parallelism, or is the model loaded piece-wise across multiple GPUs when it is bigger than a single GPU? For example, I'd like to use XCOMET-XXL, and our cluster has many 12 GB GPUs.
At first I thought the model parts would be loaded onto all GPUs. However, I'm getting a GPU OOM on the first GPU instead.
Thank you!
unbabel-comet 2.2.1
pytorch-lightning 2.2.0.post0
torch 2.2.1
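For concreteness, the call pattern being described presumably looks something like the following (placeholder data; per the maintainer's reply above, the `gpus` argument replicates the full model on each device rather than sharding it, so every GPU still has to fit all of XCOMET-XXL):

```python
# Sketch of the standard comet Python API call being asked about; the data is
# a placeholder. Note: gpus controls data parallelism (one full model replica
# per device), not model sharding, so a 12 GB card still OOMs on XCOMET-XXL.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/XCOMET-XXL")
model = load_from_checkpoint(model_path)

data = [{"src": "Dem Feuer konnte Einhalt geboten werden",
         "mt": "The fire could be stopped",
         "ref": "They were able to control the fire."}]
output = model.predict(data, batch_size=8, gpus=2)
print(output.system_score)
```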