Multi-GPU training #311
Comments
Hi @kalyanks0611 I did some preliminary experiments with wrapping the model in DataParallel and training on two GPUs. However, the speed was worse compared to training on a single GPU. So I didn't follow up on this. If someone gets this working (+ speedup compared to training on one GPU), I would be happy if the code could be shared here.
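For reference, "wrapping the model in DataParallel" means something like the following minimal sketch (the model name and device handling here are illustrative placeholders, not the exact code that was tested):

```python
import torch
from sentence_transformers import SentenceTransformer

# Placeholder model; DataParallel wraps any nn.Module.
model = SentenceTransformer("bert-base-nli-mean-tokens")

# DataParallel splits each batch across the visible GPUs, runs the forward
# pass on each replica, and gathers the outputs back on GPU 0, which often
# becomes the bottleneck (hence the slowdown reported above).
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)
model = model.to("cuda")
```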
In general, when a model is trained using multiple GPUs, training should be much faster. Any thoughts on why the speed was worse compared to training on a single GPU?
Hi @kalyanks0611 At least in 2017, PyTorch DataParallel was not really efficient; I don't know if this has improved since then. As mentioned, on the servers I tested, I saw a significant speed drop. Maybe this has changed with more recent versions of PyTorch / Transformers.
What about using DistributedDataParallel?
DistributedDataParallel is intended for training across multiple servers. I haven't tested that, but there the communication overhead is even larger.
In fact, DDP can also be used on one machine, and as stated in the following tutorial, DDP is faster than DataParallel even on a single node.
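To illustrate the single-node DDP setup that tutorial describes, here is a hedged, generic PyTorch sketch (the model, data, and hyperparameters are placeholders, not sentence-transformers code):

```python
# Launch with: torchrun --nproc_per_node=4 train_ddp.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun starts one process per GPU and sets LOCAL_RANK for each of them.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Placeholder model; in practice this would be the SentenceTransformer module.
    model = torch.nn.Linear(768, 768).to(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=2e-5)

    # Each process trains on its own shard of the data (e.g. via DistributedSampler);
    # gradients are all-reduced across GPUs during backward().
    for _ in range(10):
        batch = torch.randn(32, 768, device=local_rank)
        loss = ddp_model(batch).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```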
Hi @zhangdan8962 |
To overcome the issue in DataParallel, there is a PyTorch package called PyTorch-Encoding.
(this code is taken from https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255 )
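The pattern from that article looks roughly like this (a hedged sketch: `parallel` is the helper module referenced in the article, based on PyTorch-Encoding, and the model, loss, and data below are placeholders):

```python
import torch

# "parallel" is the helper from the article (based on PyTorch-Encoding); it keeps
# per-GPU outputs on their own devices instead of gathering everything on GPU 0.
from parallel import DataParallelModel, DataParallelCriterion

# Placeholder model and loss, standing in for the actual SentenceTransformer setup.
model = torch.nn.Linear(768, 2).to("cuda")
loss_function = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

parallel_model = DataParallelModel(model)             # replicates the forward pass
parallel_loss = DataParallelCriterion(loss_function)  # computes the loss on each GPU

inputs = torch.randn(64, 768, device="cuda")
labels = torch.randint(0, 2, (64,), device="cuda")

predictions = parallel_model(inputs)       # tuple with one output tensor per GPU
loss = parallel_loss(predictions, labels)  # scatters the labels and averages the loss
loss.backward()
optimizer.step()
```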
A simple implementation: https://github.com/liuyukid/sentence-transformers/blob/master/sentence_transformers/SentenceTransformer.py
Hi, has anyone had success with parallelizing SentenceTransformer training to multiple GPUs using the PyTorch-Encoding approach that @kalyanks0611 brought up two comments above?
Hey, +1ing the above comment, any update on multi-GPU training?
Hey @challos, I was able to make it work using a pretty ancient version of sentence-transformers (0.38, because I had to). I think that if you can use the up-to-date version, they have some native multi-GPU support. If not, I found this article from one of the Hugging Face folks instrumental. He refers to a piece of code there. Get through the explanation in that article - it is somewhat dense but useful in the end. And the code does just that.
Do we have any update on multi-GPU training?
Any update on this? Thanks
I tried this code to train with 1 worker and 4 GPUs; it is not faster, about the same speed as 1 worker with 1 GPU. Does anybody have good ideas?
I cannot find a solution.
Got the same result here with 4 GPUs: no acceleration (only the batch size increased by 4x).
Hi, will you implement multi-GPU code? With the improvement of computing resources, people are no longer satisfied with using 2 GPUs and want to use more.
Hello @zhanxlin, Multi-GPU support is being introduced in the upcoming v3.0 release of Sentence Transformers (planned in a few weeks). See v3.0-pre-release for the code, in case you already want to play around with it. I think the following should work:
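Presumably that refers to installing directly from the mentioned branch, along the lines of `pip install git+https://github.com/UKPLab/sentence-transformers.git@v3.0-pre-release` (the exact command is an assumption here).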
There are some details in #2449 about how the training will be changed, and how to use multi-GPU training. But to give you a sneak peek on the latter:
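As a hedged sketch of the v3.0 Trainer-based multi-GPU workflow (the base model, dataset, and hyperparameters below are illustrative placeholders, and with DDP the script is simply launched once per GPU, e.g. via `torchrun --nproc_per_node=4 train.py`):

```python
# train.py -- launch with: torchrun --nproc_per_node=4 train.py
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Placeholder base model and training data.
model = SentenceTransformer("microsoft/mpnet-base")
train_dataset = load_dataset("sentence-transformers/all-nli", "pair", split="train")
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="output/multi-gpu-demo",
    num_train_epochs=1,
    per_device_train_batch_size=64,  # per GPU; the effective batch size is 64 * n_gpus
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```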
As you can imagine, this results in very notable training speedups.
Hi @tomaarsen |
Hello @bely66, I'm preparing for the release to happen this week. I can't promise an exact date, as there might be some unexpected issues.
I have trained an SBERT model from scratch using the code from https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_nli.py and https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_stsbenchmark_continue_training.py on a single GPU.
Now I would like to train the model from scratch using two GPUs. I'm not sure what changes I have to make in the above code so that I can train the model using two GPUs.
@nreimers