
Multi-GPU training #311

Open
ghost opened this issue Jul 16, 2020 · 21 comments

@ghost

ghost commented Jul 16, 2020

I have trained SBERT model from scratch using the code https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_nli.py and https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_stsbenchmark_continue_training.py on a single GPU.

Now, I would like to train the model from scratch using two GPUs. I'm not sure what changes I have to make in the above code so that I can train the model using two GPUs.

@nreimers

@nreimers
Member

Hi @kalyanks0611

I did some preliminary experiments with wrapping the model in DataParallel and training on two GPUs.

However, the speed was worse compared to training on a single GPU. So I didn't follow up on this.

If someone gets this working (+ speedup compared to training on one GPU), I would be happy if the code could be shared here.

@ghost
Author

ghost commented Jul 17, 2020

In general, when a model is trained using multiple GPUs, training should be much faster. Any thoughts on why the speed was worse compared to training on a single GPU?
@nreimers

@nreimers
Member

Hi @kalyanks0611
A challenge with multi-GPU training is the communication overhead between the GPUs. Sending data from one GPU to the other is often quite slow, and after each gradient step the gradients have to be synced between the GPUs. This drastically decreases performance.

At least in 2017, Pytorch DataParallel was not really efficient:
facebookresearch/fairseq#34

I don't know if this has improved since then. As mentioned, on the servers I tested, I saw a significant speed drop. Maybe this has changed with more recent versions of Pytorch / Transformers.
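For anyone who wants to reproduce that kind of experiment, here is a minimal DataParallel sketch. The toy encoder, loss, and dummy batch are placeholders, not the actual sentence-transformers training loop:

import torch
import torch.nn as nn

# Toy stand-in for a sentence encoder; the real SBERT model is of course much larger.
encoder = nn.Sequential(nn.Linear(768, 768), nn.Tanh(), nn.Linear(768, 256))
encoder = nn.DataParallel(encoder).to("cuda")   # replicate the module on all visible GPUs
loss_fn = nn.MSELoss()
optimizer = torch.optim.AdamW(encoder.parameters(), lr=2e-5)

# Dummy batch; in practice this would come from a DataLoader of sentence features.
features = torch.randn(64, 768, device="cuda")
targets = torch.randn(64, 256, device="cuda")

for step in range(10):
    embeddings = encoder(features)       # forward pass is scattered over the GPUs,
                                         # outputs are gathered back on GPU 0
    loss = loss_fn(embeddings, targets)  # loss and backward start on GPU 0, so gradients
    loss.backward()                      # must be synced back to it on every step
    optimizer.step()
    optimizer.zero_grad()

The per-step scatter/gather and gradient sync is exactly the overhead described above, which is why small per-GPU batches often end up slower than single-GPU training.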

@zhangdan8962

What about using DistributedDataParallel?

@nreimers
Member

DistributedDataParallel is meant for multiple servers. I haven't tested it, but there the communication overhead is even larger.

@zhangdan8962

In fact, DDP can also be used on one machine, and as stated in the following tutorial, DDP is faster than DataParallel even on a single node.
https://pytorch.org/tutorials/intermediate/ddp_tutorial.html
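For reference, a minimal single-node DDP sketch in the spirit of that tutorial (toy model and dummy data only, launched with torchrun --nproc_per_node=<num_gpus> script.py):

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each spawned process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy stand-in for a sentence encoder, one full copy per process/GPU.
    model = nn.Sequential(nn.Linear(768, 768), nn.Tanh(), nn.Linear(768, 256)).cuda()
    model = DDP(model, device_ids=[local_rank])
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # Dummy batch; in practice each rank would get its own shard via a DistributedSampler.
    features = torch.randn(32, 768, device=local_rank)
    targets = torch.randn(32, 256, device=local_rank)

    for step in range(10):
        loss = loss_fn(model(features), targets)
        loss.backward()        # gradient all-reduce overlaps with the backward pass,
        optimizer.step()       # unlike DataParallel's gather-on-GPU-0 approach
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Because each GPU runs its own process and gradients are all-reduced during backward, DDP avoids the single-GPU bottleneck that DataParallel has.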

@nreimers
Member

Hi @zhangdan8962
That is interesting. I will have a look.

@ghost
Author

ghost commented Jul 17, 2020

To overcome this issue with DataParallel, there is a PyTorch package called PyTorch-Encoding.

from parallel import DataParallelModel, DataParallelCriterion

parallel_model = DataParallelModel(model)             # Encapsulate the model
parallel_loss  = DataParallelCriterion(loss_function) # Encapsulate the loss function

predictions = parallel_model(inputs)      # Parallel forward pass
                                          # "predictions" is a tuple of n_gpu tensors
loss = parallel_loss(predictions, labels) # Compute loss function in parallel
loss.backward()                           # Backward pass
optimizer.step()                          # Optimizer step
predictions = parallel_model(inputs)      # Parallel forward pass with new parameters

(this code taken from https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255 )

@nreimers

@liuyukid

A simple implementation: https://github.com/liuyukid/sentence-transformers/blob/master/sentence_transformers/SentenceTransformer.py
I don't know if the speed can be improved, but it at least supports a larger batch_size.
You can try it!

@genaunit

Hi, has anyone had success with parallelizing SentenceTransformer training across multiple GPUs using the PyTorch-Encoding approach that @kalyanks0611 brought up two comments above?

@ajmcgrail

Hey, +1ing the above comment. Any update on multi-GPU training?

@genaunit

Hey @challos, I was able to make it work using a pretty ancient version of sentence-transformers (0.38, because I had to). I think that if you can use the up-to-date version, it has some native multi-GPU support. If not, I found this article from one of the Hugging Face folks instrumental. He refers to a piece of code from zhanghang1989 (on GitHub), which I was able to use almost verbatim (I think there was a small bug in it for my use case, but it is mostly usable as is; if you see a crash you'll know how to fix it):

https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255

Work through the explanation in that article; it is somewhat dense but useful in the end, and the code does just that.

@prvnktech

Do we have any update on Multi GPU Training?

@shoegazerstella

Any update on this? Thanks.

@liqi6811

A simple implementation: https://github.com/liuyukid/sentence-transformers/blob/master/sentence_transformers/SentenceTransformer.py I don't know if the speed can be improved, but it at least supports a larger batch_size. You can try it!

I tried this code to train on 1 worker with 4 GPUs; it is not faster, about the same speed as 1 worker with 1 GPU. Does anybody have good ideas?

@sangyongjia

Cannot find a solution.

@dkchhetri

https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255

Got the same result here with 4 GPUs: no acceleration (only the batch size increased by 4x).

@zhanxlin

Hi @kalyanks0611

I did some preliminary experiments with wrapping the model in DataParallel and training on two GPUs.

However, the speed was worse compared to training on a single GPU. So I didn't follow up on this.

If someone gets this working (+ speedup compared to training on one GPU), I would be happy if the code could be shared here.

Hi, will you implement multi-GPU code? With the improvement of computing resources, people are no longer satisfied with using 2 GPUs and want to use more.

@tomaarsen
Collaborator

tomaarsen commented May 10, 2024

Hello @zhanxlin,

Multi-GPU support is being introduced in the upcoming v3.0 release of Sentence Transformers (planned in a few weeks). See v3.0-pre-release for the code, in case you already want to play around with it. I think the following should work:

pip install git+https://github.com/UKPLab/sentence-transformers@v3.0-pre-release

There are some details in #2449 about how training will change and how to use multi-GPU training. But to give you a sneak peek of the latter:

  • Data Parallelism is automatically applied if you use multiple GPUs
  • Distributed Data Parallelism is automatically applied if you run the training script with torchrun or accelerate instead of python.

As you can imagine, this results in very notable training speedups.
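To make that concrete, here is a rough sketch of what a v3-style training script might look like; the class names, dataset, and columns below are my reading of the pre-release and #2449, so treat them as assumptions until v3.0 is out:

# train_stsb.py -- sketch based on the assumed v3 Trainer API; verify names against the v3 docs.
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("distilbert-base-uncased")
# Assumed dataset with columns (sentence1, sentence2, score).
train_dataset = load_dataset("sentence-transformers/stsb", split="train")
loss = CosineSimilarityLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="output/multi-gpu-stsb",
    num_train_epochs=1,
    per_device_train_batch_size=32,  # per GPU; effective batch size scales with the number of GPUs
)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()

# Single process (DataParallel if several GPUs are visible):  python train_stsb.py
# DDP on e.g. 4 GPUs:  torchrun --nproc_per_node=4 train_stsb.py
# or:                  accelerate launch train_stsb.py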

  • Tom Aarsen

@bely66

bely66 commented May 27, 2024

Hi @tomaarsen
Any idea what the exact release date is?

@tomaarsen
Collaborator

Hello @bely66,

I'm preparing for the release to be this week. I can't promise an exact date as there might be some unexpected issues.

  • Tom Aarsen
