DeepSpeed support for ignite.distributed #2008
@Kashu7100 Thank you for this suggestion! I confirm that it would be very nice to support DeepSpeed with ignite. Currently we have a docker environment configured with MS DeepSpeed: https://github.com/pytorch/ignite/tree/master/docker/msdp Would you like to contribute to this? It seems you already know how to do it 😉
@sdesrozis Do you think it is possible to reuse `ignite.distributed` for this?
It depends on what you want to do. The feature list of msdp is quite long and the impacts are more or less deep. For instance, I think that pipeline parallelism would be a very nice feature to have, but not trivial to adapt. Maybe a first step could be distributed parallelism using the simplified API, as you mentioned. Thus, it may be a new backend to develop and integrate into our `ignite.distributed` module. You can have a look here. Btw, it's not an easy task and maybe I'm wrong about what to do. @vfdev-5 was looking further into this; maybe he could help in the discussion.
@Kashu7100 Finally, introducing a new backend does not seem to be the right option. Have a look here, and you will see that native PyTorch distributed is used when the distributed environment variables are set. That is good news for simple use cases.
I would say yes.
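A minimal sketch of that simple case, under the assumption that the DeepSpeed launcher exports the usual torch.distributed environment variables (RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT), which ignite's native backend detects:

```python
import ignite.distributed as idist

def training(local_rank):
    # ignite picks up the process group initialized from the env vars
    print(idist.get_rank(), idist.get_world_size())

if __name__ == "__main__":
    with idist.Parallel(backend="nccl") as parallel:
        parallel.run(training)
```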
@Kashu7100 thanks for the feature request! Yes, we plan to improve our support of the DeepSpeed framework. Our idea was roughly to provide basic integration examples of how to use ignite and DeepSpeed together. I looked at it multiple times, and due to a certain overlap between the two frameworks it was not obvious where to put the split. @sdesrozis I'm not sure whether we should add it as a new backend or not. Let's first create a basic integration example and see which parts of the DeepSpeed code could be simplified using ignite. A minimal sketch of such an integration is below.
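The sketch assumes DeepSpeed's documented `deepspeed.initialize` API; the toy model and `ds_config.json` file are placeholders:

```python
import deepspeed
import torch
import torch.nn as nn
import torch.nn.functional as F
from ignite.engine import Engine

model = nn.Linear(10, 2)  # toy model, placeholder
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # placeholder DeepSpeed config file
)
device = torch.device("cuda", model_engine.local_rank)

def train_step(engine, batch):
    x, y = batch
    loss = F.cross_entropy(model_engine(x.to(device)), y.to(device))
    # DeepSpeed takes over backward() and the optimizer step
    model_engine.backward(loss)
    model_engine.step()
    return loss.item()

trainer = Engine(train_step)
```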
I think this could be integrated into our native backend, alongside the SLURM support.
IMO it is not necessary.
That is a good option. As discussed a few weeks ago, the DeepSpeed-specific engine should be the tricky part; otherwise, the auto helpers could do the job, I suppose (a hypothetical sketch of such a helper follows).
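Purely hypothetical sketch (not ignite's actual implementation) of what a DeepSpeed-aware auto helper could look like; `auto_model_deepspeed` is an invented name for illustration:

```python
import deepspeed
import torch.nn as nn

def auto_model_deepspeed(model: nn.Module, ds_config: dict):
    # deepspeed.initialize wraps the model and creates its own optimizer,
    # which is exactly the overlap with ignite's auto_model/auto_optim split.
    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )
    # The returned engine replaces both the wrapped model and the optimizer,
    # so the usual separate auto_optim step would be subsumed here.
    return model_engine, optimizer
```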
Hi, is there any update on this?
@saifullah3396 Well, this feature is not really a priority right now. If you would like to help with it, we can guide your development from the ignite side.
🚀 Feature
PyTorch Lightning recently added native support for MS DeepSpeed.
I believe it would also be helpful for users if ignite incorporated the DeepSpeed pipeline for memory-efficient distributed training.
1. What would `idist.auto_model` look like with DeepSpeed?
To initialize the DeepSpeed engine:
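A minimal sketch, assuming DeepSpeed's documented `deepspeed.initialize` entry point; the model and `ds_config.json` are placeholders:

```python
import deepspeed
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # placeholder DeepSpeed config file
)
```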
And for the distributed environment setup, we need to replace `torch.distributed.init_process_group(...)` with `deepspeed.init_distributed()`:
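Side by side (both calls are part of the respective public APIs; the backend choice is illustrative):

```python
import torch.distributed
import deepspeed

# Either the native PyTorch initialization ...
# torch.distributed.init_process_group(backend="nccl")

# ... or the DeepSpeed equivalent, which also reads the launcher's env vars:
deepspeed.init_distributed(dist_backend="nccl")
```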
2. Checkpoint handler

DeepSpeed handles checkpointing slightly differently, so ignite's checkpoint handler would need to be adapted (see the sketch below).
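For reference, a sketch of DeepSpeed's engine-level checkpoint calls, continuing from the `model_engine` created above; the directory, tag, and extra state are placeholders:

```python
# Checkpointing goes through the engine rather than torch.save, since each
# rank may hold only a shard of the model/optimizer state.
client_state = {"epoch": 5}  # placeholder extra state
model_engine.save_checkpoint("checkpoints", tag="epoch_5", client_state=client_state)

# Loading restores model and optimizer state and returns the client state.
load_path, client_state = model_engine.load_checkpoint("checkpoints", tag="epoch_5")
```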