For context, PyTorchTrial does not support FSDP and there are no plans to add it. For FSDP, you should use the Core API instead; it will be effectively the same as torch DDP: the standard torch distributed launcher works the same, and metrics logging and hyperparameter search work the same. If you checkpoint the full model from rank 0, that will work the same as well. If you want to do sharded checkpointing, use the `shard=True` option.
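A minimal sketch of what that could look like, assuming a toy model, dummy batches, and a standard `torchrun`-style launch (for example via `determined.launch.torch_distributed` in the experiment entrypoint); it is illustrative only, not an official example, and the `shard=True` path is only noted in a comment.

```python
# Hypothetical sketch: PyTorch FSDP under Determined's Core API.
# Launch with the standard torch distributed launcher, e.g. in the experiment config:
#   entrypoint: python3 -m determined.launch.torch_distributed python3 train_fsdp.py
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import FullStateDictConfig, StateDictType

import determined as det


def main(core_context: det.core.Context) -> None:
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = torch.device("cuda", local_rank)

    # Toy model; in practice this would be your real model plus an auto-wrap policy.
    model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    for step in range(100):
        # Dummy batch; replace with a real DataLoader.
        x = torch.randn(8, 1024, device=device)
        y = torch.randint(0, 10, (8,), device=device)

        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()

        if step % 10 == 0:
            # Metrics logging works the same as with DDP: report from the chief rank.
            if rank == 0:
                core_context.train.report_training_metrics(
                    steps_completed=step, metrics={"loss": loss.item()}
                )
            if core_context.preempt.should_preempt():
                break

    # Option 1: consolidate a full state dict on rank 0 and checkpoint it from there.
    save_policy = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, save_policy):
        full_state = model.state_dict()
    if rank == 0:
        with core_context.checkpoint.store_path({"steps_completed": step}) as (path, _):
            torch.save(full_state, path / "model.pt")

    # Option 2 (sharded checkpointing): have every rank write its own shard by passing
    # shard=True to store_path and saving a sharded state dict instead.


if __name__ == "__main__":
    dist.init_process_group("nccl")
    distributed = det.core.DistributedContext.from_torch_distributed()
    with det.core.init(distributed=distributed) as core_context:
        main(core_context)
```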
Hi,
Does Determined support the PyTorch FSDP way of distributed training? I can see examples for DeepSpeed, but I have a requirement to specifically use the native FSDP feature of PyTorch 2.2 (something like https://pytorch.org/tutorials/intermediate/FSDP_advanced_tutorial.html?highlight=pre%20training).