Fix exception handling in get_all_ranks_from_group() function (#4862)
In the latest PyTorch nightly, the exception thrown from
`torch.distributed.distributed_c10d.get_global_rank()` has changed from
`RuntimeError` to `ValueError`, so we need to update our try/except in
`deepspeed.comm`.

Tested with torch version 2.3.0.dev20231221+cu121

Fixes: #4853
HeyangQin authored Dec 22, 2023
1 parent 75c7720 commit c37fe9c
Showing 1 changed file with 1 addition and 1 deletion: deepspeed/comm/comm.py
@@ -595,7 +595,7 @@ def get_all_ranks_from_group(group=None):
         while True:
             group_ranks.append(cdb.get_global_rank(group, rank))
             rank += 1
-    except RuntimeError:
+    except (RuntimeError, ValueError):
         pass
     return group_ranks
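The hunk above enumerates a group by probing local ranks 0, 1, 2, ... until `get_global_rank()` raises, so the handler has to catch whichever exception type the installed PyTorch uses. Below is a minimal, torch-free sketch of that pattern. `GROUP_TO_GLOBAL`, `make_lookup`, and `get_all_ranks` are hypothetical stand-ins for a real process group, `cdb.get_global_rank(group, rank)`, and `get_all_ranks_from_group()`. The `caught` parameter exists only to compare the pre-fix and post-fix handlers side by side.

```python
from typing import Callable, List, Tuple, Type, Union

# Hypothetical group: local ranks 0..3 map to global ranks 4..7.
GROUP_TO_GLOBAL = [4, 5, 6, 7]


def make_lookup(exc_type: Type[Exception]) -> Callable[[int], int]:
    """Build a hypothetical stand-in for cdb.get_global_rank(group, rank)
    that raises exc_type once the local rank falls outside the group."""
    def lookup(rank: int) -> int:
        if rank >= len(GROUP_TO_GLOBAL):
            raise exc_type(f"rank {rank} is not part of the group")
        return GROUP_TO_GLOBAL[rank]
    return lookup


old_style_lookup = make_lookup(RuntimeError)  # mimics older PyTorch
new_style_lookup = make_lookup(ValueError)    # mimics the newer nightly

ExcSpec = Union[Type[BaseException], Tuple[Type[BaseException], ...]]


def get_all_ranks(lookup: Callable[[int], int],
                  caught: ExcSpec = (RuntimeError, ValueError)) -> List[int]:
    """Probe local ranks until the lookup raises one of `caught`."""
    group_ranks: List[int] = []
    rank = 0
    try:
        while True:
            group_ranks.append(lookup(rank))
            rank += 1
    except caught:
        # The exception marks the end of the group, not a real error.
        pass
    return group_ranks


print(get_all_ranks(old_style_lookup))  # [4, 5, 6, 7]
print(get_all_ranks(new_style_lookup))  # [4, 5, 6, 7]

# Pre-fix behaviour: catching only RuntimeError lets ValueError escape.
try:
    get_all_ranks(new_style_lookup, caught=RuntimeError)
except ValueError as exc:
    print(f"pre-fix handler let ValueError escape: {exc}")
```

Catching a tuple of exception types keeps the loop working on both older and newer PyTorch releases without checking the version explicitly.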
