To solve #4726, I change the dtype of the loss tensor to float32 in the last stage of the pipeline.

**Test result**

Before `dist.broadcast`:

```
[2023-11-24 14:06:04,709] [INFO] [engine.py:590:_aggregate_total_loss] [Rank 2] before dist.broadcast(is_last_stage) (tensor([2.3203, 2.3203], device='cuda:2'), torch.float32, device(type='cuda', index=2)), src_rank=2 (1, 2)
[2023-11-24 14:06:04,710] [INFO] [engine.py:590:_aggregate_total_loss] [Rank 3] before dist.broadcast(is_last_stage) (tensor([2.3203, 2.3203], device='cuda:3'), torch.float32, device(type='cuda', index=3)), src_rank=3 (1, 2)
```

After `dist.broadcast`, the result is correct between rank 2 and rank 0, as well as between rank 3 and rank 1:

```
[2023-11-24 14:06:05,016] [INFO] [engine.py:608:_aggregate_total_loss] [Rank 1] after dist.broadcast(other stage) (tensor([2.3203, 2.3203], device='cuda:1'), torch.float32)
[2023-11-24 14:06:05,043] [INFO] [engine.py:608:_aggregate_total_loss] [Rank 0] after dist.broadcast(other stage) (tensor([2.3203, 2.3203], device='cuda:0'), torch.float32)
```

For more information, please refer to #4726.

Co-authored-by: ryan <ruanzhixiang1@huawei.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
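
For illustration, here is a minimal sketch of the idea behind the fix, not the actual DeepSpeed `_aggregate_total_loss` code: the last pipeline stage casts its loss to float32 before broadcasting, and the other stages allocate a float32 receive buffer of the same shape, so both sides of `dist.broadcast` agree on dtype. The function name, parameters (`shape`, `src_rank`), and group setup below are assumptions for the sketch.

```python
# Sketch only: broadcast the loss from the last pipeline stage to the other
# stages using a consistent float32 dtype on both sides of the broadcast.
import torch
import torch.distributed as dist


def broadcast_total_loss(loss, is_last_stage, src_rank, shape, device):
    """Hypothetical helper: share the last-stage loss with the other pipeline stages.

    Assumes torch.distributed is already initialized and `src_rank` is the
    global rank of the last pipeline stage in this data-parallel group.
    """
    if is_last_stage:
        # Cast to float32 so the dtype matches the receive buffer below,
        # even if the model computed the loss in fp16/bf16.
        total_loss = loss.detach().to(torch.float32)
    else:
        # Non-last stages receive into a float32 buffer of the same shape.
        total_loss = torch.zeros(shape, dtype=torch.float32, device=device)

    dist.broadcast(total_loss, src=src_rank)
    return total_loss
```

The key point, matching the logs above, is that sender and receiver tensors are both float32, so the broadcast delivers the same values (e.g. `tensor([2.3203, 2.3203])`) on every stage.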