Great work!
When training the model on a single RTX 3090, everything works perfectly with batch_size=1.
However, setting batch_size=2 raises the following error:
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [15, 2182] at entry 0 and [15, 5269] at entry 1
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
It seems that with multiprocessing enabled (num_workers != 0) the error is attributed to a specific worker: because the tensors in the batch have different sizes along the second dimension, the default collation fails when stacking them, so the first worker crashes (worker process 0).
Were you able to train with a larger batch_size on your side? Why am I running into this problem with the code in your repository?
The command I ran is:
python -m torch.distributed.launch --nproc_per_node=1 --master_port 1234 main_DCAdapt.py configs/train_adapt_ft3d_kitti.yaml
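For reference, a common workaround for this kind of stacking error is a custom collate_fn that pads variable-length samples to the batch maximum before stacking. This is only a minimal sketch of the general technique, not the fix intended by this repository; the dataset and shapes below are made up to mimic the [15, 2182] vs [15, 5269] tensors in the error:

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

# Toy dataset producing [15, L] tensors with varying L,
# mimicking the shapes reported in the error message.
class VarWidthDataset(Dataset):
    def __init__(self, lengths):
        self.lengths = lengths

    def __len__(self):
        return len(self.lengths)

    def __getitem__(self, i):
        return torch.randn(15, self.lengths[i])

def pad_collate(batch):
    # Pad each [15, L] tensor along its last dim to the batch max,
    # then stack into [B, 15, L_max]. pad_sequence pads along dim 0,
    # so transpose to [L, 15] first and transpose back afterwards.
    seqs = [t.transpose(0, 1) for t in batch]       # each [L, 15]
    padded = pad_sequence(seqs, batch_first=True)   # [B, L_max, 15]
    return padded.transpose(1, 2)                   # [B, 15, L_max]

loader = DataLoader(VarWidthDataset([2182, 5269]), batch_size=2,
                    collate_fn=pad_collate)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([2, 15, 5269])
```

Whether zero-padding is semantically valid here depends on what the second dimension represents (e.g. points or matches may instead need subsampling to a fixed count), so this is a starting point rather than a drop-in fix.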