
Single-GPU training issue #99

Open
hongtaofly opened this issue Dec 4, 2021 · 1 comment

Comments

@hongtaofly

Apologies for asking a siamban training question under the Ocean repo; I lack experience, so please bear with me.
Our whole lab shares two GPUs, and my labmates' jobs default to GPU 0, so GPU 0's memory is full while GPU 1 still has plenty free. When the algorithm unfreezes the backbone at epoch 11, training runs out of memory. What is the correct command for training siamban on a single GPU?

Original multi-GPU training command:
CUDA_VISIBLE_DEVICES=0,1 \
python -m torch.distributed.launch \
    --nproc_per_node=2 \
    --master_port=2333 \
    ../../tools/train.py --cfg config.yaml
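A single-GPU variant of the same launcher invocation would pin the job to the free card and drop the process count to one. This is only a sketch based on standard `torch.distributed.launch` usage; neither commenter in this thread confirms that siamban's `train.py` supports a world size of 1:

```shell
# Sketch only: run the same distributed launcher restricted to GPU 1.
# Assumes siamban's train.py tolerates a single-process launch (not
# confirmed in this thread); the script path and config are unchanged.
CUDA_VISIBLE_DEVICES=1 \
python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port=2333 \
    ../../tools/train.py --cfg config.yaml
```

`CUDA_VISIBLE_DEVICES=1` hides GPU 0 from the process entirely, so the job cannot collide with the labmates' allocations, and `--nproc_per_node=1` starts one worker instead of two.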

@JudasDie
Contributor


Sorry, I haven't used the siamban code.
