
Single-GPU training issue #99

Open
hongtaofly opened this issue Dec 4, 2021 · 1 comment

Comments

@hongtaofly

Apologies for asking a siamban training question under the Ocean repo; I lack experience, so please bear with me.
Our whole lab shares two GPUs, and my labmates' jobs default to GPU 0, so GPU 0's memory is full while GPU 1 still has plenty free. When the algorithm unfreezes the backbone at epoch 11, training runs out of memory. What is the correct command for training siamban on a single GPU?

Original multi-GPU training command:
CUDA_VISIBLE_DEVICES=0,1 \
python -m torch.distributed.launch \
    --nproc_per_node=2 \
    --master_port=2333 \
    ../../tools/train.py --cfg config.yaml
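A single-GPU variant of the same launcher invocation would pin the job to the free card and drop the process count to one. This is only a sketch based on standard `torch.distributed.launch` usage; neither commenter in this thread confirms that siamban's `train.py` supports a world size of 1:

```shell
# Sketch only: run the same distributed launcher restricted to GPU 1.
# Assumes siamban's train.py tolerates a single-process launch (not
# confirmed in this thread); the script path and config are unchanged.
CUDA_VISIBLE_DEVICES=1 \
python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port=2333 \
    ../../tools/train.py --cfg config.yaml
```

`CUDA_VISIBLE_DEVICES=1` hides GPU 0 from the process entirely, so the job cannot collide with the labmates' allocations, and `--nproc_per_node=1` starts one worker instead of two.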

@JudasDie
Contributor


Sorry, I haven't used the siamban code.
