Problem with finetune model #318
Comments
Has anyone else run into this issue?
There is a small bug in the master-branch fine-tuning path that wastes some memory with the loaded checkpoint. I believe I've fixed it on the dev branch; otherwise I will within a couple of days.
Thanks for the reply. I encountered this issue while using the dev branch. Looking forward to your update! I tried to locate the problem myself, but had no luck with that 😂
now fixed on dev |
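For readers hitting the same thing on older checkouts, a common pattern to keep a restored checkpoint from lingering on the GPU is to load it onto the CPU and drop the reference once the weights are copied in. This is only a sketch of the general technique, not necessarily the actual dev-branch fix; the `load_for_finetune` helper and the `"model"` key are assumptions for illustration:

```python
import torch


def load_for_finetune(model, checkpoint_path, device):
    # Deserialize onto the CPU so the checkpoint never occupies GPU memory.
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    # Copy the saved weights into the live model (key name assumed here).
    model.load_state_dict(checkpoint["model"])
    model.to(device)
    # Drop the reference so the CPU copy of the weights can be freed.
    del checkpoint
    return model
```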
@erogol @reuben @twerkmeister @ekr Hey guys, thanks for your great work here! I'm trying to train Tacotron 2 with a custom dataset. It generally runs well, but there are still some issues I have failed to resolve, and it would be kind of you to share some ideas about them.
![image](https://user-images.githubusercontent.com/33118185/69768025-e1045180-11b9-11ea-868b-f4619c9aae44.png)
The most recent problem, from yesterday, came up when I tried to fine-tune my model with the BN-version prenet, as mentioned by @erogol in another comment. With distributed training launched via `python3.7 distribute.py --restore_path xxxx/best_model.pth.tar`, I soon got a CUDA out-of-memory error, with GPU usage as in the screenshot. If I understand correctly, the main GPU (device 0) was being used by all seven other subprocesses and ran out of memory while the others still had memory free. From some searching, this probably relates to restoring the Adam optimizer, since someone commented that Adam restores all of its parameters onto the main GPU device. Any ideas about this?
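One common cause of exactly this symptom is that `torch.load` by default deserializes tensors onto the devices they were saved from, typically `cuda:0`, so every distributed worker piles its copy of the checkpoint onto GPU 0. Passing a `map_location` for each worker's own device avoids that. This is a generic sketch, not the repo's actual restore code; `restore_checkpoint`, its arguments, and the `"model"`/`"optimizer"` key names are assumptions:

```python
import torch


def restore_checkpoint(model, optimizer, checkpoint_path, local_rank):
    # Map saved tensors straight to this worker's own GPU (or CPU),
    # so all ranks do not deserialize onto cuda:0 at once.
    device = f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu"
    checkpoint = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(checkpoint["model"])
    # Optimizer.load_state_dict casts the restored Adam state
    # (exp_avg, exp_avg_sq) to match each parameter's device.
    optimizer.load_state_dict(checkpoint["optimizer"])
    del checkpoint
    return model, optimizer
```

With `distribute.py`-style launchers, `local_rank` would come from the per-process rank argument or environment variable the launcher sets.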
My other questions are about some training details that I posted in #58 (comment).
I would be grateful if you could share some ideas with me. Thanks in advance!