Error during training #17
Comments
Thank you for pointing this out. We have not trained using float16, and this might be causing some issues. I’m looking into this further, but in the meantime, a quick fix would be to comment out the visualization step causing the error at this line. This change will not affect the training process. Please let me know if you encounter any further issues!
Hello @veichta, I made the changes you mentioned. What solved the problem was removing this line. After that, I was able to train using either float32 or float16. I did get this warning:
Could you tell me what it means? Also, regarding the batch size, you mentioned in the paper that a batch size of 24 was used and that 2 GPUs were used for training. Is 24 the batch size per GPU or the total batch size? Thanks a lot for your help.
[UPDATE]
Thank you for the update! The warning should not affect training. It is just a deprecation warning, meaning the code it refers to might break in future PyTorch releases. As for the batch size, 24 is the total batch size, so with 2 GPUs each one handles 12 samples. We did not run any experiments with float16 for training or inference, but any findings from your experiments with it would be very interesting!
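To make the batch-size arithmetic above concrete, here is a minimal sketch of how a total batch size splits evenly across GPUs. The helper name `per_gpu_batch_size` is hypothetical and is not part of this repository. It only illustrates the 24 / 2 = 12 split described above, assuming the total is divided equally across devices.

```python
def per_gpu_batch_size(total_batch_size: int, num_gpus: int) -> int:
    """Return the number of samples each GPU handles per step.

    Hypothetical helper for illustration only; assumes the total batch
    is split evenly across all GPUs.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if total_batch_size % num_gpus != 0:
        raise ValueError("total_batch_size must be divisible by num_gpus")
    return total_batch_size // num_gpus


# Total batch size of 24 on 2 GPUs, as in the reply above.
print(per_gpu_batch_size(24, 2))
```

If you train on a different number of GPUs and want to match the reported setup, it is the total (24) that would need to stay fixed, not the per-GPU value.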
Hello,
I have some problems with the training. I have followed all the steps for the dataset creation. The only difference is that when I created the dataset, many images were missing.
I used the following command:
And this is the error message: