The loss is repeated batch_size times, which does not make sense for the last batch when the dataset size is not divisible by batch_size. This also causes an HF test case (tests/trainer/test_trainer.py::TrainerIntegrationTest::test_evaluate) to fail when I run it on my device.
No, the evaluation loss is actually computed properly thanks to this line. Repeating each batch loss batch_size times and then truncating to the length of the dataset here makes the final evaluation loss the proper average of all per-example losses.
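To see why the repeat-then-truncate step gives the correct average, here is a minimal sketch (not the actual Trainer code; the helper name is made up for illustration). Each batch's mean loss is repeated batch_size times, and truncating the concatenated result to the dataset length keeps exactly as many copies of the last batch's loss as that batch has examples:

```python
# Sketch of the repeat-then-truncate averaging the Trainer uses.
# `repeat_and_truncate_mean` is a hypothetical helper, not a transformers API.

def repeat_and_truncate_mean(per_example_losses, batch_size):
    # Per-batch mean losses, as a model would report them during evaluation.
    batch_means = []
    for i in range(0, len(per_example_losses), batch_size):
        batch = per_example_losses[i:i + batch_size]
        batch_means.append(sum(batch) / len(batch))
    # Repeat each batch mean batch_size times (the step the issue questions)...
    repeated = [m for m in batch_means for _ in range(batch_size)]
    # ...then truncate to the dataset length: the surplus copies dropped here
    # are exactly the extra repeats of the (smaller) last batch.
    truncated = repeated[:len(per_example_losses)]
    return sum(truncated) / len(truncated)

# 10 examples with batch_size 4 -> batches of 4, 4, and 2 examples.
losses = [float(x) for x in range(10)]
print(repeat_and_truncate_mean(losses, 4))  # 4.5 == sum(losses) / 10
```

The last batch's mean is repeated 4 times but only 2 copies survive the truncation, so each example contributes equally to the final average.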
As for the test not passing, I think you are running it on 2 GPUs? It's only intended to work on one.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Environment info

- `transformers` version: 4.18.0.dev0

Who can help
@sgugger
Issue:
When the input data size is not a multiple of batch_size, the computed loss seems wrong to me, as seen in this line:
transformers/src/transformers/trainer.py
Line 2469 in 69233cf
To reproduce
Steps to reproduce the behavior:
Error:
FAILED tests/trainer/test_trainer.py::TrainerIntegrationTest::test_evaluate - AssertionError: 0.517515242099762 != 0.41851458 within 7 places (0.09900066256523132 difference)
Expected behavior
The test should pass.