
Predicting incorrect loss when eval data size is not a multiple of batch size #16716

Closed

ajindal1 opened this issue Apr 12, 2022 · 4 comments

@ajindal1
Environment info

  • transformers version: 4.18.0.dev0
  • Platform: Linux-5.4.0-96-generic-x86_64-with-debian-buster-sid
  • Python version: 3.7.13
  • Huggingface_hub version: 0.5.1
  • PyTorch version (GPU?): 1.12.0.dev20220411+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): 0.4.1 (cpu)
  • Jax version: 0.3.5
  • JaxLib version: 0.3.5
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes

Who can help

@sgugger

Issue:

When the input data size is not a multiple of batch_size, the loss calculated seems wrong to me. As mentioned in this line

losses = self._nested_gather(loss.repeat(batch_size))
The loss is repeated batch_size times, which does not make sense for the last batch, whose size is smaller than batch_size. This also leads to the failure of the HF test case tests/trainer/test_trainer.py::TrainerIntegrationTest::test_evaluate when I run it on my machine.
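
To make the concern concrete, here is a minimal sketch with made-up numbers (not the actual Trainer code or the real test data): with 10 evaluation examples and a batch size of 4, the last batch contains only 2 examples, yet its mean loss is still repeated 4 times.

```python
import torch

# Illustrative numbers only: 10 eval examples, batch size 4,
# so the last batch holds just 2 examples.
num_samples, batch_size = 10, 4
per_example_losses = torch.arange(1.0, num_samples + 1)   # losses 1.0 .. 10.0
batches = per_example_losses.split(batch_size)            # chunks of size 4, 4, 2

# What the quoted line does per step: repeat the per-batch mean loss batch_size times.
repeated = torch.cat([b.mean().repeat(batch_size) for b in batches])

print(repeated.numel())           # 12 entries for only 10 examples
print(repeated.mean())            # ~6.17 -- a naive mean over-weights the last, smaller batch
print(per_example_losses.mean())  # 5.50 -- the true per-example average
```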

To reproduce

Steps to reproduce the behavior:

  1. Install pytest
  2. RUN: pytest tests/trainer/test_trainer.py::TrainerIntegrationTest::test_evaluate

Error:
FAILED tests/trainer/test_trainer.py::TrainerIntegrationTest::test_evaluate - AssertionError: 0.517515242099762 != 0.41851458 within 7 places (0.09900066256523132 difference)

Expected behavior

The test should pass.

@sgugger (Collaborator) commented Apr 12, 2022

No, the evaluation loss is properly computed thanks to this line, actually. Repeating it batch_size times and then truncating to the length of the dataset here makes the final evaluation loss the proper average of all losses.
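
For illustration, here is a minimal single-process sketch of that repeat-then-truncate logic, using the same made-up numbers as above and assuming each per-batch loss is the mean over the examples in that batch (the real code additionally gathers the repeated losses across processes):

```python
import torch

# Same illustrative setup: 10 examples, batch size 4, last batch of size 2.
num_samples, batch_size = 10, 4
per_example_losses = torch.arange(1.0, num_samples + 1)
batches = per_example_losses.split(batch_size)

# Each per-batch mean is repeated batch_size times (the quoted line) ...
all_losses = torch.cat([b.mean().repeat(batch_size) for b in batches])
# ... and the accumulated tensor is then truncated to the dataset length,
# so the last batch's mean keeps only as many copies as it has examples.
all_losses = all_losses[:num_samples]

print(all_losses.mean())          # 5.50
print(per_example_losses.mean())  # 5.50 -- matches the per-example average
```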

As for the test not passing, I think you are running it on 2 GPUs? It's only intended to work on one.

@ajindal1 (Author)

Thank you for the quick reply. Yes, I was running the code on 2 GPUs and it works fine on 1 GPU. May I ask why it is intended to work only on 1 GPU?

@sgugger (Collaborator) commented Apr 13, 2022

The batch size is actually wrong in that case. Pushing a fix!

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
