
Predicting incorrect loss when eval data size is not a multiple of batch size #16716

Closed

ajindal1 opened this issue Apr 12, 2022 · 4 comments

@ajindal1
Environment info

  • transformers version: 4.18.0.dev0
  • Platform: Linux-5.4.0-96-generic-x86_64-with-debian-buster-sid
  • Python version: 3.7.13
  • Huggingface_hub version: 0.5.1
  • PyTorch version (GPU?): 1.12.0.dev20220411+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): 0.4.1 (cpu)
  • Jax version: 0.3.5
  • JaxLib version: 0.3.5
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes

Who can help

@sgugger

Issue:

When the input data size is not a multiple of batch_size, the loss calculated seems wrong to me. As mentioned in this line

losses = self._nested_gather(loss.repeat(batch_size))
The loss is repeated batch_size times, which does not make sense for the last batch, whose size is smaller than batch_size. This also leads to the failure of the HF test case tests/trainer/test_trainer.py::TrainerIntegrationTest::test_evaluate when I run it on my machine.
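
To make the concern concrete, here is a minimal sketch with made-up numbers (not the actual Trainer code or the real test data): with 10 evaluation examples and a batch size of 4, the last batch contains only 2 examples, yet its mean loss is still repeated 4 times.

```python
import torch

# Illustrative numbers only: 10 eval examples, batch size 4,
# so the last batch holds just 2 examples.
num_samples, batch_size = 10, 4
per_example_losses = torch.arange(1.0, num_samples + 1)   # losses 1.0 .. 10.0
batches = per_example_losses.split(batch_size)            # chunks of size 4, 4, 2

# What the quoted line does per step: repeat the per-batch mean loss batch_size times.
repeated = torch.cat([b.mean().repeat(batch_size) for b in batches])

print(repeated.numel())           # 12 entries for only 10 examples
print(repeated.mean())            # ~6.17 -- a naive mean over-weights the last, smaller batch
print(per_example_losses.mean())  # 5.50 -- the true per-example average
```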

To reproduce

Steps to reproduce the behavior:

  1. Install pytest
  2. RUN: pytest tests/trainer/test_trainer.py::TrainerIntegrationTest::test_evaluate

Error:
FAILED tests/trainer/test_trainer.py::TrainerIntegrationTest::test_evaluate - AssertionError: 0.517515242099762 != 0.41851458 within 7 places (0.09900066256523132 difference)

Expected behavior

The test should pass.

@sgugger (Collaborator) commented Apr 12, 2022

No, the evaluation loss is properly computed thanks to this line, actually. Repeating it batch_size times and then truncating to the length of the dataset here makes the final evaluation loss the proper average of all losses.
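
For illustration, here is a minimal single-process sketch of that repeat-then-truncate logic, using the same made-up numbers as above and assuming each per-batch loss is the mean over the examples in that batch (the real code additionally gathers the repeated losses across processes):

```python
import torch

# Same illustrative setup: 10 examples, batch size 4, last batch of size 2.
num_samples, batch_size = 10, 4
per_example_losses = torch.arange(1.0, num_samples + 1)
batches = per_example_losses.split(batch_size)

# Each per-batch mean is repeated batch_size times (the quoted line) ...
all_losses = torch.cat([b.mean().repeat(batch_size) for b in batches])
# ... and the accumulated tensor is then truncated to the dataset length,
# so the last batch's mean keeps only as many copies as it has examples.
all_losses = all_losses[:num_samples]

print(all_losses.mean())          # 5.50
print(per_example_losses.mean())  # 5.50 -- matches the per-example average
```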

As for the test not passing, I think you are running it on 2 GPUs? It's only intended to work on one.

@ajindal1 (Author)

Thank you for the quick reply. Yes, I was running the code on 2 GPUs and it works fine on 1 GPU. May I ask why it is intended to work only on 1 GPU?

@sgugger (Collaborator) commented Apr 13, 2022

The batch size is actually wrong in that case. Pushing a fix!

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
