
batch config issue? #17

Closed
samyam opened this issue Feb 5, 2020 · 2 comments · Fixed by #33
Labels: bug (Something isn't working) · enhancement (New feature or request)

Comments

samyam (Contributor) commented Feb 5, 2020

There are a few things in the train batch size configuration that do not seem correct to me, and a few cases that we do not currently support.

  1. The following assertion

train_batch_size == train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size

should always hold, but currently it does not in some cases.
For example, when train_micro_batch_size_per_gpu and gradient_accumulation_steps are None in the ds_config, they are initialized to train_batch_size and 1 respectively, which leads to

train_batch_size == train_batch_size * 1 * world_size

  2. If train_micro_batch_size_per_gpu > per_device_batch_size, we should throw a config error. Currently, it is silently assigned to be equal to per_device_batch_size.

  3. We do not currently support the user providing only train_micro_batch_size_per_gpu, or only train_micro_batch_size_per_gpu together with gradient_accumulation_steps.
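A minimal sketch of the validation logic the items above call for. The function name and structure are illustrative, not DeepSpeed's actual implementation; it infers whichever of the three values is missing and then enforces the invariant, rather than blindly defaulting to train_batch_size and 1:

```python
def check_batch_config(config, world_size):
    """Hypothetical helper: infer missing batch-size values and enforce
    train_batch_size == micro_batch * grad_accum * world_size."""
    train = config.get("train_batch_size")
    micro = config.get("train_micro_batch_size_per_gpu")
    accum = config.get("gradient_accumulation_steps")

    if train is None and micro is None:
        raise ValueError(
            "Provide train_batch_size or train_micro_batch_size_per_gpu")

    # Infer missing values instead of defaulting blindly
    # (blind defaults produce train == train * 1 * world_size).
    if micro is None and accum is None:
        accum = 1
        micro, rem = divmod(train, world_size)
        if rem:
            raise ValueError("train_batch_size not divisible by world_size")
    elif train is None:
        accum = 1 if accum is None else accum
        train = micro * accum * world_size
    elif accum is None:
        accum, rem = divmod(train, micro * world_size)
        if rem:
            raise ValueError("inconsistent batch sizes")
    elif micro is None:
        micro, rem = divmod(train, accum * world_size)
        if rem:
            raise ValueError("inconsistent batch sizes")

    # Final check: the invariant must hold, or it is a config error.
    if train != micro * accum * world_size:
        raise ValueError(
            f"train_batch_size ({train}) != "
            f"{micro} * {accum} * {world_size}")
    return train, micro, accum
```

For example, `check_batch_config({"train_batch_size": 32}, world_size=4)` would infer a micro batch of 8 with 1 accumulation step, while an inconsistent combination raises instead of being silently overwritten.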

@samyam added the bug (Something isn't working), enhancement (New feature or request), and invalid labels on Feb 5, 2020
@samyam linked a pull request on Feb 7, 2020 that will close this issue
samyam (Contributor, Author) commented Feb 7, 2020

Fixed with pull request Samyamr/batchconfig #33

rafael-ariascalles commented

I am having this same issue with 0.9 but not 0.8 (using an AWS p4 machine).

baodii pushed a commit to baodii/DeepSpeed that referenced this issue Oct 17, 2023
radna0 pushed a commit to radna0/DeepSpeed-XLA that referenced this issue Feb 5, 2025

4 participants