Exceptions and failures when using MultiWorkerMirroredStrategy #373
Comments
Can you try replacing AdamWeightDecay with a plain Adam optimizer first?
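A minimal sketch of that swap, assuming the trainer accepts any Keras optimizer (the learning rate and the compile call are placeholders, not the repo's actual configuration):

```python
import tensorflow as tf

# Illustrative only: use the stock Keras Adam optimizer instead of the custom
# AdamWeightDecay to check whether the collective all-reduce failure is
# specific to AdamWeightDecay. The learning rate is a placeholder value.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

# model.compile(optimizer=optimizer, loss=...)  # wire into the existing trainer
```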
Yes, I did, but no luck; I hit the same error again.
Hi, any update?
Hi, just to confirm: will the fix below solve my problem? Please confirm whether it addresses this bug. Support Multi-GPU gradient Accumulate for trainer. #377
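For context, a hypothetical sketch of what multi-GPU gradient accumulation generally looks like; this is not the code from PR #377, and the class and parameter names here are made up for illustration:

```python
import tensorflow as tf

# Hypothetical gradient-accumulation helper (not the actual PR #377 code).
# Gradients from `accum_steps` micro-batches are summed into accumulator
# variables and applied once, reducing per-step memory on each replica.
class GradientAccumulator:
    def __init__(self, model, optimizer, accum_steps=4):
        self.model = model
        self.optimizer = optimizer
        self.accum_steps = accum_steps
        self.accumulated = [
            tf.Variable(tf.zeros_like(v), trainable=False)
            for v in model.trainable_variables
        ]
        self.step = tf.Variable(0, trainable=False, dtype=tf.int64)

    def accumulate(self, x, y, loss_fn):
        # Scale the loss so the applied gradient matches a full-batch update.
        with tf.GradientTape() as tape:
            loss = loss_fn(y, self.model(x, training=True)) / self.accum_steps
        grads = tape.gradient(loss, self.model.trainable_variables)
        for acc, g in zip(self.accumulated, grads):
            if g is not None:
                acc.assign_add(g)
        self.step.assign_add(1)
        # Apply and reset the accumulators every `accum_steps` micro-batches.
        if self.step % self.accum_steps == 0:
            self.optimizer.apply_gradients(
                zip(self.accumulated, self.model.trainable_variables))
            for acc in self.accumulated:
                acc.assign(tf.zeros_like(acc))
        return loss
```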
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
When I use tf.distribute.experimental.MultiWorkerMirroredStrategy to run training on multiple machines, I get the following errors. Please advise whether other changes are needed.
2020-11-16 12:03:50,968 (cross_device_ops:1130) INFO: Collective batch_all_reduce for IndexedSlices: 1 all-reduces, group_size = 2
2020-11-16 12:03:56.443402: W tensorflow/core/grappler/optimizers/scoped_allocator_optimizer.cc:439] error: Internal: Complete shape not known for AdamWeightDecay/allreduce/CollectiveReduce_23
2020-11-16 12:03:56.443474: W tensorflow/core/grappler/optimizers/scoped_allocator_optimizer.cc:1121] error: Internal: Complete shape not known for AdamWeightDecay/allreduce/CollectiveReduce_23
2020-11-16 12:03:56.443606: E tensorflow/core/grappler/optimizers/scoped_allocator_optimizer.cc:1138] ScopedAllocatorOptimizer: Internal: Complete shape not known for AdamWeightDecay/allreduce/CollectiveReduce_23
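For reference, a minimal multi-worker setup sketch, assuming a two-machine cluster; this is not the reporter's actual script, and the host names, ports, model, and learning rate are placeholders:

```python
import json
import os
import tensorflow as tf

# Each machine must export a TF_CONFIG describing the full cluster and its own
# task index before the strategy is created; hosts below are placeholders.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["host1:12345", "host2:12345"]},
    "task": {"type": "worker", "index": 0},  # use index 1 on the second machine
})

strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Model and optimizer must be created inside the strategy scope so their
    # variables are mirrored and gradients are all-reduced across workers.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-4),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```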