Please make sure that this is a feature request. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.
System information
TensorFlow version (you are using): 2.4
Are you willing to contribute it (Yes/No): ?
Describe the feature and the current behavior/state.
When using large models and/or large batch sizes that do not fit into the GPU's VRAM, one might expect system RAM to be used in addition to VRAM to avoid an OOM, similar to how TF can work with NVIDIA's UVM. However, this is not the case: the program crashes with an OOM error. ROCm itself already appears to support unified memory, but tensorflow-rocm just doesn't make use of it.
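For reference, here is a minimal sketch of how the CUDA build can opt into unified memory today (assuming TF 2.4's compat v1 options; a memory fraction above 1.0 is documented to require CUDA unified memory). The request is for an equivalent path in tensorflow-rocm:

```python
import os

# On the CUDA build, this makes the GPU allocator back allocations with
# managed (unified) memory, so oversubscribed allocations can spill to RAM.
os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"

import tensorflow as tf

# A fraction > 1.0 lets the allocator hand out more than physical VRAM;
# with unified memory the excess lives in system RAM.
gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=2.0)
config = tf.compat.v1.ConfigProto(gpu_options=gpu_options)
tf.compat.v1.keras.backend.set_session(tf.compat.v1.Session(config=config))
```

On the ROCm side, the backing primitive would presumably be HIP's managed allocation (hipMallocManaged), which already exists in the HIP API.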
Will this change the current API? How?
No
Who will benefit from this feature?
Anyone, especially in situations like the one described under 'Any Other info' below.
Any Other info.
ROCm version: 4.0.1
GPU: Vega FE (gfx900)
When using batches with varying dimensions, e.g. in sequence models, a few outlier batches with particularly long sequences can cause an unexpected OOM crash after hours of training. With unified memory, such situations can be avoided without resorting to small batch sizes and accepting underutilization of resources. The performance penalty of unified memory would only affect those few outlier batches, and the performance benefit of larger batch sizes would outweigh this cost, because the majority of batches still fit into VRAM. A hypothetical repro sketch follows below.
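A minimal, hypothetical repro of the scenario (model size, sequence lengths, and batch size are made up for illustration; the point is that one rare long batch OOMs while the rest fit comfortably):

```python
import numpy as np
import tensorflow as tf

# Toy sequence model; layer sizes are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(1024, return_sequences=True),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

for step in range(1000):
    # Most batches are short and fit comfortably in VRAM;
    # a rare outlier batch with a very long sequence does not.
    seq_len = 16384 if step % 500 == 499 else 128
    x = np.random.rand(64, seq_len, 32).astype(np.float32)
    y = np.random.rand(64, seq_len, 1).astype(np.float32)
    # Without unified memory, the outlier step crashes with an OOM
    # instead of temporarily spilling to system RAM.
    model.train_on_batch(x, y)
```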