
Shared system memory is not used in Tensorflow #1251

Open
tarcey opened this issue Feb 7, 2021 · 2 comments
tarcey commented Feb 7, 2021


System information

  • TensorFlow version (you are using): 2.4
  • Are you willing to contribute it (Yes/No): ?

Describe the feature and the current behavior/state.

When using large models and/or large batch sizes that would not fit into the GPU's VRAM, one might expect system RAM to be used in addition to VRAM to avoid OOM, similar to how TensorFlow works with NVIDIA's UVM. However, this is not the case, and the program crashes with an OOM error. ROCm already appears to support unified memory, but tensorflow-rocm just doesn't make use of it.
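For reference, on CUDA builds this spill-to-RAM behavior can reportedly be triggered through the legacy session options: a `per_process_gpu_memory_fraction` greater than 1.0 switches the allocator to CUDA unified memory (`cudaMallocManaged`). A minimal sketch of that configuration, assuming a CUDA build of TF 2.x (tensorflow-rocm currently has no equivalent effect):

```python
import tensorflow as tf

# On CUDA builds, a memory fraction above 1.0 is documented to enable
# unified memory, letting GPU allocations spill into system RAM instead
# of failing with OOM. On tensorflow-rocm this option does nothing.
gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=2.0)
config = tf.compat.v1.ConfigProto(gpu_options=gpu_options)
session = tf.compat.v1.Session(config=config)
```

Having the same switch work on ROCm (presumably via `hipMallocManaged` underneath) is what this request is about.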

Will this change the current API? How?

No

Who will benefit from this feature?

Anyone, especially in situations like the one explained under 'Any other info'.

Any other info.

ROCm version: 4.0.1
GPU: Vega FE (gfx900)

When using batches with varying dimensions, as in e.g. sequential models, a few outlier batches with particularly long sequences can lead to an unexpected OOM crash after hours of training. With unified memory, such situations could be avoided without having to resort to small batch sizes and accepting underutilization of resources. The performance penalty of unified memory would only affect those few outlier batches, and the performance benefit of larger batch sizes would outweigh this cost, because the majority of batches still fit into VRAM.
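To make the outlier effect concrete, here is a back-of-the-envelope sketch. The sizes are hypothetical and it counts only a single activation tensor, but it shows how one long padded batch multiplies the memory footprint of the entire batch:

```python
def batch_activation_bytes(batch_size, seq_len, hidden, dtype_bytes=4):
    # Bytes for one float32 activation tensor of shape [batch, seq, hidden].
    return batch_size * seq_len * hidden * dtype_bytes

# Typical batch: 64 sequences padded to length 128, hidden size 1024.
typical = batch_activation_bytes(64, 128, 1024)    # 32 MiB
# Outlier batch padded to length 2048: same code path, 16x the memory.
outlier = batch_activation_bytes(64, 2048, 1024)   # 512 MiB
```

A handful of such outlier batches is enough to push a run that normally fits in VRAM over the edge; spilling just those batches to system RAM is much cheaper than shrinking the batch size for the whole run.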

@sunway513

@deven-amd can you help look at this issue?

deven-amd self-assigned this Mar 4, 2021

daiaji commented Mar 21, 2023

Is there any progress? This would be helpful for training models on consumer-grade hardware.
