You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fix for launching kernel invalid config error when calling embedding … (pytorch#130994)
…with large index
Fixespytorch#130806
When an output size of 2147483648 (=131072*16384) is expected in the above issue, it throwed out the following error:
RuntimeError: HIP error: invalid configuration argument
What happened was that the second parameter passed to hipLaunchKernel was crazy {2147483648,1,1}.
Found two issues in the Indexing.cu:
1: ptrdiff_t was used but it is signed int, outTotalSize >= 2147483648 can cause overflow when doing [this](https://github.com/pytorch/pytorch/blame/39493aa93419532957e6e5ee97cae842b53b8b59/aten/src/ATen/native/cuda/Indexing.cu#L1367):
2: On ROCm, std::min -> ::min did not work as expected when outTotalSize>=2147483648
As the result, 2147483648 was sent to hipLaunchKernel which the GPU does not support such a huge number since this number specifies the number of threads per block. The original code intended to set 128 threads per block, though this is debatable as the perf would not good for latest powerful GPUs (a TODO item to update for perf maybe?) , but at least it would not cause `invalid configuration argument` error.
[Test]
Run the same code snippet in the [issue](pytorch#130806), and print the output, its dim and numel(), which looks like below now:
```
output=tensor([[ 0.4044, -0.0244, -0.6865, ..., -0.7800, 0.1175, 1.6726],
[-1.0866, -0.1609, 0.3538, ..., 1.9105, 0.7882, 1.1583],
[-2.2079, 0.3736, 0.3610, ..., -0.2658, -0.0459, 1.3077],
...,
[ 0.8753, -0.7482, -0.1978, ..., 0.9016, 1.1501, -0.5178],
[-1.5845, -0.6277, 1.4520, ..., 0.5733, -2.1198, -0.0915],
[-0.6310, -1.0239, -0.1910, ..., 0.4309, 0.1630, 0.3239]],
device='cuda:0'), dim=2, numel=2147483648
```
Added a large tensor unit test too.
```
/pytorch# pytest test/nn/test_embedding.py -k test_large_tensors
================================================================================== test session starts ===================================================================================
platform linux -- Python 3.9.19, pytest-7.3.2, pluggy-1.4.0
rootdir: /dockerx/development/pytorch
configfile: pytest.ini
plugins: flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, cpp-2.3.0, hypothesis-5.35.1
collected 288 items / 287 deselected / 1 selected
Running 1 items in this shard
test/nn/test_embedding.py . [100%]
=========================================================================== 1 passed, 287 deselected in 3.16s ============================================================================
```
Pull Request resolved: pytorch#130994
Approved by: https://github.com/jeffdaily, https://github.com/xw285cornell
0 commit comments