Very slow 4x4 convolutions on gfx803 #134

huanzhang12 · 2020-03-30T03:52:40Z

Since the ASM kernels were disabled on gfx803 in commit ce51a4c, 4x4 convolutions on gfx803 default to the very slow gemm algorithm:

./bin/MIOpenDriver conv -n 256 -c 16 -H 28 -W 28 -k 16 -y 4 -x 4 -p 1 -q 1 -u 2 -v 2 -l 1 -j 1 -m conv -g 1 -F 1 -t 1
MIOpenDriver: conv -n 256 -c 16 -H 28 -W 28 -k 16 -y 4 -x 4 -p 1 -q 1 -u 2 -v 2 -l 1 -j 1 -m conv -g 1 -F 1 -t 1
MIOpen Forward Conv. Algorithm: 0, Solution: 33/gemm
GPU Kernel Time Forward Conv. Elapsed: 28.185415 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: fwd-conv4x4u2, 256, 16, 14, 14, 4, 4, 16,  411041792, 12861440, 3211264, 15, 1, 28.185415
Forward Convolution Verifies on CPU and GPU (7.39356e-08)

Before ASM kernels were disabled, it was much faster:

MIOpenDriver: conv -n 256 -c 16 -H 28 -W 28 -k 16 -y 4 -x 4 -p 1 -q 1 -u 2 -v 2 -l 1 -j 1 -m conv -g 1 -F 1 -t 1
MIOpen Forward Conv. Algorithm: 3, Solution: 15/ConvBinWinogradRxS
GPU Kernel Time Forward Conv. Elapsed: 0.592643 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: fwd-conv4x4u2, 256, 16, 14, 14, 4, 4, 16,  411041792, 12861440, 3211264, 694, 27, 0.592643
Forward Convolution Verifies on CPU and GPU (4.98272e-08)

Performance drops from 694 GFLOPs to 15 GFLOPs, roughly a 48x slowdown.
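As a sanity check, the flopCnt and GFLOPs columns in the logs above can be reproduced from the driver arguments. The following is a minimal Python sketch, not MIOpen code. It assumes the driver counts 2 FLOPs per multiply-accumulate and rounds GFLOPs to the nearest integer; both assumptions agree with the logged values but are not confirmed from the MIOpen source. The helper names are hypothetical.

```python
# Sketch: reproduce the flopCnt and GFLOPs columns from the MIOpenDriver arguments.
# Assumption (not confirmed from MIOpen source): 2 FLOPs per multiply-accumulate.

def conv_out_size(in_size, pad, kernel, stride, dilation=1):
    """Standard output-size formula for a strided, padded convolution."""
    return (in_size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1

# Values taken from the command line: -n 256 -c 16 -H 28 -W 28 -k 16 -y 4 -x 4
n, c, H, W = 256, 16, 28, 28
k, y, x = 16, 4, 4
pad, stride = 1, 2  # -p 1 -q 1 and -u 2 -v 2

ho = conv_out_size(H, pad, y, stride)
wo = conv_out_size(W, pad, x, stride)

# One multiply and one add per (output element, input channel, filter tap).
flop_cnt = 2 * n * c * k * ho * wo * y * x

def gflops(flops, time_ms):
    """Convert a FLOP count and a time in milliseconds to GFLOP/s."""
    return flops / (time_ms * 1e-3) / 1e9

gemm_ms = 28.185415      # Solution 33/gemm
winograd_ms = 0.592643   # Solution 15/ConvBinWinogradRxS

print(ho, wo, flop_cnt)
print(round(gflops(flop_cnt, gemm_ms)), round(gflops(flop_cnt, winograd_ms)))
print(f"slowdown: {gemm_ms / winograd_ms:.1f}x")
```

Under these assumptions the computed FLOP count matches the logged 411041792, and the two rates round to the logged 15 and 694.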

Why were all ASM kernels disabled for gfx803, rather than only the individual problematic ones?

Also, even without an ASM implementation, could a general OpenCL implementation be used in this case instead of falling back to the extremely slow GEMM? (It seems conv_ocl_dir2Dfwd.cpp is not enabled for most 4x4 convolutions.)


atamazov · 2020-03-31T01:41:17Z

Which version of MIOpen are you using?

huanzhang12 · 2020-03-31T02:10:39Z

@atamazov Thanks for asking! I am using the current master of this repository (commit 2fe2ae9).
I also tried the pre-compiled MIOpen shipped with ROCm 3.1.
To obtain the performance of the ASM kernels, I manually reverted commit ce51a4c on the current master.

atamazov · 2020-03-31T22:54:32Z

@huanzhang12 AFAICS you are using 2.2.0. Version 2.3.0 has just been released. It includes c58488b, which should restore gfx8 performance. Please close this if the issue is resolved.

huanzhang12 · 2020-04-01T02:21:40Z

@atamazov I tried the just-released version 2.3.0 and it is amazing! It is great news that ASM kernels are re-enabled on gfx803. The same 4x4 convolution now runs at 1684 GFLOPs:

./bin/MIOpenDriver conv -n 256 -c 16 -H 28 -W 28 -k 16 -y 4 -x 4 -p 1 -q 1 -u 2 -v 2 -l 1 -j 1 -m conv -g 1 -F 1 -t 1
MIOpenDriver conv -n 256 -c 16 -H 28 -W 28 -k 16 -y 4 -x 4 -p 1 -q 1 -u 2 -v 2 -l 1 -j 1 -m conv -g 1 -F 1 -t 1
MIOpen Forward Conv. Algorithm: 5, Solution: 26/ConvHipImplicitGemmV4R1Fwd
GPU Kernel Time Forward Conv. Elapsed: 0.244142 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: fwd-conv4x4u2, 256, 16, 14, 14, 4, 4, 16,  411041792, 12861440, 3211264, 1684, 66, 0.244142
Forward Convolution Verifies on CPU and GPU (7.70828e-08)

My workload, which involves some 4x4 convolutions, runs 10 times faster on v2.3.0. Thank you so much for the hard work; I am closing this issue.
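For comparing runs like the three in this thread, the driver's pair of `stats:` lines can be turned into a dictionary and the timings compared directly. This is a minimal Python sketch that assumes the header/value layout seen in the logs here; `parse_stats` is a hypothetical helper, not part of MIOpen.

```python
# Sketch: pair the MIOpenDriver "stats:" header and value lines, then compare timings.
# parse_stats is a hypothetical helper; the layout is assumed from the logs in this thread.

def parse_stats(header_line, value_line):
    """Map each column name in the header line to its parsed value."""
    names = [s.strip() for s in header_line.split(":", 1)[1].split(",")]
    values = [s.strip() for s in value_line.split(":", 1)[1].split(",")]
    stats = {}
    for name, value in zip(names, values):
        try:
            stats[name] = float(value) if "." in value else int(value)
        except ValueError:
            stats[name] = value  # non-numeric column, e.g. the kernel name
    return stats

header = ("stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, "
          "bytesWritten, GFLOPs, GB/s, timeMs")

# Value lines copied from the three logs, keyed by the reported solution.
runs = {
    "33/gemm": "stats: fwd-conv4x4u2, 256, 16, 14, 14, 4, 4, 16,  "
               "411041792, 12861440, 3211264, 15, 1, 28.185415",
    "15/ConvBinWinogradRxS": "stats: fwd-conv4x4u2, 256, 16, 14, 14, 4, 4, 16,  "
                             "411041792, 12861440, 3211264, 694, 27, 0.592643",
    "26/ConvHipImplicitGemmV4R1Fwd": "stats: fwd-conv4x4u2, 256, 16, 14, 14, 4, 4, 16,  "
                                     "411041792, 12861440, 3211264, 1684, 66, 0.244142",
}

parsed = {label: parse_stats(header, line) for label, line in runs.items()}
baseline_ms = parsed["33/gemm"]["timeMs"]

for label, stats in parsed.items():
    speedup = baseline_ms / stats["timeMs"]
    print(f"{label}: {stats['timeMs']} ms, {speedup:.1f}x vs gemm")
```

The speedups printed here apply to this single kernel only; the 10x figure quoted above is for a whole workload in which 4x4 convolutions are only part of the runtime.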

huanzhang12 closed this as completed Apr 1, 2020

huanzhang12 mentioned this issue Apr 2, 2020

Performance comparsion: AMD with ROCm vs NVIDIA with cuDNN? ROCm/tensorflow-upstream#173

Open

xinlipn mentioned this issue Feb 2, 2023

[tests] Fix bug in weights tensor layout in solver test #1950

Closed
